31. LVS: Squid Realservers (poor man's L7 switch)

One of the first uses for LVS was to increase throughput of webcaches. A 550MHz PIII director can handle 120Mbps throughput.

A scheduler (-dh = destination hash) specially designed for webcaches, described in the section on ipvsadm and schedulers, is in LVS. It is derived from code posted to the LVS mailing list by Thomas Proell (about Oct 2000).

This section was written by Andreas J. Koenig (andreas (dot) koenig (at) anima (dot) de) and was posted to the mailing list.

An often lamented shortcoming of LVS clusters is that the realservers have to be configured to work identically. Thus if you want to build up a service with many servers that need to be configured differently for some reason, you cannot take advantage of the powerful LVS.

The following describes an LVS topology where not all servers in the pool of available servers are configured identically and where loadbalancing is content-based.

The goal is achieved by combining the features of Squid and LVS. The workhorses are running Apache, but any HTTP server would do.

31.1. Terminology

Before we start we need to introduce a bit of Squid terminology. A redirector (http://www.squid-cache.org/Doc/FAQ/FAQ-15.html) is a director that examines the URL and request method of an HTTP request and is able to change the URL in any way it needs. An accelerator (http://www.squid-cache.org/Doc/FAQ/FAQ-20.html) plays the role of a buffer and cache. The accelerator handles a relatively large number of slow connections to the clients on the internet with a relatively small amount of memory. It passes requests through to any number of back-end servers. It can be configured to cache the results of the back-end servers according to the HTTP headers.

31.2. Preview

In the following example installation, we will realize this configuration (real IP addresses anonymized):

[Diagram: the LVS director in front of Squid1 and Squid2, which pass requests on to Webserver1..4]

Squid2       # Same box as Webserver2
Webserver2   # Same box as Squid2

Note that a squid and a webserver can coexist in a single box; that is why we have put Squid2 and Webserver2 into a single machine.

Note also that the squids can cache the webservers' output and thus reduce the work for them. We dedicate 24 GB disk to caching in Squid1 and 6 GB disk in Squid2.

And finally note that several squids can exchange digest information about cached data if they want. We haven't yet configured for this.

Strictly speaking, a single squid can take the role of an LVSdirector, but only for HTTP. It's slower, but it works. By accessing one of the squids in our setup directly, this can be easily demonstrated.

31.3. Let's start assembling

I'd suggest that the first thing to do is to set up the four apaches on Webserver1..4. These servers are the workhorses for the whole cluster. They are not what LVS terminology calls realservers though. The realservers according to LVS are the Squids.

We configure the apaches in a completely standard way. The only deviation from a standard installation here is that we specify

    Port 81

in the httpd.conf. Everything else is the default configuration file that comes with apache. We are, of course, free to choose any port we like. It's an old habit of mine to select 81 if a squid is around to act as accelerator.

We finish this round of assembling with tests that only try to access Webserver1..4 on port 81 directly. For later testing, I recommend activating the printenv CGI program that comes with Apache:

chmod 755 /usr/local/apache/cgi-bin/printenv

This program shows us on which server the script is running (SERVER_ADDR) and which server appears as the requesting site (REMOTE_ADDR).
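
For scripted tests it can help to pull these two variables out of the printenv response. The following Python sketch assumes that printenv prints one NAME="value" pair per line; the function names and the sample addresses are made up for illustration.

```python
import re

# Assumed line format of the printenv output: NAME="value"
_LINE = re.compile(r'^(\w+)="(.*)"$')

def parse_printenv(body):
    """Return a dict of the environment variables found in a printenv response body."""
    env = {}
    for line in body.splitlines():
        match = _LINE.match(line.strip())
        if match:
            env[match.group(1)] = match.group(2)
    return env

def who_answered(body):
    """Return (SERVER_ADDR, REMOTE_ADDR) from a printenv response body."""
    env = parse_printenv(body)
    return env.get("SERVER_ADDR"), env.get("REMOTE_ADDR")

if __name__ == "__main__":
    # Hypothetical response body; a real one would be fetched over HTTP.
    sample = 'REMOTE_ADDR="10.0.0.3"\nSERVER_ADDR="10.0.0.6"\nSERVER_PORT="81"\n'
    print(who_answered(sample))
```

The body could be fetched with urllib.request.urlopen() from one of the webservers on port 81; the URL path of the script depends on how cgi-bin is mapped in your httpd.conf.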

31.4. One squid

Next we should configure one Squid box. The second one will mostly be a replication of the first, so let's first nail that first one down.

When we compile squid 2.3-STABLE4, we already need to decide on compilation options. Personally I like the features associated with this configuration:

./configure --enable-heap-replacement --disable-http-violations \
            --enable-cache-digests    --enable-delay-pools

We can build and install squid with these settings. But before we start squid, we must go through a 2700-line configuration file and set lots of options. The following is a collection of diffs between the squid.conf.default and my squid.conf with comments in between.

--- squid.conf.default  Mon Aug 14 12:04:33 2000
+++ squid.conf  Mon Aug 14 14:34:35 2000
@@ -47 +47 @@
-#http_port 3128
+http_port 80

Yes, we want this squid on port 80 because from outside it looks like a normal HTTP server.

@@ -54 +54 @@
-#icp_port 3130
+icp_port 0

In the demo installation I turned ICP off, but I'll turn it on again later. ICP is the protocol that the squids can use to exchange sibling information about what they have on their disks.

@@ -373 +373 @@
-#cache_mem  8 MB
+cache_mem 700 MB

This is the memory reserved for holding cache data. We have 1 GB total physical memory and 24 GB disk cache. To manage the disk cache, squid needs about 150 MB of memory (estimate 6 MB per GB for an average object size of 13kB). Once you're running, you can use squid's statistics to find out *your* average object size. I usually leave 1/6 of the memory for the operating system, but at least 100 MB.
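
The arithmetic behind the 700 MB figure can be written down as a small Python sketch. It is only one reading of the rules of thumb above: the function name and parameters are hypothetical, 1 GB is taken as 1024 MB, and cache_mem is treated as whatever is left after the disk cache index and the operating system share.

```python
def cache_mem_budget(total_mb, disk_cache_gb, index_mb_per_gb=6, os_min_mb=100):
    """Return (index_mb, os_mb, cache_mem_mb) following the rules of thumb in the text."""
    index_mb = disk_cache_gb * index_mb_per_gb   # memory squid needs to manage the disk cache
    os_mb = max(total_mb / 6, os_min_mb)         # 1/6 for the operating system, at least 100 MB
    cache_mem_mb = total_mb - index_mb - os_mb   # what remains for cache_mem
    return index_mb, os_mb, cache_mem_mb

if __name__ == "__main__":
    index_mb, os_mb, cache_mem_mb = cache_mem_budget(total_mb=1024, disk_cache_gb=24)
    print(f"index {index_mb} MB, OS {os_mb:.0f} MB, cache_mem about {cache_mem_mb:.0f} MB")
```

With 1 GB of memory and 24 GB of disk cache this gives 144 MB for the index, about 171 MB for the operating system and roughly 709 MB for cache_mem, which is in line with the 700 MB configured above.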

@@ -389,2 +389,2 @@
-#cache_swap_low  90
-#cache_swap_high 95
+#cache_swap_low  94
+#cache_swap_high 96
@@ -404 +404 @@
-#maximum_object_size 4096 KB
+maximum_object_size 8192 KB

Please refer to squid's docs for these values.

@@ -463,0 +464,5 @@
+cache_dir ufs /var/squid01 5600 16 256
+cache_dir ufs /var/squid02 5600 16 256
+cache_dir ufs /var/squid03 5600 16 256
+cache_dir ufs /var/squid04 5600 16 256

You do not need bigger disks; you need many disks to speed up squid. Join the squid mailing list to find out about the efficiency of filesystem tuning like "noatime" or ReiserFS.

@@ -660 +665 @@
-#redirect_program none
+redirect_program /usr/local/squid/etc/redirector.pl

This is the meat of our usage of squid. This program can be as simple as you want or as powerful as you want. It can be implemented in any language and it will be run within a pool of daemons. My program is written in perl and looks something like the following:

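    $| = 1;   # unbuffered output: squid waits for one reply line per request
    while (<>) {
      # squid passes one line per request: URL client ident method
      my($url,$host,$ident,$method) = split;
      # h=6,7 in the query string restricts the choice of backends
      my @redir = $url =~ /\bh=([\d,]+);?/ ?
                 split(/,/,$1) : (6,7,8,9); # last components of our IP numbers
      my $redir = $redir[int rand scalar @redir]; # pick one at random
      $url =~ s/PLACEHOLDER:81/10.0.0.$redir\:81/i;
      print STDOUT "$url\n";
    }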

This is ideal for testing, because it allows me to request a single backend server or a set of backend servers to choose from via the CGI querystring. A request whose query string contains h=6, for example, will then be served by the backend apache on 10.0.0.6.
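
Since the redirector can be written in any language, the query string convention can also be illustrated with a Python sketch of the same selection logic. It is not the author's program: the function names are hypothetical, and the input line format (URL first, then client, ident and method) is assumed to be the one the Perl code splits.

```python
import random
import re
import sys

DEFAULT_BACKENDS = ["6", "7", "8", "9"]   # last components of the backend IP numbers
ACCEL_HOST = "PLACEHOLDER:81"             # must match httpd_accel_host and httpd_accel_port

def rewrite(line, rng=random):
    """Rewrite one redirector input line to a URL on a chosen backend."""
    url = line.split()[0]
    match = re.search(r"\bh=([\d,]+)", url)
    candidates = match.group(1).split(",") if match else DEFAULT_BACKENDS
    # Ignore empty entries such as in "h=6,,7"; fall back to all backends if none is left.
    candidates = [c for c in candidates if c] or DEFAULT_BACKENDS
    backend = rng.choice(candidates)
    return re.sub(re.escape(ACCEL_HOST), f"10.0.0.{backend}:81", url,
                  count=1, flags=re.IGNORECASE)

def serve(lines, out):
    """Answer each request line with one rewritten URL, flushing after every reply."""
    for line in lines:
        if line.strip():
            out.write(rewrite(line) + "\n")
            out.flush()
```

Calling serve(sys.stdin, sys.stdout) at the end of the script would turn it into a program that squid can run. A line starting with http://PLACEHOLDER:81/cgi-bin/printenv?h=6 is rewritten to http://10.0.0.6:81/cgi-bin/printenv?h=6, while h=7,8 lets the redirector choose between 10.0.0.7 and 10.0.0.8.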

@@ -668 +673 @@
-#redirect_children 5
+redirect_children 10

The more complex the redirector program is, the more processes should be allocated to run it.

@@ -674 +679 @@
-#redirect_rewrites_host_header on
+redirect_rewrites_host_header off
@@ -879 +884 @@
-#replacement_policy LFUDA
+replacement_policy LFUDA
@@ -1168 +1173 @@
-acl Safe_ports port 80 21 443 563 70 210 1025-65535
+acl Safe_ports port 80 81 21 443 563 70 210 1025-65535
@@ -1204 +1209 @@
-http_access deny all
+http_access allow all

For all of the above changes, please refer to the squid.conf.default.

@@ -1370,2 +1375,3 @@
-#httpd_accel_host hostname
-#httpd_accel_port port
+# we will replace www.meta-list.net:81 with our host of choice
+httpd_accel_host PLACEHOLDER
+httpd_accel_port 81

As we are redirecting everything through the redirector, we can fill in anything we want. No real hostname, no real port is needed. The redirector program will have to know what we chose here.

@@ -1377 +1383 @@
-#httpd_accel_with_proxy off
+httpd_accel_with_proxy on

If we want ICP working (and we said we would like to get it working), we need this turned on.

We're done with our first squid; we can start it and test it. If you send a request to this squid, one of the backend servers will answer according to the redirect policy of the redirector program.

Basically, at this point in time we have a fully working content-based redirector. As already mentioned, we do not really need LVS to accomplish this. But this approach has downsides:

- we are comparatively slow: squid is not famous for speed.

- we do not scale well: if the bottleneck is at the squid, we want LVS to scale up.

31.5. Another squid

So the next step in our demo is to build another squid. This is trivial given that we already have one. We just copy the whole configuration and adjust a few parameters if there are any differences in the hardware.

31.6. Combining pieces with LVS

The rest of the story is to read the appropriate docs for LVS. I have used Horms's Ultra Monkey docs and there's nothing to be added for this kind of setup. Keep in mind that only the squids are to be known by the LVS box. They are the "realservers" in LVS terminology. The apache back end servers are only known to the squids' redirector program.

31.7. Problems

It has been said that LVS is fast and squid is slow, so people believe they must implement a level 7 switch in LVS to make it faster. This remains to be proven.

Squid is really slow compared to some of the HTTP servers that are tuned for speed. If you're serving static content with a kernel HTTP daemon, you definitely do not want to lose the speed by running it through a squid.

If you want persistent connections, you need to implement them in your redirector. If you want to take dead servers out of the pool, you must implement it in your redirector. If you have a complicated redirector, you need more of them and thus need more resources.
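
As an illustration of what such a redirector might do, here is a Python sketch that keeps a client on the same backend and skips servers marked as dead. It uses rendezvous (highest-random-weight) hashing, which is my choice and not something taken from the setup above; the function name is hypothetical, the set of dead servers is assumed to be maintained elsewhere, and the client address is assumed to come from the second field of the line squid passes to the redirector.

```python
import zlib

BACKENDS = ["10.0.0.6", "10.0.0.7", "10.0.0.8", "10.0.0.9"]

def pick_backend(client, dead=frozenset(), backends=BACKENDS):
    """Pick a live backend for a client; the same client keeps getting the same one."""
    alive = [b for b in backends if b not in dead]
    if not alive:
        raise RuntimeError("no backend available")
    # Rendezvous hashing: score every (client, backend) pair and take the highest.
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    return max(alive, key=lambda b: zlib.crc32(f"{client}|{b}".encode("ascii")))
```

When a backend is taken out, only the clients that were mapped to it move to another server; all other clients stay where they were. Because the choice depends only on the client address and the list of live servers, every process in the redirector pool reaches the same decision without sharing state.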

In the above setup, ldirectord monitors just the two squids. A failure of one of the apaches might go unnoticed, so you need to do something about this.
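
One simple way to "do something" is to probe the apaches yourself. The Python sketch below only checks that a TCP connection to port 81 can be opened; the function names and the two second timeout are assumptions, and a real check would probably also request a page and look at the answer.

```python
import socket

def is_alive(host, port=81, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def dead_backends(backends, port=81, timeout=2.0):
    """Return the set of backends that refuse the connection or time out."""
    return {b for b in backends if not is_alive(b, port, timeout)}
```

Run periodically on each squid box, for example from cron, the result could be written to a file that the redirector program reads to decide which servers to leave out.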

If you have little cacheable data, for example because of SSL, content that needs to expire immediately, or a high fraction of POST requests, the squid seems like a waste of resources. I'd say that in that case you just give it less disk space and memory.

Sites that prove unviewable through Squid are a real problem (Joe Cooper reports there is a stock ticker that doesn't work through squid). If you have contents that cannot be served through a squid, you're in big trouble--and as it seems, on your own.