45. LVS: Misc/FAQ/Wisdom from the mailing list

These topics were too short or not central enough to LVS operation to have their own section.

45.1. Having one director handling multiple LVS sites, Multiple VIPs

Multiple VIPs (and their associated services) can co-exist independently on an LVS. On the director, add the extra IPs to a device facing the internet. On the realservers, for LVS-DR or LVS-Tun, add the VIPs to a device and set up services listening on the ports. On the realservers, for LVS-NAT, add the extra services to the RIP.
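
As a minimal sketch (the addresses, port and device names here are invented; LVS-DR with the ARP problem already handled on the realservers is assumed), adding a second VIP might look like:

# on the director: add the extra VIP to the internet-facing device
ip addr add 192.168.1.111/32 dev eth0
ipvsadm -A -t 192.168.1.111:80 -s wrr
ipvsadm -a -t 192.168.1.111:80 -r 192.168.1.11 -g -w 1

# on each LVS-DR realserver: add the extra VIP to a non-arping device
ip addr add 192.168.1.111/32 dev dummy0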

Keith Rowland wrote:

Can I use Virtual Server to host multiple domains on the cluster? Can VS be set up to respond to 10-20 different IP addresses and use the cluster to respond to any one of them with the proper web directory?

James CE Johnson jjohnson (at) mobsec (dot) com

If I understand the question correctly, then the answer is yes :-) I have one system that has two IP addresses and responds to two names:

  foo.mydomain.com  A.B.C.foo  eth1
  bar.mydomain.com  A.B.C.bar  eth1:0

On that system (kernel 2.0.36 BTW) I have LVS setup as:

  ippfvsadm -A -t A.B.C.foo:80 -R 192.168.42.50:80
  ippfvsadm -A -t A.B.C.bar:80 -R 192.168.42.100:80

To make matters even more confusing, 192.168.42.(50|100) are actually one system where eth0 is 192.168.42.100 and eth0:0 is 192.168.42.50. We'll call that 'node'.

Apache on 'node' is setup to serve foo.mydomain.com on ...100 and bar.mydomain.com on ...50.

It took me a while to sort it out but it all works quite nicely. I can easily move bar.mydomain.com to another node within the cluster by simply changing the ippfvsadm setup on the externally addressable node.

Tao Zhao 6 Nov 2001

what if I need multiple VIPs on the realserver?

Julian Anastasov ja (at) ssi (dot) bg 06 Nov 2001

for i in 180 181 182
do
	ip addr add X.Y.Z.$i dev dummy0
done

There is also an example for setting up multiple VIPs on HA.

45.2. Setting up a fake service on the realserver with inetd

Ratz ratz (at) tac (dot) ch

We're going to set up an LVS cluster from scratch. You need:

  • 4 machines (2 realservers, 1 load balancer, 1 client) wired as described in various sketches throughout this HOWTO.
  • fun and some spare time (actually quite a lot if it doesn't work out the first time as described)

The goal is to set up a load-balanced tcp application. The application will consist of a home-written shell script invoked by inetd. As you might have guessed, security is a very low priority here; you should just get the idea behind this. Of course I should use xinetd, and of course I should use a tcpwrapper and maybe even SecurID authentication, but here the goal is to understand the fundamental design principles of an LVS cluster and its deployment. All instructions will be done as root.

Setting up the realserver

Edit /etc/inetd.conf and add the following line:
lvs-test        stream  tcp     nowait  root    /usr/bin/lvs-info       lvs-info

Edit /etc/services and add the following line:
lvs-test        31337/tcp               # supersecure lvs-test port

Now you need to get inetd running. This is different on every Unix, so please look it up yourself. Verify that it's running with 'ps ax|grep [i]netd'. To verify that it is really listening on this port, run 'netstat -an|grep LISTEN'; if there is a line:

tcp        0      0 0.0.0.0:31337           0.0.0.0:*               LISTEN

you're one step closer to the truth. Now we have to supply the script that will be called when you connect to port 31337 on the realserver. So simply do this on your command line (copy 'n' paste):

cat > /usr/bin/lvs-info << 'EOF' && chmod 755 /usr/bin/lvs-info
#!/bin/sh
# report this realserver's primary IP so the client can tell which machine answered
echo "This is a test of machine `ifconfig eth0 | grep 'inet addr' | awk '{print $2}' | cut -d: -f2`"
echo
EOF

Now you can test if it really works with telnet or phatcat:

telnet localhost 31337
phatcat localhost 31337

This should spill out something like:

hog:/ # phatcat localhost 31337
This is a test of machine 192.168.1.11

hog:/ #

If it worked, do the same procedure to set up the second realserver. Now we're ready to set up the load balancer. These are the required commands to set it up for our example:

director:/etc/lvs# ipvsadm -A -t 192.168.1.100:31337 -s wrr
director:/etc/lvs# ipvsadm -a -t 192.168.1.100:31337 -r 192.168.1.11 -g -w 1
director:/etc/lvs# ipvsadm -a -t 192.168.1.100:31337 -r 192.168.1.12 -g -w 1

Check it with ipvsadm -L -n:

hog:~ # ipvsadm -L -n
IP Virtual Server version 0.9.14 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
TCP  192.168.1.100:31337 wrr
  -> 192.168.1.12:31337          Route   1      0          0
  -> 192.168.1.11:31337          Route   1      0          0
hog:~ #

Now if you connect from outside with the client node to the VIP=192.168.1.100 you should get to one of the two realservers (presumably ~.12). Reconnect to the VIP again and you should get to the other realserver. If so, be happy; if not, go back and check netstat -an, ifconfig -a, ARP problems, routing tables and so on ...

45.3. How to bring down a realserver for maintenance (eg swap disks)

I want to use virtual server functionality to allow switching over from one pool of server processes to another without an interruption in service to clients.

Michael Sparks sparks (at) mcc (dot) ac (dot) uk

current realservers : A,B,C servers to swap into the system instead D,E,F

  • Add servers D,E,F into the system all with fairly high weights (perhaps ramping the weights up slowly so as not to hit them too hard:-)
  • Change the weights of servers A,B,C to 0.
  • All new traffic should now go to D,E,F
  • When the number of connections through A,B,C reaches 0, remove them from the service. This can take time I know but...

Joe

ipvsadm now lets you give a realserver a weight of 0 (this was once a planned feature). Such a realserver will not be sent any new connections and will continue serving its current connections till they close. You may have to wait a while if a user is downloading a 40M file from the realserver.
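
For example, to drain a realserver by hand (a sketch; $VIP, $PORT and $RIP stand in for your own values, LVS-DR forwarding assumed):

ipvsadm -e -t $VIP:$PORT -r $RIP -g -w 0    # quiesce: no new connections
# watch ActiveConn/InActConn with "ipvsadm -L -n"; when they reach 0
ipvsadm -d -t $VIP:$PORT -r $RIP            # remove the realserver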

"Duran, Richard" RDuran (at) dallasairmotive (dot) com 01 Oct 2004

Is it possible to take a realserver offline in such a way that existing connections are immediately redirected to another realserver? We've had the need to do this before and don't know what else to do beyond either (1) setting the weight to "0" and iterating through a process of disconnecting "inactive" users/sessions and hoping that they don't reconnect within the 5 minute persistence_timeout, or (2) removing the host-specific entry from keepalived.conf (brutally disconnecting everyone).

Joe

If you're talking about transferring an existing tcpip connection: no

Malcolm Turnbull

A brutal disconnect is the usual way to go. ldirectord handles it cleanly

45.4. keepalived: temporarily removing a realserver from view of keepalived; abnormal termination of keepalived

Jacob Smullyan 2006-02-13

It is a tribute to lvs that I've forgotten most of what I once knew about it, because I set up LVS-DR with keepalived about three years ago and it has run without a hiccup ever since. As a result, I'm rusty, so forgive me if this is a stupid or frequently asked question.

How should I go about taking a server temporarily out of rotation? Since I am using keepalived, I know I can simply turn off the service it depends on -- but in fact I want to replace that service with a new application on the same port (which will go live a few minutes/hours later). I am aware of some alternatives:

  • configure keepalived's health check to rely on some aspect of the old application, then swap the configuration when I want to go live with the new application.
  • temporarily run the new application on a different ip.
  • simply comment out that realserver in keepalived.conf temporarily, or add a healthcheck that will never be satisfied.
  • directly delete the ipvsadm config for that server (but what about the backup director, and how if at all will keepalived interfere with that?)
  • get a job more suitable for a dim-witted person like myself.

But all these are workarounds; what I really want is to tell the director (or keepalived), "retain all configuration, but temporarily drop this realserver until I remove the block". Is there a way to do that?

Graeme Fowler graeme (at) graemef (dot) net 13 Feb 2006

On the director(s), assuming you use eth0 as the interface forwarding the packets to the realservers...

iptables -I OUTPUT -o eth0 -s $VIP -d $RIP -j REJECT

That'll stop keepalived doing any healthchecks whatsoever on the realserver you need to work on. Simply replace -I with -D when you're done.

The same thing can be achieved by "null" routing the RIP on the director too, but I'll leave that as an exercise :)
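
A sketch of the null route variant (again with $RIP standing in for the realserver's address):

ip route add blackhole $RIP/32    # healthchecks to the RIP now fail
ip route del blackhole $RIP/32    # undo when maintenance is finished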

Mark msalists (at) gmx (dot) net 15 Feb 2006

I usually do one of the two:

You can use the command-line ipvsadm tool to get a list of all nodes and to manipulate (add/remove) nodes. Use it to remove and later re-add the node. This will not influence any other nodes in the configuration. If you get totally lost, just restart ldirectord and it will come back up with the regular configuration.

Or, as second option, use the http negotiation mechanism that uses a string comparison of a certain URL against an expected string pattern. Have it check against a dummy html page and put a flag in there that ldirectord checks against to determine if the host is supposed to be in the pool or not. Modify the flag manually to take the node out.
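
As a sketch of the second option, an ldirectord stanza along these lines checks a dummy page on each realserver (the file name and strings are invented); deleting or editing the flag page on a realserver takes it out of the pool without touching the director:

virtual=192.168.1.100:80
        real=192.168.1.11:80 gate
        real=192.168.1.12:80 gate
        service=http
        scheduler=wrr
        checktype=negotiate
        request="in_service.html"
        receive="I_AM_IN_SERVICE"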

gastruc (at) steek (dot) com 4 Dec 2008

After an error in the keepalived configuration on the backup server, it is left with all the VIP addresses.

Siim Poder windo (at) p6drad-teel (dot) net 05 Dec 2008

If - for example - you kill keepalived with -9, it will leave all addresses there. And if you restart keepalived later on, it will refuse to manage those addresses - probably for safety (not to touch what it hasn't created). IMO this causes more trouble than good - why would you be touching keepalived-managed addresses manually if you weren't asking for trouble? You can always change the configuration and force keepalived to make the changes you want.

There are/were some bugs with keepalived and reload, and it doesn't handle configuration errors well, so your problem certainly doesn't surprise me.

When this happens, I usually kill keepalived and remove all the (non-administrative) addresses manually (as you have a bunch, it may be wise to have a script lying around for that). Then restart keepalived to re-add the addresses (and claim ownership of them), or to stay in standby if it's the backup node.
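
A minimal sketch of such a cleanup script (it assumes the VIPs are /32 addresses on eth0; substitute your own list):

#!/bin/sh
# remove leftover keepalived-managed VIPs after an unclean exit
for vip in 192.168.1.100 192.168.1.101 192.168.1.102; do
        ip addr del $vip/32 dev eth0 2>/dev/null
done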

45.5. Howto turn your single node ftp/http server into an LVS without taking it off-line

e.g. if you want to test LVS on your BIG Sunserver and how to restore an LVS to a single node server again.

current ftp server:        standalone  A

planned LVS (using LVS-DR): realserver A
		           director    Z

Setup the LVS in the normal way with the director's VIP being a new IP for the network. The IP of the standalone server will now also be the IP for the realserver. You can access the realserver via the VIP while the outside users continue to connect to the original IP of A. When you are happy that the VIP gives the right service, change the DNS IP of your ftp site to the VIP. Over the next 24hrs as the new DNS information is propagated to the outside world, users will change over to the VIP to access the server.

To expand the number of servers (to A, B,...), add another server with duplicated files and add an extra entry into the director's tables with ipvsadm.
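
The extra entry is just another realserver line for the existing virtual service, e.g. (a sketch with $VIP/$RIP_B standing in for your addresses, LVS-DR assumed):

ipvsadm -a -t $VIP:80 -r $RIP_B -g -w 1
ipvsadm -a -t $VIP:21 -r $RIP_B -g -w 1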

To restore - in your DNS, change the IP for the service to the realserver IP. When no-one is accessing the VIP anymore, unplug the director.

45.6. shutdown of LVS

You can't shutdown an LVS. However you can stop it forwarding by clearing the ipvsadm table (ipvsadm -C), then allowing all connections to expire (check the active connections with ipvsadm) and then removing the ipvs modules (rmmod). Since the scheduler modules (ip_vs_rr.o etc.) use ip_vs.o, you'll have to remove ip_vs_rr.o first.
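
A sketch of the whole sequence (the scheduler module names depend on what you loaded; check with lsmod):

ipvsadm -C                    # clear the virtual server table
ipvsadm -L -n                 # wait until no connections remain
rmmod ip_vs_wrr ip_vs_rr      # remove the scheduler modules first
rmmod ip_vs                   # then the core module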

Do you know how to shutdown LVS? I tried rmmod but it keeps saying that the device is busy.

Kjetil Torgrim Homme kjetilho (at) linpro (dot) no 18 Aug 2001

Run ipvsadm -C. You also need to remove the module(s) for the balancing algorithm(s) before rmmod ip_vs. Run lsmod to see which modules these are.

Roy Walker Roy (dot) Walker (at) GEZWM (dot) com 18 Mar 2002 could not cleanly shutdown his director (LVS 1.0, 2.4.18) which hung at "Send TERM signal". The suggested cure was to bring down the LVS first (we haven't heard back whether it worked).

45.7. Other projects like LVS - Beowulf

The difference between a beowulf and an LVS:

The Beowulf project has to do with processor clustering over a network -- parallel computing... Basically putting 64 nodes up and running that all are a part of a collective of resources. Like SMP -- but between a whole bunch of machines with a fast ethernet as a backplane.

LVS, however, is about load-balancing on a network. Someone puts up a load balancer in front of a cluster of servers. Each one of those servers is independent and knows nothing about the rest of the servers in the farm. All requests for services go to the load balancer first. That load balancer then distributes requests to each server. Those servers respond as if the request came straight to them in the first place. So -- the more servers one adds, the less load goes to each server.

A person might go to a web site that is load balanced, and their requests would be balanced between four different machines. (Or perhaps all of their requests would go to one machine, and the next person's request would go to another machine)

However, a person who used a Beowulf system would actually be using one processing collaborative that was made up of multiple computers...

I know that's not the best explanation of each, and I apologize for that, but I hope it at least starts to make things a little clearer. Both projects could be expanded on to a great extent, but that might just confuse things further.

(Joe) -

both use several (or a lot of) nodes.

A beowulf is a collection of nodes working on a single computation. The computation is broken into small pieces and passed to a node, which replies with the result. Eventually the whole computation is done. The beowulf usually has a single user and the computations can run for weeks.

An LVS is a group of machines offering a service to a client. A dispatcher connects the client to a particular server for the request. When the request is completed, the dispatcher removes the connection between the client and server. The next request from the same client may go to a different server, but the client cannot tell which server it has connected to. The connection between client and server may only be seconds long.

from a posting to the beowulf mailing list by Alan Heirich -

Thomas Sterling and Donald Becker made "Beowulf" a registered service mark with specific requirements for use:

-- Beowulf is a cluster
-- the cluster runs Linux
-- the O/S and driver software are open source
-- the CPU is multiple sourced (currently, Intel and Alpha)

I assume they did this to prevent profit-hungry vendors from abusing this term; can't you just imagine Micro$oft pushing a "Beowulf" NT-cluster?

(Joe - I looked up the Registered Service Marks on the internet and Beowulf is not one of them.)

(Wensong) Beowulf is for parallel computing, Linux Virtual Server is for scalable network services.

They are quite different now. However, I think they may be unified under "single system image" some day. In the "single system image", every node sees a single system image (the same memory space, the same process space, the same external storage), and the processes/threads can be transparently migrated to other nodes in order to achieve load balance in the cluster. All the processes are checkpointed, so they can be restarted on the same node or on other nodes if they fail; full fault tolerance can be achieved here. It will be easy for programmers to code because of the single space; they don't need to statically partition jobs to different sites and let them communicate through PVM or MPI. They just need to identify the parallelism of their scientific application and fork processes or generate threads, because processes/threads will be automatically load balanced onto different nodes. For network services, the service daemons just need to fork processes or generate threads; it is quite simple. I think it needs lots of investigation into how to implement these mechanisms and keep the overhead as low as possible.

What Linux Virtual Server has done is very simple, Single IP Address, in which parallel services on different nodes appear as a virtual service on a single IP address. The different nodes have their own space; it is far from "single system image". It means that we have a long way to run. :)

45.8. Projects like LVS - Eddie

Eddie http://www.eddieware.org

(Jacek Kujawa blady (at) cnt (dot) pl) Eddie is load balancing software, using NAT (only NAT), for webservers, written in the Erlang language. Eddie includes an intelligent HTTP gateway and Enhanced DNS.

(Joe) Erlang is a language for writing distributed applications.

45.9. Recommendations for a redundant file system, RAID

Shain Miley 4 Jun 2001

any recommendations for Level 5 SCSI RAID?

Matthew S. Crocker matthew (at) crocker (dot) com 04 Jun 2001

I have had very good luck with Mylex. We use the DAC960 which is a bit old now, but if the newer stuff works as well as what I have, I would highly recommend it. You might also want to think about putting your data on a NAS and separating your CPUs from your hard drives.

Don Hinshaw dwh (at) openrecording (dot) com 04 Jun 2001

Mylex work well. I use ICP-Vortex (http://www.icp-vortex.com/index_e.html, link dead Jan 2003) which are supported by the Linux kernel. I've also had good luck with Adaptec 3200s and 3400si.

45.10. on the need for extended testing

(this must have been solved, no-one is complaining about memory leaks now :-)

Jerry Glomph Black black (at) real (dot) com

We have successfully used 2.0.36-vs (direct routing method), but it does fail at extremely high loads. Seems like a cumulative effect, after about a billion or so packets forwarded. Some kind of kernel memory leak, I'd guess.

45.11. Bringing down aliased devices (without bringing them all down)

Note

This is no longer a problem if you use the new Policy Routing.

Problem: if you down/delete an aliased device (e.g. eth0:1) you also bring down the other eth0 devices. This means that you can't bring down an alias remotely, as you lose your connection (eth0) to that machine. You then have to go to the console of the remote machine to fix it by rmmod'ing the device driver for the NIC and bringing the device up again.

The configure script handles this for you and will exit (with instructions on what to do next) if it finds that an aliased device needs to be removed by rmmod'ing the module for the NIC.

(I'm not sure that all of the following is accurate, please test yourself first).

(Stephen D. WIlliams sdw (at) lig (dot) net) whenever you want to down/delete an alias, first set its netmask to 255.255.255.255. This avoids also automatically downing aliases that are on the same netmask and are considered 'secondaries' by the kernel.

(Joe) To bring up an aliased device

$ifconfig eth0:1 192.168.1.10 netmask 255.255.255.0

to bring eth0:1 down without taking out eth0, you do it in 2 steps, first change the netmask

$ifconfig eth0:1 192.168.1.10 netmask 255.255.255.255

then down it

$ifconfig eth0:1 192.168.1.10 netmask 255.255.255.255 down

then eth0 device should be unaffected, but the eth0:1 device will be gone.

This works on one of my machines but not on another (both with 2.2.13 kernels). I will have to look into this. Here's the output from the machine for which this procedure doesn't work.

Examples: Starting setup. The realserver's regular IP/24 on eth0, the VIP/32 on eth0:1 and another IP/24 for illustration on eth0:2. Machine is SMP 2.2.13 net-tools 1.49

chuck:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:6071219 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6317319 errors:0 dropped:0 overruns:4 carrier:0
          collisions:757453 txqueuelen:100
          Interrupt:18 Base address:0x6000

eth0:1    Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.110  Bcast:192.168.1.110  Mask:255.255.255.255
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          Interrupt:18 Base address:0x6000

eth0:2    Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.240  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          Interrupt:18 Base address:0x6000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:299 errors:0 dropped:0 overruns:0 frame:0
          TX packets:299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

chuck:~# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.1.110   0.0.0.0         255.255.255.255 UH        0 0          0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 eth0

Deleting eth0:1 with netmask /32

chuck:~# ifconfig eth0:1 192.168.1.110 netmask 255.255.255.255 down
chuck:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:6071230 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6317335 errors:0 dropped:0 overruns:4 carrier:0
          collisions:757453 txqueuelen:100
          Interrupt:18 Base address:0x6000

eth0:2    Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.240  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          Interrupt:18 Base address:0x6000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:299 errors:0 dropped:0 overruns:0 frame:0
          TX packets:299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0


If you do the same thing with eth0:2 with the /24 netmask

chuck:~# ifconfig eth0:2 192.168.1.240 netmask 255.255.255.0 down
chuck:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:6071237 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6317343 errors:0 dropped:0 overruns:4 carrier:0
          collisions:757453 txqueuelen:100
          Interrupt:18 Base address:0x6000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:299 errors:0 dropped:0 overruns:0 frame:0
          TX packets:299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

tunl0     Link encap:IPIP Tunnel  HWaddr
          unspec addr:[NONE SET]  Mask:[NONE SET]
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
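
With the iproute2 tools (the "new Policy Routing" mentioned in the note above) the problem goes away, since each address can be added and deleted individually. A sketch, using the same VIP as in the example:

ip addr add 192.168.1.110/32 dev eth0 label eth0:1    # add the VIP
ip addr del 192.168.1.110/32 dev eth0                 # remove it; eth0 and eth0:2 are untouched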

45.12. Multiple IPs on the Director

Michael Sparks

It's useful for the director to have 3 IP addresses. One is the real machine's base IP address, one is the virtual service IP address, and another virtual IP address is for servicing the director. The reason for this is associated with director failover.

Suppose:

  • X realservers pinging director on real IP A (assume a heartbeat style monitor) serving pages off virtual IP V. (IP A would be in place of hostip above)

  • Director on IP A fails, backup director (*) on IP B comes online taking over the virtual IP V. By not taking over IP A, IP B can watch for IP A to come back online via the network, rather than via a serial link (etc).

  • Problem is the realservers are still sending to IP A; for the heartbeat code to be valid on IP B, the realservers need to send their pings to IP B instead. IMO the easiest solution is to allocate a "heartbeat"/monitor virtual IP. (this is the vhostip)

45.13. Testimonials

This isn't particularly inclusive. We don't pester people for testimonials as we don't want to scare people from posting to the mailing list and we don't want inflated praise. People seem to understand this and don't pester us with their performance data either. The quotes below aren't scientific data, but it is nice to hear. The people who don't like LVS presumably go somewhere else, and we don't hear any complaints from them.

"Daniel Erdös" 2 Feb 2000

How many connections did you really handle? What are your impressions and experiences in "real life"? What are the problems?

Michael Sparks zathras (at) epsilon3 (dot) mcc (dot) ac (dot) uk

Problems - LVS provides a load balancing mechanism, nothing more, nothing less, and does it *extremely* well. If your back end realservers are flaky in any way, then unless you have monitoring systems in place to take those machines out of service as soon as there are problems with them, users will experience glitches in service.

NB, this is essentially a realserver stability issue, not an LVS issue - you'd need good monitoring in place anyway if you weren't using LVS!

Another plus in LVS's favour in something like this over the commercial boxes is the fact that the load balancer is a Unix type box - meaning your monitoring can be as complex or simple as you like. For example, load balancing based on wlc could be supplemented by server info sent to the director.

Drew Streib ds (at) varesearch (dot) com 23 Mar 2000

I can vouch for all sorts of good performance from lvs. I've had single processor boxes handle thousands of simultaneous connections without problems, and yes, the 50,000 connections per second number from the VA cluster is true.

lvs powers SourceForge.net, Linux.com, Themes.org, and VALinux.com. SourceForge uses a single lvs server to support 22 machines, multiple types of load balancing, and an average 25Mbit/sec traffic. With 60Mbit/sec of traffic flowing through the director (and more than 1000 concurrent connections), the box was having no problems whatsoever, and in fact was using very little cpu.

Using DR mode, I've sent request traffic to a director box resulting in near gigabit traffic from the realservers. (Request traffic was on the order of 40Mbit.)

I can say without a doubt that lvs toasts F5/BigIP solutions, at least in our real world implementations. I wouldn't trade a good lvs box for a Cisco Local Director either.

The 50,000 figure is unsubstantiated and was _not_ claimed by anyone at VA Linux Systems. A cluster with 16 apache servers and 2 LVS servers was configured for Linux World New York, but due to interconnect problems the performance was never measured - we weren't happy with the throughput of the NICs so there didn't seem to be a lot of point. This problem has been resolved and there should be an opportunity to test this again soon.

In recent tests, I've taken multinode clusters to tens of thousands of connections per second. Sorry for any confusion here. The exact 50,000 number from LWCE NY is unsubstantiated.

Jerry Glomph Black black (at) real (dot) com 23 Mar 2000

We ran a very simple LVS-DR arrangement with one PII-400 (2.2.14 kernel) directing about 20,000 HTTP requests/second to a bank of about 20 Web servers answering with tiny identical dummy responses for a few minutes. Worked just fine.

Now, at more terrestrial, but quite high real-world loads, the systems run just fine, for months on end. (using the weighted-least-connection algorithm, usually).

We tried virtually all of the commercial load balancers, LVS beats them all for reliability, cost, manageability, you-name-it.

45.14. Transport Layer Security(TLS)

Noma wrote Nov 2000

Are you going to implement TLS(Transport Layer Security) Ver1.0 on LVS?

Wensong

I haven't read the TLS protocol, so I don't know if TLS transmits the IP address and/or port number in the payload. In most cases, it should not, because SSL doesn't.

If it doesn't, you can use any of the three LVS-NAT, LVS-Tun and LVS-DR methods. If it does, LVS-Tun and LVS-DR can still work.

Ted Pavlic tpavlic (at) netwalk (dot) com, Nov 2000

I don't see any reason why LVS would have any bearing on TLS. As far as LVS was concerned, TLS connections would just be like any other connections.

Perhaps you are referring to HTTPS over TLS? Such a protocol has not been completed yet in general, and when it does it still will not need any extra work to be done in the LVS code.

The whole point of TLS is that one connects to the same port as usual and then "upgrades" to a higher level of security on that port. All the secure logic happens at a level so high that LVS wouldn't even notice a change. Things would still work as usual.

Julian Anastasov ja (at) ssi (dot) bg

This is an end-to-end protocol layered on another transport protocol. I'm not a TLS expert but as I understand TLS 1.0 is handled just like the SSL 3.0 and 2.0 are handled, i.e. they require only a support for persistent connections.

45.15. Setting up a hot spare server

Mark Miller markm (at) cravetechnology (dot) com 09 May 2001

We want a configuration where two Solaris based web servers will be setup in a primary and secondary configuration. Rather than load balancing between the two we really want the secondary to act as a hot spare for the primary.

Here is a quick diagram to help illustrate this question:

                  Internet		LD1,LD2 - Linux 2.4 kernel
                      |			RS1,RS2 - Solaris
                   Router
                      |
               -------+-------
               |             |
             -----         -----
             |LD1|         |LD2|
             -----         -----
               |             |
               -------+-------
                      |
                    Switch
                      |
               ---------------
               |             |
             -----         -----
             |RS1|         |RS1|
             -----         -----

Paul Baker pbaker (at) where2getit (dot) com 09 May 2001

Just use heartbeat on the two firewall machines and heartbeat on the two Solaris machines.

Horms horms (at) vergenet (dot) net 09 May 2001

You can either add and remove servers from the virtual service (using ipvsadm) or toggle the weights of the servers from zero to non-zero values.

Alexandre Cassen alexandre (dot) cassen (at) wanadoo (dot) fr 10 May 2001

For your 2 LDs you need to run a hot standby protocol. Heartbeat can be used; you can also use vrrp or hsrp. I am actually working on the IPSEC AH implementation for vrrp. That kind of protocol can be useful because your LD backup server can be used even while it is in backup state (you simply create 2 LD VIPs and set the default gateway of half your server pool to LD1 and half to LD2).

For your webserver hot-spare needs, you can use the next keepalived, in which there will be a "sorry server" facility. This means exactly what you need => You have an RS server pool; if all the servers of this RS server pool are down, then the sorry server is placed into the ipvsadm table automatically. If you use keepalived, keep in mind that you will use the NAT topology.
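
A fragment of what the sorry server setup might look like in keepalived.conf (addresses and timeouts are invented; LVS-NAT as Alexandre notes):

virtual_server 192.168.1.100 80 {
    delay_loop 10
    lb_algo wrr
    lb_kind NAT
    # used only while every real_server below is down
    sorry_server 192.168.1.20 80
    real_server 192.168.1.11 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
}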

Joe 11 May 2001

Unless there's something else going on that I don't know about, I expect this isn't a great idea. The hot spare is going to degrade (depreciate, disks wear out - although not quite as fast, software needs upgrading) just as fast sitting idle as doing work.

You may as well have both working all the time and for the few hours of down time a year that you'll need for planned maintenance, you can make do with one machine. If you only need the capacity of 1 machine, then you can use two smaller machines instead.

45.16. An LVS of LVSs

Since an LVS obeys unix client/server semantics, an LVS can replace a realserver (at least in principle, no-one has done this yet). Each LVS layer could have its own forwarding method, independently of the other LVSs. The LVS of LVSs would look like this, with realserver_3 being in fact the director of another LVS and having no services running on it.

                        ________
                       |        |
                       | client |
                       |________|
			   |
                           |
                        (router)
                           |
			   |
                           |       ____________
                           |  DIP |            |
                           |------| director_1 |
                           |  VIP |____________|
                           |
                           |
                           |
         ------------------------------------
         |                 |                |
         |                 |                |
     RIP1, VIP         RIP2, VIP        RIP3, VIP
   ______________    ______________    _____________
  |              |  |              |  |             |
  | realserver1  |  | realserver2  |  | realserver3 |
  |              |  |              |  | =director_2 |
  |______________|  |______________|  |_____________|
                                            |
                                            |
         ------------------------------------
         |                 |                |
         |                 |                |
     RIP4, VIP         RIP5, VIP        RIP6, VIP
   ______________    ______________    ______________
  |              |  |              |  |              |
  | realserver4  |  | realserver5  |  | realserver6  |
  |              |  |              |  |              |
  |______________|  |______________|  |______________|

If all realservers were offering http and only realservers1..4 were offering ftp, then you would (presumably) set up the directors with the following weights for each service:

  • director_1: realserver1 http,ftp=1; realserver2 http,ftp=1; realserver3 http=3,ftp=1

  • director_2: realserver4 http,ftp=1; realserver5 http=1 (no ftp); realserver6 http=1 (no ftp)

You might want to do this if realservers4..6 were on a different network (i.e. geographically remote). In this case director_1 would be forwarding by LVS-Tun, while director_2 could use any forwarding method.
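
In ipvsadm terms the two directors might then look roughly like this (a sketch with invented variables; director_1 tunnels to everything, director_2 uses LVS-DR locally):

# director_1
ipvsadm -A -t $VIP:80 -s wrr
ipvsadm -a -t $VIP:80 -r $RIP1 -i -w 1
ipvsadm -a -t $VIP:80 -r $RIP2 -i -w 1
ipvsadm -a -t $VIP:80 -r $RIP3 -i -w 3    # realserver3 = director_2

# director_2 (on realserver3)
ipvsadm -A -t $VIP:80 -s wrr
ipvsadm -a -t $VIP:80 -r $RIP4 -g -w 1
ipvsadm -a -t $VIP:80 -r $RIP5 -g -w 1
ipvsadm -a -t $VIP:80 -r $RIP6 -g -w 1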

45.16.1. An LVS of LVSs: using Windows/Solaris machines with LVS-Tun

This is the sort of ideas we were having in the early days. It turns out that not many people are using LVS-Tun, most people are using Linux realservers, and not many people are using geographically distributed LVSs.

Joe, Jun 99

For the foreseeable future many of the servers which could benefit from LVS will be Microsoft or Solaris. The problem is that they don't have tunneling. A solution would be to have a linux box in front of each realserver on the link from the director to the realserver. The linux box appears to be the server to the director (it has the real IP eg 192.168.1.2) but does not have the VIP (eg 192.168.1.110). The linux box decapsulates the packet from the director and now has a packet from the client to the VIP. Can the linux box route this packet to the realserver (presumably to an lo device on the realserver)?

The linux box could be a diskless 486 machine booting off a floppy with a patched kernel, like the machines in the Linux router project.

Wensong 29 Jun 1999

We can use a nested (hybrid) LinuxDirector approach. For example,

    LVS-Tun   ---->   LVS-NAT ---->  RealServer1
         |                 |       ...
         |                 ----->  RealServer2
         |
         |           ....
         |
         |
         -------->   LVS-NAT  ....

Realservers can run any OS. An LVS-NAT load balancer can usually schedule over 10 general servers. And these LVS-NATs can be geographically distributed.

By the way, LinuxDirector in kernel 2.2 can use LVS-NAT, LVS-Tun and LVS-DR together for servers in a single configuration.

45.17. LVS on a Linux/IBM mainframe

Kyle Sparger ksparger (at) dialtoneinternet (dot) net 18 Sep 2001

I'm familiar with the s/390; the zSeries 900 will be similar, but on a 'next-gen' scale -- It's 64-bit and I expect 2-3 times the maximum capacity.

  • The s/390 is ONLY, at most, a 12-way machine in a single frame, 24-way in a two-frame configuration. The CPU's are not super-powered; they're normal CPU's, so imagine a normal 12-24 way, and you have a good idea. It does have special crypto-processors built in, if you can find a way to use them.

  • The s/390, however, has an obnoxiously fast bus -- 24GByte/s. Yes, I did mean gigabytes. Also, I/O takes up almost no CPU time, as the machines have sub-processors to take care of it.

  • The s/390 is a 31bit machine -- yes, 31. One bit defines whether the code is 16 or 31 bit code. The z/900 is a 64-bit machine. Note that the s/390, afaik, suffers when attempting to access memory over a certain amount, like any 31/32 bit machine would -- 2 gigs can be addressed in a single clock cycle; greater than that takes longer to process, since it requires more than 32 bits to address.

  • From top to bottom, the entire machine is redundant. There is no single point of failure anywhere in the machine. According to IBM's docs, the MTBF is 30 years. It calls IBM when it's broken, and they come out and fix it. The refrigerator ad was no joke ;) Of course, this doesn't protect you from power outages, but interestingly enough, if I recall correctly, all RAM is either SRAM, or battery backed -- the machine will come back up and continue right where it left off when it lost power. No restarting instances or apps required. No data lost.

There are five premises for the cost-savings:

  • You don't have to design a redundant system -- it's already built in.

  • One machine is easier to manage than n number servers.

  • One machine uses less facilities than n number servers.

  • A single machine, split many ways, can result in higher utilization.

  • Linux, Linux, Linux. All the free software you can shake a stick at.

On the flip-side, there are some constraints:

  • If you have 500 servers, all at 80% CPU usage, there's no way you're going to cram them all onto the mainframe. Part of the premise is that most servers sit at only a fraction of their maximum capacity.

  • The software must be architecture compatible.

  • Mainframe administrators and programmers are rare and expensive.

The ideal situation for an s/390 or z/Series is an application which is not very CPU intensive, but is highly I/O intensive, that must _NEVER_ go down. Could that be why many companies do databases on them? Think airline ticketing systems, financial systems, inventory, etc :) Realize, however, that your cost of entry is probably going to be well over a million dollars, unless you want a crippled entry-level box. You probably don't want to buy this server to run your web site. You probably want to buy it to run your database. That being said, if you happen to order more than you really need -- a reasonably common phenomenon in IT shops -- you can now run Linux instances with that extra capacity. :)

45.18. mqseries

The LVS worked for a client connected directly to the director, but not from a client on the internet.

Carlos J. Ramos cjramos (at) genasys (dot) es 12 Mar 2002

Now, it seems to be solved by using static routes to hosts instead of using static routes to networks.

There is also another important note. The directors use MQSeries from IBM. The starting sequence in haresources was mqseries masq.lvs (the script for NAT); it looks like the 1 minute needed by mqseries to come up was confusing(!?) masq.lvs or ldirectord. We have just changed the startup order of mqseries and masq.lvs, bringing up masq.lvs first and finally mqseries.

With these two changes it works perfectly.

45.19. LVS log files

Chris Ruegger

Does LVS maintain a log file or can I configure it to use one so I can see a history of the requests that came in and how it forwarded them?

Joe 1 Apr 2002

It doesn't but it could. LVS does make statistics available.

Another question is whether logging is a good idea. The director is a router with slightly different rules than a regular router. It is designed to handle 1000s of requests/sec and operate with no spinning media (eg on a flash card). There's no way you can log all connections to a disk and maintain throughput. You couldn't even review the contents of the logs. People do write filter rules, looking for likely problems and logging suspicious packets. Even reviewing those files overwhelms most people.

Ratz 2 Apr 2002

LVS works on L4. Maybe the following command will make you happy:

echo 666 > /proc/sys/net/ipv4/vs/debug_level

Joe - is 666 the logging level of the beast?

Horms 06/20/2005

LVS is part of the kernel. And as such any logging is done through the kernel. If LVS was compiled with support for debugging information, then /proc/sys/net/ipv4/vs/debug_level will exist. If you run

echo 0 > /proc/sys/net/ipv4/vs/debug_level

then it will turn logging off. If you echo any value greater than 0, it will increase the verbosity of logging. I believe the useful range of values is from 1 - 12. That is, once you get to 12, you have as much debugging information as you will get, and increasing the value won't give you any more. For a running server, I'd suggest a value of 3 or less.

Graeme

The debug logs above are (at the higher levels) hugely detailed, far more so than most people would require, and (oddly enough) are best suited for debugging problems with the LVS module code itself than anything else.

If what you want is heartbeat or healthcheck monitoring, there are a number of applications which do this; the most common approaches are (in no particular order):

  • heartbeat + mon
  • ldirectord
  • keepalived

45.20. LVS and linux vlan

Matt Stockdale

Does the current LVS code work in conjunction with the linux vlan code? We'd like to have a central load balancing device, that connects into our core switch w/ a dot1q trunk, and can have virtual interfaces on any of our many netblocks/vlans.

Benoit Gaussen bgaussen (at) fr (dot) colt (dot) net 20 Mar 2002

I tested it and it works. The only problem I encountered was an MTU problem with the eepro100 driver and the 8021q code. However there is a small patch on the 8021q website. My config was linux 2.4.18/lvs 1.0.0 configured with LVS-NAT.
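
A rough sketch of such a setup with the vlan tools of that era (the VLAN id, netblocks and addresses are invented):

vconfig set_name_type DEV_PLUS_VID_NO_PAD    # name vlan devices eth0.<vid>
vconfig add eth0 100                         # vlan 100 on the dot1q trunk
ip addr add 10.0.100.1/24 dev eth0.100       # director's address in that netblock
ip addr add 10.0.100.100/32 dev eth0.100     # a VIP on that vlan
ip link set eth0.100 up
ipvsadm -A -t 10.0.100.100:80 -s rr
ipvsadm -a -t 10.0.100.100:80 -r 192.168.2.11:80 -m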

45.21. multi-home, multi-router LVS

Matthew S. Crocker matthew (at) crocker (dot) com 29 Oct 2002

I use LVS in a multi-homed, multi-router HSRP setup.

Each LVS is connected to a separate switch. Each router is connected to each switch and to my upstream providers. We use BGP4 to talk with our upstream providers. The routers use HSRP failover for an IP address that the LVS boxes use as a gateway address.

The LVS setup is pretty much a standard LVS-NAT install using keepalived. Each LVS has a default route pointing to an IP address which is a virtual IP and part of the HSRP router failover system.

The Routers are standard cisco 7500 series running BGP4 between themselves and my providers. They also run HSRP (Hot Standby Router Protocol) between their ethernet interfaces.

With my setup I can lose a link, a router, a switch or an LVS box and not go down.

45.22. Horror story, mostly from slow file system with disk intensive application

This was a long thread. The poster's application worked fine on a single server, but suffered intermittent freezes when moved to an LVS. Although many suggestions were offered, none helped and the poster had to figure it out by himself. In the meantime the poster changed from LVS-NAT to LVS-DR and rearranged his setup several times over a period of about 3 weeks.

Jan Abraham jan_abraham (at) gmx (dot) net 11 Nov 2003

I've solved the issue this morning. Combination of two independent problems:

  • Problem A:

    • Poorly written PHP application (lucky me, not my fault...) -> tons of PHP includes on every request (i.e. lots of disk accesses).
    • lack of noatime,nodiratime in /etc/fstab, more disk accesses.
    • Use of ext3 with the default data mode (ordered). I should be beaten for this. Still, it's an unknown issue why it worked well when running on a single server (without LVS).

      I'm not an expert in filesystems, but I can imagine that the ext3 journal ran out of space and holds the system until all entries were written on their respective places. Just an idea. "man mount" suggests to use "writeback" as data mode to improve performance, with the risk of having some files containing old data fragments after a crash.

      The reason I chose a journaling file system was to minimize down time after a crash. For now, I've switched back to ext2, but I'll do some new attempts with the suggested writeback mode on ext3.

      I think it should be written in bright red letters: "do not use a journaling filesystem without noatime,nodiratime on a high traffic website" (see the example fstab line after this list).

  • Problem B:

    A switch that mysteriously sends packets to the wrong servers. We've replaced the switch two times, now all packets arrive where they should. I'll try to get back to LVS-DR tomorrow.
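
For reference, a sketch of the /etc/fstab line discussed under Problem A (the device and mount point are invented):

# avoid an atime write for every PHP include that gets read
/dev/sda3  /var/www  ext3  defaults,noatime,nodiratime,data=writeback  0 2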

Jacob Coby jcoby (at) listingbook (dot) com 12 Nov 2003

If you aren't using it already, take a look at the PHP Accelerator (http://www.php-accelerator.co.uk/). It made a HUGE difference in our ability to serve dynamic content quickly. Our site is made up of about 75k loc of PHP (plus an additional 20kloc of support code in php), of which about ~35k is used per page, including at least 8 includes per page. We serve up some 7 million pages/month (~110gb). We aren't a huge site, but we're able to support this with a dual PIII 733 running at most at 60%.

Using PHPA reduced the server load by about 50%, improved latency and page rendering times by anywhere from 50 - 300%, and allowed us to continue using our single web server for at least another 2 years without moving to multiple, load balanced servers. Load balancing is still in the future, but more or less for redundancy than anything else.

45.23. RTNETLINK answers:

Son Nguyen, 8 Jul 2005

root@realserver [~]# ip route get from CIP to VIP iif tunl0
RTNETLINK answers: Invalid argument

Horms

I suspect that the route in question is unknown to the kernel. e.g. my box is 172.16.4.222 and the gateway is 172.16.0.1.

# ip route get from 172.16.4.222 to 172.16.0.1
172.16.0.1 from 172.16.4.222 dev eth0 
    cache  mtu 1500 rtt 3ms rttvar 5ms cwnd 2 advmss 1460 hoplimit 64
# ip route get from 172.16.4.223 to 172.16.0.1
RTNETLINK answers: Invalid argument

Reid Sutherland mofino (at) gmail (dot) com 4 Aug 2005

Enable IP Advanced Routing and then whatever else you need under that.

45.24. LVS chokes on 600+ connections

Andrei Taranchenko andrei (at) towerdata (dot) com 25 Aug 2005

We have an active director with 256 MB RAM, and two nodes. When I do a stress test, the client starts getting timeouts when the number of *inactive* connections hits 600 or so. If I take out a node, the same number of inactive connections is easily handled by a node, but it is still a problem for the director. The nodes and the director are connected on the hub, and the director is the default gateway. The nodes are also connected to the rest of the network on the other interface (they need to see the database, etc).

Horms

Perhaps there is some issue with the kernel's network stack, and it could be resolved by trying a different version. 256Mb of ram should be able to handle a lot more than 600 connections (they consume 100 or so bytes each). That is, unless your box is very low on memory for other reasons. I've done tests with LVS going up to 3,000,000 connections, on boxes with around 512Mb of ram (I don't remember exactly), so it shouldn't be an LVS problem.

45.25. Anti load balancing: all traffic required to go to one realserver

This is the opposite of what LVS is designed to do. Only a manager would ask for this.

Cristi

I have an LVS-NAT setup that has been running for some time now. I want, for management reasons, connections to the VIP from a certain host (I don't even need granularity) to always be redirected to RS01, for example. If this cannot be done via ipvs, could you please suggest another course of action?

Graeme 10 Nov 2008

Combine netfilter marks (fwmarks) and a virtual service based on mark values instead of VIP. Catch packets from 1.2.3.4 destined for the VIP service port and set a mark:

iptables -t mangle -I INPUT -s 1.2.3.4/32 -d $VIP -p tcp --dport $VIP_PORT -j MARK --set-mark 0x1234
ipvsadm -A -f 0x1234
ipvsadm -a -f 0x1234 -r 192.168.10.1:0 -m

This way, hopefully, all packets from 1.2.3.4 will end up being handled by 192.168.10.1 only. Give it a try.