This LVS-HOWTO is posted to the LVS-HOWTO homepage, http://www.austintek.com/LVS/LVS-HOWTO/ about once a month (although I do miss occasional months).
Some of the material is from my own testing and I've tried to make it into a coherent story. Much of the material is from the lvs-users mailing list and is listed chronologically (sometimes forward and sometimes backwards in time) and will thus look like a blog.
Contributions to this HOWTO came from the mailing list and are attributed to the poster (with e-mail address). Postings may have been edited to fit into the flow of the HOWTO.
The LVS logo (Tux with 3 lighter shaded penguins behind him representing a director and 3 realservers) is by Mike Douglas spike (at) bayside (dot) net
LVS homepage is running on a machine donated by Horms. (Until Jul 2002, we used a machine donated by Voxel).
LVS mailing list is hosted by Lars in Germany lmb (at) suse (dot) de
To enable you to understand how a Linux Virtual Server (LVS) works.
The LVS-mini-HOWTO (http://www.austintek.com/LVS/LVS-HOWTO/mini-HOWTO/LVS-mini-HOWTO.html) tells you how to setup and install an LVS without understanding how the LVS works.
The material here covers directors and realservers with 2.2, 2.4 and 2.6 kernels.
The original material was written for 2.2.x kernels and ipchains. Not all material has been updated for 2.4.x kernels and iptables.
The layout of this HOWTO is almost flat - you go to the section you want information on. You aren't supposed to read it from start to finish. Within any section, newer information may be combined with older information that says different things. I just don't have time to edit everything - I'll be glad if you straighten me out. The only information one level up is
The code for 2.0.x kernels still works fine and was used on production systems when 2.0.x kernels were current, but is not being developed further. For 2.2 kernels, the Linux kernel networking code was rewritten, producing for us The Arp Problem. This changed the installation of LVS from a simple process that can be done by almost anyone, to a thought provoking, head scratching exercise, which requires detailed understanding of the workings of LVS. For 2.0 and 2.2, LVS is stand-alone code, based on ip_masquerading and doesn't integrate well with other uses of ip_masquerading. For 2.4 kernels, LVS was rewritten as much as possible to be a netfilter module, to allow it to fit into and be visible to other netfilter modules. Unfortunately the fit isn't perfect, but cooperation with netfilter does work in most cases. If ip_vs() was a real netfilter module, it would be really slow. (The original LVS-NAT code had problems when using your director as a firewall; see Running a firewall on the director, but much of this has been fixed - Feb 2006.) Being a netfilter module, the latency and throughput are slightly worse for 2.4 LVS than for the 2.2 code. However with modern CPUs running at 800MHz, the bottleneck now is network throughput rather than LVS throughput (you only need a small number of realservers to saturate 100Mbps ethernet).
In general ipvsadm commands and services have not changed between kernels.
The HOWTO was originally written in sgml. It is now xml. The char '&' found in C source code has to be written as &amp; in sgml. If you swiped patches from the sgml rather than the html rendering, you would get code that needed to be edited to fix the &amp;. Now that the HOWTO is in xml, this munging is not needed. Although I've tried to remove all munged ampersands, I expect some will persist for a while. Ampersands in URLs still have to be munged.
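If you do end up with a patch copied from the old sgml source, the munged ampersands can be restored mechanically. The following is a minimal sketch, assuming the only munged entity is &amp; (the function name unmunge_amp is made up for this example and is not part of any LVS tool).

```shell
#!/bin/sh
# unmunge_amp: read text on stdin and turn every "&amp;" back into "&".
# Hypothetical helper name; only &amp; is handled, other entities are untouched.
unmunge_amp() {
    # In the replacement, "\&" is a literal ampersand (a bare & would mean
    # "the matched text").
    sed 's/&amp;/\&/g'
}

# Example: a line of C as it might appear in the sgml source.
printf '%s\n' 'if (a &amp;&amp; b) x = y &amp; z;' | unmunge_amp
```

To clean a whole patch, pipe the file through the function and review the result before applying it.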
Well we hope so anyhow.
An article on spambots describes robots which ignore the robots.txt file and scan for e-mail addresses in readable files on websites. The author suggests removing any 'mailto:' strings and spam protecting e-mail addresses, by changing them from machine-readable to human-readable format. If you have a better scheme than implemented here, (and I can do it with vi) let me know.
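The human-readable format used for addresses in this HOWTO, name (at) host (dot) tld, can be produced with three substitutions. This is only a sketch of one way to do it, not necessarily how the HOWTO itself is processed; the function name spam_protect is invented for the example, and it expects a single address as its argument rather than free text.

```shell
#!/bin/sh
# spam_protect: print one e-mail address in "name (at) host (dot) tld" form.
# Hypothetical helper; takes a single address (optionally with a mailto:
# prefix) as its first argument.
spam_protect() {
    printf '%s\n' "$1" | sed \
        -e 's/^mailto://' \
        -e 's/\./ (dot) /g' \
        -e 's/@/ (at) /'
}

# Example using an address format like those in this HOWTO.
spam_protect 'mailto:user@example.com'
```

The same three substitutions can be typed as :s commands in vi on the line holding the address.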
(May 2002): BTW, 160 people have contributed to the HOWTO (as judged by unique e-mail addresses).
There are links to 180 urls in this HOWTO (May 2002), which came from postings to the LVS mailing list. If people move/rename/delete/change their webpages/links once a year, then I'm going to have to track down 15 websites each month. If a site is gone and it isn't in google, I'm not going to be able to find it.
If you use these terms when you mail us, we'll know what you're talking about.
Please use the first term in these lines. The other words are valid but less precise (or are redundant).
Here's the ipvsadm output of an LVS serving telnet and squid.
director:/etc/rc.d# ipvsadm
IP Virtual Server version 0.9.4 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
TCP  lvs.mack.net:squid rr
  -> rs1.mack.net:squid          Route   1      0          0
  -> rs2.mack.net:squid          Route   1      0          0
  -> rs3.mack.net:squid          Route   1      0          0
TCP  lvs.mack.net:telnet rr
  -> rs1.mack.net:telnet         Route   1      0          0
  -> rs2.mack.net:telnet         Route   1      0          0
In the above LVS, there are two virtual services, telnet and squid. There are also two virtual servers; a virtual server for telnet (which has 2 realservers) and a virtual server for squid (which has 3 realservers). This is what the client sees; two services (and two servers).
Connections to each virtual server are scheduled (here by "rr", round robin), to the realservers which belong to the scheduling group. Here the scheduling group for telnet is rs1,rs2. The scheduling group for squid is rs1,rs2,rs3. Connections to the telnet virtual server are scheduled independently of connections to the squid virtual server.
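For reference, a table like the one above could be built with ipvsadm commands along the following lines. This is a minimal sketch, not the author's setup script. It assumes the squid and telnet port names resolve through /etc/services on the director, and by default it only prints the commands instead of running them (the IPVSADM variable and the lvs_setup function name are made up for this example).

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# To apply them for real, run as root on the director with IPVSADM=ipvsadm.
IPVSADM=${IPVSADM:-echo ipvsadm}

lvs_setup() {
    # Virtual service for squid: round-robin scheduler (-s rr).
    $IPVSADM -A -t lvs.mack.net:squid -s rr
    # Scheduling group rs1,rs2,rs3: direct routing (-g, shown as "Route"), weight 1.
    for rs in rs1 rs2 rs3; do
        $IPVSADM -a -t lvs.mack.net:squid -r "$rs.mack.net:squid" -g -w 1
    done

    # Virtual service for telnet, scheduled independently of squid.
    $IPVSADM -A -t lvs.mack.net:telnet -s rr
    # Scheduling group rs1,rs2.
    for rs in rs1 rs2; do
        $IPVSADM -a -t lvs.mack.net:telnet -r "$rs.mack.net:telnet" -g -w 1
    done
}

lvs_setup
```

Each -A line creates one virtual service and each -a line adds one realserver to that service's scheduling group, which is why the two groups can differ.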
The above nomenclature can be extended for firewall mark (fwmark).
We don't have a good name for this. Suggestions welcome. (We also don't talk much about this concept on the mailing list, so we've done without a name).
The director needs to know how to schedule packets from the client to the realservers. The smallest unit for LVS is a tcpip connection, i.e. all packets that are part of a single tcpip session from a client will be sent to the same realserver. For a tcp virtual service, each tcp connection is scheduled separately, with the first tcp connection going to one realserver, and the next tcp connection going to the next realserver assigned a connection from the scheduler. The virtual connection is the same as the tcp connection.
For a persistent connection all tcp connections that are separated by less than the timeout period are regarded as belonging to the same virtual connection and are scheduled to the same realserver.
For udp, there is no such thing as a connection or session and all packets from the client within a timeout period are scheduled to the same realserver. (People aren't using LVS for udp services a whole lot). The virtual connection then is all udp packets from a client within a certain time period.
The realservers sometimes are frontends to other backend servers. The client does not connect to these backend servers and they are not in the ipvsadm table.
These backend servers are setup separately from the LVS.
People sometimes call the director or the realservers, "the server". Since the whole LVS appears as a server to the client and since the realservers are also serving services, the term "server" is ambiguous. Do not use the term "the server" or "the lvs server" when talking about LVS. Most often you are referring to the "director" or the "realservers". Sometimes (e.g. when talking about throughput) you are talking about the (whole) virtual server.
I use "realserver" as I despair of finding a reference to a "realserver" in a webpage using the search keys "real" and "server". Horms and I (for reasons that neither of us can remember) have been pushing the term "real-server" for about a year, on the mailing list, and no-one has adopted it. We're going back to "realserver".
                        ________
                       |        |
                       | client | (local or on internet)
                       |________|
                          CIP
                           |
--                      (router)
                          DGW
                           |          outside network
                           |
L                         VIP
i                      ____|_____
n                     |          | (director can have 1 or 2 NICs)
u                     | director |
x                     |__________|
                      DIP (and PIP)
V                          |
i                          |          DRIP network
r          ----------------------------------
t          |               |               |
u          |               |               |
a         RIP1            RIP2            RIP3
l    _____________   _____________   _____________
    |             | |             | |             |
S   | realserver1 | | realserver2 | | realserver3 |
e   |_____________| |_____________| |_____________|
r
v
e
r
---
The router has traditionally not been considered part of the LVS, because often you do not have control over the router. However if you're a paying customer, then the ISP will be glad to set up the router according to your specifications. If you have access to the router, it can solve The Arp Problem and can install filter rules.
Here are the names we use for the various IPs. If you use them when asking questions on the mailing list, we'll be able to answer your questions more easily.
client IP     = CIP
virtual IP    = VIP (the IP on the director that the client connects to)
director IP   = DIP - the IP on the director in the DIP/RIP (DRIP) network (this is the realserver gateway for LVS-NAT)
realserver IP = RIP (and RIP1, RIP2...) the IP on the realserver
director GW   = DGW - the director's gw (only needed for LVS-NAT) (this can be the realserver gateway for LVS-DR and LVS-Tun)
The VIP and DIP are setup as secondary IPs, (i.e. there is another primary IP on that NIC), so they can be moved to another duplicate director following director failover. For initial setup with a single director, setting up the VIP and DIP as secondary IPs will make the transition to a failover setup easier.
For a two director LVS (where directors failover), the IPs on the DRIP network are
primary director IP   = PIP (the director which will be the master on bootup)
secondary director IP = SIP (the director which will be the backup on bootup)
The DIP will be on the same NIC as PIP on bootup and will move to the same NIC as SIP on director failover.
We don't seem to need a name for the primary IP on the outside of the director - no-one ever talks about it.
We don't often need to explicitly name the networks in an LVS, but here are some suggestions
network facing the internet or the outside network: the network on the director which receives packets from the outside world. This shouldn't be called the VIP network, as the VIP is also in the DRIP network (but not replying to arp calls) on the realservers in LVS-DR and LVS-Tun.
The mailing list and HOWTOs cover information specific to LVS. The rest you have to handle yourself. All of us knew nothing about computers when we first started, we learnt it, and you can too (we're not saying it's easy). If you can't setup a simple LVS from the LVS-mini-HOWTO, without breaking into a major sweat (or being able to tell us what's wrong with the instructions), then you need to do some more homework. (Also see Help! My LVS doesn't work.)
Ratz ratz (at) tac (dot) ch
To be able to setup/maintain an LVS, you need to
- know how to patch and compile a kernel
- know the basics of shell-scripting
- have intermediate knowledge of TCP/IP
- have read the man-page, the online-documentation and LVS-HOWTO (this document) (and the LVS-mini-HOWTO)
- know basic system administration (e.g. iptables; syslog; find, compile, install code from source files; use cpan to find perl modules).
All of the people on the LVS mailing list are replying for free in their spare time. The best we can do is to give solutions to technical problems on setting up and running LVS. I give about 15secs to a posting to decide if I've got something useful to say. The posting has to indicate that the person has analysed the problem to a stage where an answer exists. If _they_ can't describe the problem, there's no point in replying - they won't understand the answer.
Please don't e-mail me privately with general questions (feel free to cc: me if you want). The mailing list will archive your question and the answer(s) which can be retrieved later. Other people may have more interesting, relevant or useful comments than I will. If you are writing to me in the hopes of avoiding the humiliation of publicly showing your ignorance on the mailing list, it's not going to happen. We've had too many good ideas from "ignorant" people to let this happen. If your question has been answered many times before and it's in the HOWTO and the archives, you'll be told to read the HOWTO, that's all.
To get technical help:
Jakub Suchy jakub (at) rtfm (dot) cz 13 Jan 2005
Please read: smart questions (http://www.catb.org/~esr/faqs/smart-questions.html) before asking questions.
Please only post relevant lines of a debug dump. If you post the whole dump, because you don't understand it, then it will fill up the archive machine and everyone's mail box. If we need the whole debug, we'll ask for it and you can send it to us off-list.
It's hard to believe, but we get postings like
recompiling the kernel is hard (or I don't read HOWTOs), can't you guys cut me some slack and just tell me what to do?
I expect the people who post these statements don't read this HOWTO, so I may be wasting my time, but - No. The people on the mailing list answer questions for free, and have other important things to do, like keeping up with /. and checking our e-mail. When we're at home, we drink beer and watch Gilligan's Island re-runs.
can anybody tell me how to setup a windows realserver? thank you very much! I'm in a hurry.
robert (dot) gehr (at) web2cad (dot) de
I can't think of anyone who has set up lvs in a hurry :-)
RedHat have LVS in their standard distribution kernel. This gives people the idea that they can setup LVS from their standard RedHat distribution just by clicking on a few buttons or running some scripts. From reading the postings to the mailing list, it's more difficult than doing it our way. You still have to understand LVS and then afterwards, you have to figure out what RedHat did to it. One of the major wastes of time and source of aggravation for me personally on the LVS mailing list, is postings from people using RedHat LVS who assume that it's the same as LVS, and who post as if they're using our setup methods. Just saying that you're using a RedHat distribution doesn't tell us anything, since you can setup LVS our way in RedHat. Things you need to know before you post -
If you are setting up with RedHat and want help with it, make sure that you describe what you've done, that you're using the RedHat files and how you've set it up, otherwise we'll assume that you're setting up using our methods.
The LVS-NAT ftp helper bug took a long time to figure out. Since no-one else had seen the problem, we didn't know at first if it was a problem with LVS. It wasn't till 6 months later, when someone else had the same symptom, and found that it only occurred when the ftp helper module was loaded, that we could do something.
I once needed to do something with iproute2 that I spent about 3 weeks trying to figure out. No-one on the list knew the answer. I had to post off-line to someone who could figure it out for me.
We may not have a useful answer.
If you post saying "I want to build an LVS with (list of hardware); do you think it will work?", all we can say is "probably".
Often when questions like this come up, there are people who are happy to share their experiences, so there's no harm in posting such a question. In general the people who've been working with LVS for years will expect you to have read the docs and know what LVS does before you post. In the time I allot for a reply, I don't have time to figure out whether in your case LVS is best for you - you should pay a consultant to do this if you can't do it yourself.
Your question may not be well posed.
We are reading the postings in our spare time. You will get at most 30secs of attention before we figure out whether we can help you, an answer will take a bit of thinking, or we can't help you.
If you have a long posting in which you haven't figured out which parts are causing the problem and which parts are working, then we aren't going to try to figure it out either. Post the minimum setup that will produce the problem.
Please edit the posting you're replying to, leaving only the parts relevant to your reply. We don't need to see material from previous posts irrelevant to the current posting, and the disk archive doesn't either.
Reply in-line, i.e. following each statement by the poster. Here's a posting on the subject from one of the kernel mailing lists.
Greg KH greg (at) kroah (dot) com 16 Nov 2005
A: http://en.wikipedia.org/wiki/Top_post
Q: Where do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
A: No.
Q: Should I include quotations after my reply?
In most cases when a problem is solved, there's enough info on the mailing list to see how it worked and we can write it up here for the next people. Occasionally, we get a posting "I've worked it out. Thanks for the help." When this happens we have no idea what the solution was and will have to reinvent it for the next person.
If you've got help from the unpaid people on the mailing list, who've given their spare time to help you, when they could instead have been watching Gilligan's Island reruns, please write it up for the HOWTO. When I write to people asking for their solution I don't want to hear that you're busy and have a job. We're busy, have jobs, kids, homework to do and tax forms to fill in and we stopped what we were doing to help you. Here's a template.
We occasionally get requests for people to do an install. The listing is a service to people looking for paid technical help (installs or anything else) and does not imply that I (Joe) or anyone connected to the LVS project endorse the services of the listees. If you want to know more about them, check their postings to the LVS mailing list.
Entries will be listed at no cost, in approximate order of the date I receive/post them. Entries will be listed for at least a year (HOWTOs come out at erratic intervals and new entries will be added/old entries deleted whenever the next HOWTO comes out). If you want to be listed again next year, send me an e-mail in a year. I'm too busy to keep much of an eye on what goes in here and your entry may stay longer than a year. If you really want people to know who you are, don't rely on this entry - make sure google knows about you.
To be listed, send me off-list
your URL (e.g. <a href="http://www.foo.org">The Foo LVS service centre</a>) and/or e-mail, then a blurb of up to 80chars e.g. "We do it all", optionally including your location.
This will be minimum maintenance - I'm just going to mouse swipe your e-mail (i.e. don't plan on changing your URL in the year).
People available for paid technical help.
Oct 2007: http://www.dotnoc.com - solutions for hosting firstname.lastname@example.org. Linux load balancing and networking specialists
Oct 2007: Loadbalancer.org Ltd (http://www.loadbalancer.org/) - Specialise in high availability load balancers based on LVS. Happy for customers to have full access to the OS and source code and offer 24*7 support. However we don't do consultancy on home brew implementations. UK and USA offices.
Oct 2007: http://www.netdigix.com Linux solutions for business. email@example.com. We specialize in Linux networking and setup of LVS for hosting and mission critical infrastructures. Canada:British Columbia:Lower Mainland:Vancouver
Thanks to Hank Leininger for the mailing list archive which is searchable not only by subject and author, but also by strings in the body. Hank's resource has been of great help in assembling this HOWTO.
The mailing list is available for further questions. A single mailing list handles developers, new users and old users and has about 0-20 postings a day. You don't have to join the mailing list to read the archives. If you want to post questions, then you have to join. If you aren't subscribed and you post (or you post from an unsubscribed address), you'll get a reply saying that your posting is "awaiting moderator approval". It isn't; because of the volume of spam, we no longer review these messages - they're deleted.
Please send e-mail with straight ascii (not html) and turn line-wrap on (some mails come with each paragraph on a single long line).
If you're stuck with posting from a Windows machine or Lotus notes, or using Lookout, where each paragraph is sent as one line:
Francois JEANMOUGIN Francois (dot) JEANMOUGIN (at) 123multimedia (dot) com 09 Jul 2004
System manager -> Global Settings -> Internet Message Format -> Default (or the one used) -> Advanced -> word wrap
Please don't turn on your vacation message, intended only for your work mates, for messages from a list. e.g.
I will be out of the office starting 07/30/2004 and will not return until 08/03/2004.
The LVS mailing list doesn't want to know.
Dan Moljar Aug 2004
For Lotus Notes: The client is not configured correctly. In the 'Out of Office' enable dialog under the 'Exceptions' tab, there is a check box for 'Do not reply to Internet Addresses'. Check it. The server shouldn't do it to begin with, but you can make the client stop.
There's always new ideas and questions being posted on the mailing list. We don't expect this to stop. There are many complexities to LVS and we don't expect new people to understand any more about LVS than we did when we started. No-one is expected to know/understand everything in the docs but your questions will be better received, if you've done your homework, if you have setup the test configurations here, have at least perused this HOWTO (yes we know it's big), and have looked at the mail archives. We can't help you if you just tell us that you've read the documents and your LVS doesn't work. To you, all problems look the same ("it doesn't work"). To help you, we need more information. We at least need the forwarding method, the service(s) being forwarded, the number of networks and the output of ipvsadm in the problem state.
Before you come up on the mailing list -
Set up a simple LVS (3 nodes: client, director, realserver) with LVS-DR or LVS-NAT forwarding, with the service telnet using the instructions in the LVS-mini-HOWTO. You should be able to do this starting from a freshly downloaded kernel from ftp.kernel.org and the LVS patches (ipvs and the hidden patch if you have 2.4.x realservers).
Don't setup first with http, with filter rules, with firewalls, with complicated file systems (e.g. coda, nfs) or network accelerators - debug all these nifty things after you have LVS working with telnet and with no filter rules.
Do use standard compilers (gcc-2.95.3), tools and utilities (ifconfig or iproute2).
Do not use non-standard tools particular to a distribution designed to capture market share (e.g. ifup).
If you don't understand your problem well, here's a suggested submission format from Roberto Nibali ratz (at) tac (dot) ch
hog:~ # uname -a
Linux hog 2.2.18 #2 Sun Dec 24 15:27:49 CET 2000 i686 unknown
hog:~ # ipvsadm -L -n | head -1
IP Virtual Server version 1.0.2 (size=4096)
hog:~ # ipvsadm -h | head -1
ipvsadm v1.13 2000/12/17 (compiled with popt and IPVS v1.0.2)
Example for LVS-DR:
o Using LVS-DR, gatewaying method.
o Load balancing port 80 (http) non-persistent.
o Network Setup:

                        ________
                       |        |
                       | client |
                       |________|
                           | CIP
                           |
                        (router)
                           |
                           | GEP
                (packetfilter, firewall)
                           | GIP
                           |       __________
                           |  DIP |          |
                           +------+ director |
                           |  VIP |__________|
                           |
         +-----------------+----------------+
         |                 |                |
     RIP1, VIP         RIP2, VIP        RIP3, VIP
   ____________      ____________     ____________
  |            |    |            |   |            |
  |realserver1 |    |realserver2 |   |realserver3 |
  |____________|    |____________|   |____________|

CIP  = 18.104.22.168
GEP  = 22.214.171.124 (external gateway, eth0)
GIP  = 192.168.1.1 (internal gateway, eth1, masq or NAT)
DIP  = 192.168.1.2 (eth0:1, or eth1:1)
VIP1 = 192.168.1.110 (director: eth0:110, realserver: lo0:110)
RIP1 = 192.168.1.11
RIP2 = 192.168.1.12
RIP3 = 192.168.1.13
DGW  = 192.168.1.1 (GIP for all realserver)

o ipvsadm -L -n

hog:~ # ipvsadm -L -n
IP Virtual Server version 1.0.2 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
TCP  192.168.1.10:80 wlc
  -> 192.168.1.13:80             Route   0      0          0
  -> 192.168.1.12:80             Route   0      0          0
  -> 192.168.1.11:80             Route   0      0          0
The output from ifconfig from all machines (abbreviated, just need the IP, netmask etc), and the output from netstat -rn.
ipchains -L -M -n (2.2.x) or cat /proc/net/ip_conntrack (2.4.x)
echo 9 > /proc/sys/net/ipv4/vs/debug_level && tail -f /var/log/kernlog
tcpdump -n -e -i eth0 tcp port 80
route -n
netstat -an
ifconfig -a
tcpdump listings are difficult to read. If you post one, please change the IPs to VIP, CIP, RIP1..n, DIP etc. Since you'll likely be on a switched network, tcpdump will only see packets to that NIC. Tell us which machine (director, realserver...) and the NIC (if there are two NICs on the machine) that it was run on.
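Relabelling the addresses by hand is tedious, so a small filter can do it. The following is a rough sketch only: the addresses are invented for the example (edit them to match your own VIP, RIPs, DIP and CIP), the function name label_ips is made up, and it assumes your sed accepts -E for extended regular expressions. Each pattern only matches a whole address, so that 192.168.1.11 is not rewritten inside 192.168.1.110.

```shell
#!/bin/sh
# label_ips: read "tcpdump -n" output on stdin and replace known addresses
# with the LVS names used in this HOWTO. All addresses here are examples.
label_ips() {
    # (^|[^0-9.]) and ([^0-9]|$) keep a match from starting or ending in the
    # middle of a longer address; the trailing ".port" from tcpdump is kept.
    sed -E \
        -e 's/(^|[^0-9.])192\.168\.1\.110([^0-9]|$)/\1VIP\2/g' \
        -e 's/(^|[^0-9.])192\.168\.1\.11([^0-9]|$)/\1RIP1\2/g' \
        -e 's/(^|[^0-9.])192\.168\.1\.12([^0-9]|$)/\1RIP2\2/g' \
        -e 's/(^|[^0-9.])192\.168\.1\.2([^0-9]|$)/\1DIP\2/g' \
        -e 's/(^|[^0-9.])10\.0\.0\.5([^0-9]|$)/\1CIP\2/g'
}

# Example on one simplified capture line.
printf '%s\n' '12:00:01.000000 IP 10.0.0.5.1025 > 192.168.1.110.80: Flags [S], length 0' | label_ips
```

On a live capture you would pipe a saved tcpdump listing through label_ips before pasting it into your posting, and still say which machine and NIC it was captured on.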
It's wonderful to get an unsolicited bug fix. Please let us know what it does and why it's better than the current file. A new version of a file without any information about what it does, or what it fixes isn't much use to us.
Malcolm lists (at) loadbalancer (dot) org 23 Nov 2006
Willy Tarreau's written a nice article on load balancing, Making applications scalable with Load Balancing (http://1wt.eu/articles/2006_lb/), that covers layer 4 and layer 7 options. I still don't think layer 7 can ever give high availability, but it's a good read.
Ratz 23 Nov 2006
A very nice and to the point introduction. Willy, besides being a nice person and a good friend, is an excellent engineer with a lot of expertise in highly available, high performance and secure web services, networking and packet filtering and much more. It would be nice to have Willy contributing to/on this list as well :).
Malcolm lists (at) loadbalancer (dot) org 01 Feb 2007
HAProxy http://haproxy.1wt.eu is a tcp proxy (fast) but flexible enough to do cookie insertion and SNAT etc.
from lvs (at) spiderhosting (dot) com a list of load balancers
Brent Cook busterb (at) mail (dot) utexas (dot) edu 28 Mar 2002
There's the http://www.bsdshell.net/ HighUpTime (HUT) project (link dead Apr 2003). It's FreeBSD.
The HUT author, Sebastian Petit spe (at) selectbourse (dot) net has joined the LVS mailing list.
For L7 Switching see the DRWS project.
BSD load balancing:
Roberto Nibali ratz (at) tac (dot) ch 05 Nov 2003
As already mentioned by others, LVS will not work on FreeBSD as director due to the kernel part. Using FreeBSD on the RS is of course ok. The BSD folks have not shown bigger interest in adopting the LVS idea or parts of the code yet. If you're interested in load balancing and HA Solutions under FreeBSD, you could check out following links:
http://www.bsdshell.net/hut_fvrrpd.html
http://www.backhand.org/wackamole/
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/isp/2003-05/0026.html
http://redundancy.redundancy.org/fbsd_lb.html
Gavin Henry ghenry (at) suretecsystems (dot) com 13 Jun 2005
ClusterIP by Harald Welte. What is the list's view on it?
Gavin Henry ghenry (at) suretecsystems (dot) com 13 Jun 2005
The man page for more recent versions of iptables says:
CLUSTERIP: This module allows you to configure a simple cluster of nodes that share a certain IP and MAC address without an explicit load balancer in front of them
Been there, done that. Works, but is it necessary? LVS with up to 16 directors active (http://www.ultramonkey.org/papers/active_active/)
A set of postings on /. 2 Mar 2009 at Best Solution for HA and Network Loadbalancing (http://tech.slashdot.org/article.pl?sid=09/03/02/0231241) lists the following
Cahya Wirawan cwirawan (at) email (dot) archlab (dot) tuwien (dot) ac (dot) at 19 Feb 2004
I'm implementing proxy, smtp and webserver with LVS as local node, and I have tested it and it's running fine, but because someone from management section thinks that such an implementation is easy (just run setup.exe and everything is installed and ready to use), he pushed me to move the setup into production, and create another one as soon as possible. I want to tell him that such an implementation is not a trivial thing and needs time to setup and to test before we go into production. I want to show him a list of companies who have such complete solutions, so he can see the cost. Then he can understand that high availability and load balancing is not easy to setup, and will cost a lot of money if we buy a complete solution.
Vendors just rub their hands with glee on finding management like this - see my review of the book "The IBM Way" (http://www.austintek.com/book_reviews/the_ibm_way.html) for how IBM handles the situation.
Prices at this level are negotiable. Who knows what you could pay?
http://www.linuxvirtualserver.org/ - $0.
There's plenty of people on list who can help you and your boss feel more comfortable with your setup. I'm sure if you posted something some people would be willing to help make you sleep better at night. BTW, you know about the http://www.ultramonkey.org/ and http://www.keepalived.org/ projects, right?
Joe Oct 2005: From a presentation by Radware (http://www.radware.com/) given to North Carolina Systems Administrators (NCSA) (http://www.ncsysadmin.org/) on 10 Oct 2005. Unfortunately I was the guy getting the pizzas for the meeting, so I missed most of the talk (which I wanted to hear).
Radware is used by Ebay and Accuweather. Radware has a NAT loadbalancing director that appears to function similarly to an LVS-NAT director. The servers can have private IPs.
Radware's loadbalancing director is only a small part of their offering. Radware have boxes that filter based on packet content (looking for viruses) that sit in the flow of packets (possibly before the director, possibly after - didn't find this out). They have boxes which just handle SYN floods. They use SYN cookies and do a statistical analysis of the packets, letting some through to see which machines reply to the SYN-ACKs. Radware has a gui to control the loadbalancer, which can do things like shutting down some of the backend servers at some time in the future (e.g. at 10pm later that night) for new connections, so that by 8am next morning these machines have few or no connections and can be taken offline for servicing. Much of their hardware is ASIC based.
Health checking seems to be done from the director, and checks are made through to 3rd-Tier components of the backend servers (e.g. database machines behind the webservers that the client doesn't directly connect to).
Each local NAT'ed load balancing setup is itself a member of a distributed DNS-based load balancer. So www.foo.net might have a loadbalanced set of servers in different sites eg London, New York, San Francisco and Tokyo. Each local setup has an authoritative nameserver for www.foo.net
The way it works is
Although I didn't get to ask how it works, if a client winds up at a more distant site (network wise), then http redirects will send the client to a closer site.
Radware SSL accelarators:
When I commented to the speaker that the main reason to use SSL accelarators is financial, i.e. to only have one copy of the certificate, rather than one on each realserver, they said "it's also for certificate management". Presumably some sites have large numbers of certificates. (They didn't disagree with my statement.)
The SSL accelarators in the Radware design don't sit between the director and the realservers (or in front of the director i.e. between the client and the director), but sit at the same level as the other realservers. The https request is balanced by the director to an accelarator, which decrypts the packets and sends the decrypted packet back to the director for loadbalancing as http traffic. Since the director is a NAT balancer, the return http traffic from the http servers, goes back through the director, and then recursively back to the SSL accelarator then back to the director at https traffic and then back to the client.
Being able to have the SSL accelerator as a realserver in LVS would require the realservers to be a client of the director, something that we can do for LVS-NAT, but not for LVS-DR. This is not a capability that we've paid much attention to for LVS. If you need a realserver to be in the path in both inward and outward directions (like an SSL accelerator) then you will have to use LVS-NAT.
Francois JEANMOUGIN Francois (dot) JEANMOUGIN (at) 123multimedia (dot) com 12 Oct 2005
Note that we removed our Radware appliance to use LVS instead. Load balancing using DNS is _evil_, especially with mobile internet and all those misconfigured operator gateways. Most mobile gateways are written in Java, and I'm probably the only one who has read the java.security file. Just have a look at the ugly stuff you can find in it and the unbelievably silly explanation given:
# The Java-level namelookup cache policy for successful lookups:
#
# any negative value: caching forever
# any positive value: the number of seconds to cache an address for
# zero: do not cache
#
# default value is forever (FOREVER). For security reasons, this
# caching is made forever when a security manager is set.
#
# NOTE: setting this to anything other than the default value can have
# serious security implications. Do not set it unless
# you are sure you are not exposed to DNS spoofing attack.
#
#networkaddress.cache.ttl=-1
For security reasons! Guys! Well. So we removed Radware. Note that we had other problems with Radware. The DNS cache of the clients was one, the response time of the DNS was another. Several technical issues that appear when you reach certain traffic limits were the last.
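To see why a "cache forever" policy defeats DNS-based load balancing, here is a minimal Python sketch of a lookup cache that follows the three cases in the comment above (negative: forever, positive: seconds, zero: no caching). It is not the Java implementation; the class `LookupCache`, the `resolver` callable and the addresses (taken from the documentation ranges) are assumptions made for the example.

```python
class LookupCache:
    """Name lookup cache with the ttl semantics quoted from java.security."""

    def __init__(self, ttl, resolver):
        self.ttl = ttl            # <0: cache forever, 0: never cache, >0: seconds
        self.resolver = resolver  # callable: name -> address (stands in for DNS)
        self.entries = {}         # name -> (address, time the entry was stored)

    def lookup(self, name, now):
        """Return the address for `name`, consulting the cache first."""
        if self.ttl != 0 and name in self.entries:
            address, stored = self.entries[name]
            if self.ttl < 0 or now - stored < self.ttl:
                return address    # still valid (or valid forever)
        address = self.resolver(name)
        if self.ttl != 0:
            self.entries[name] = (address, now)
        return address


# Hypothetical authoritative answer that the DNS load balancer can change.
dns = {"www.foo.net": "192.0.2.10"}
gateway = LookupCache(-1, dns.__getitem__)   # the default "forever" policy
gateway.lookup("www.foo.net", now=0)
dns["www.foo.net"] = "198.51.100.10"         # balancer now prefers another site
stale = gateway.lookup("www.foo.net", now=86400)  # a day later
```

In this sketch the gateway keeps returning the first address it ever resolved, however often the balancer changes its answer, while a cache with a short positive ttl would pick up the new address once the entry expires.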
Still, geographic load balancing would be very nice to have and I cannot figure out another way to do it than to involve DNS round-robin.
Round-Robin DNS could work if
My clients are mobile phones, so basically points 1 to 4 are not OK :). And I have to deal with multiple sources for the same client (the transaction begins in the gallery gateway and continues in the standard surf gateway, and I have to use fwmarks to keep the session)... We used Radware to try to load-balance between our two peers. It clearly was not working. Unfortunately, I don't have all the details.
If you want to distribute traffic between hosts that have fast, reliable links, like a LAN, then LVS is a good option. No, an excellent option. If you want to distribute traffic between geographically separated hosts, then you don't want something like LVS that channels packets through a single location and then on to another. Something DNS based is probably the way to go - though round robin is not nearly smart enough for my liking. In practice, if you do have geographically distributed sites, then each site should probably be an LVS cluster. So essentially you end up using two techniques to solve different parts of the same problem.
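The two-technique arrangement can be sketched as two separate decisions: something DNS-like picks a site, then the LVS cluster at that site picks a realserver. The code below is only an illustration under assumptions of my own: the per-client `cost` map stands in for whatever a smarter-than-round-robin DNS would use to rank sites, plain round robin stands in for the LVS scheduler, and all site and realserver names are invented.

```python
import itertools


class Site:
    """One geographic site, internally balanced like an LVS cluster."""

    def __init__(self, name, realservers):
        self.name = name
        # Round robin stands in for whichever scheduler the director uses.
        self._next_realserver = itertools.cycle(realservers)

    def schedule(self):
        """Local decision: pick the next realserver at this site."""
        return next(self._next_realserver)


def pick_site(sites, cost):
    """Global decision: pick the site with the lowest cost to this client.

    cost: dict mapping site name -> network cost (hypothetical metric).
    """
    return min(sites, key=lambda site: cost[site.name])


def route(sites, cost):
    """Combine both tiers and return (site name, realserver)."""
    site = pick_site(sites, cost)
    return site.name, site.schedule()
```

For example, with sites London (two realservers) and Tokyo (one), a client whose cost to London is lowest is sent to London and then alternates between the London realservers on successive requests. Only the first decision crosses long-haul links; the packets themselves never pass through a remote director.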
I wrote quite a lot about this on supersparrow.org once upon a time; it's still there if people want to read/play/enhance (links through Super Sparrow Project).
bak bak (at) picklefactory (dot) org 09 Jan 2007
I've used Radware, F5, and HP SAs as an admin. My 2-minute executive overview take:
Radware is great for a switch-like, low-key experience. They're relatively cheap for hardware load balancers. You get extra functionality like SSL and link balancing with extra bits of hardware. Sometimes they can be pretty hard to troubleshoot. If you want global balancing/failover, that's part of all their "AS" type switches.
F5 is the other 'big name' option. These boxes are more like Brocade switches: it's running embedded Linux in there, and if you want to run tcpdump, you can. You get extra functionality by buying a 'bigger' box and then paying F5 for more licensing. If you want to do global balancing/failover, you have to get one of their DNS devices.
If you have money to wave around, I've found both Radware and F5 are more than happy to give you a demo unit for 2-4 weeks.
Karl Kopper has tackled this. Writing a book on a moving target like LVS is a difficult proposition - certainly more than I was prepared to take on.
The Linux Enterprise Cluster, Karl Kopper, Pub: No Starch Press, ISBN 1593270364
The book is available at your usual suppliers.
I'm loath to mention the names of internet booksellers who require your e-mail address as part of your purchase, so that they can spam you later. I've been buying my books by phone at a marginally higher price since realising their business practices. However recently (Jul 2004) I've discovered disposable e-mail addresses e.g. the free service from Jetable.org (http://www.jetable.org/). They have a Google-like (i.e. simple) interface. You give them your e-mail address, the required lifetime of the address (1-8 days), and click. Up comes an e-mail address (test by sending a message to it) that you can give to your internet vendor, and mail will be forwarded to you for the period selected. After that time, no more mail will get to you. I've been using Jetable since Jul 2004 (now Sep 2004) and have not got any spam from Jetable or from internet vendors.
"Wired" Magazine in Jun 2004 has a small article about LVS, illustrating the multinational cooperative nature of GPL software development. The page is here (http://www.wired.com/wired/archive/12.06/images/atlas_software.pdf), or a local copy of the article on this server.
Ultra Monkey is LVS and HA combined.
tong tong (at) csusb (dot) net 25 Jun 2003
Here's a step-by-step guide for setting up an LVS system with heartbeat (http://www.cula.net/cluster).
Note: This guide was published a year ago and we've only just heard about it. The author has never popped up on the mailing list to say hello.
From lvs (at) spiderhosting (dot) com: Super Sparrow Global Load Balancing using BGP routing information.
Ratz is documenting the 2.6 headers and calls with doxygen (http://www.drugphish.ch/~ratz/IPVS/index.html) whenever he has reason to fiddle with a piece of code (i.e. the documentation isn't exhaustive, at least yet).
From ratz, there's a write up on load imbalance with persistence and sticky bits at our friends at M$.
From ratz, Zero copy patches to the kernel to speed up network throughput, Dave Miller's patches, Rik van Riel's vm-patches and more of Rik van Riel's patches at http://www.linux-mm.org/ (link dead Dec 2003). The Zero copy patches may not work with LVS and may not work with netfilter either (from john (at) antefacto (dot) com).
From Michael Brown michael_e_brown (at) dell (dot) com, the TUX kernel level webserver.
Dustin Puryear dustin (at) puryear-it (dot) com gave a talk on LVS at LISA 2003. The tutorial is available at: LVS: Load Balancing and High Availability for Free (http://www.puryear-it.com/publications.htm#6).