<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD Docbook XML V4.1.2//EN"
"/usr/local/share/sgml/docbook/xml-dtd-4.1.2/docbookx.dtd">
<article>
	<articleinfo>
	<title>LVS-HOWTO</title>
	<author>
        	<firstname>Joseph</firstname>
        	<surname>Mack</surname>
        	<affiliation>
                	<orgname>mack (at) wm7d (dot) net </orgname>
                	<orgdiv></orgdiv>
        	</affiliation>
	</author>
	<pubdate>v2006.09 Sep 2006, released under GPL.</pubdate>
	<copyright>
        	<year>1999</year>
        	<year>2000</year>
        	<year>2001</year>
        	<year>2002</year>
        	<year>2003</year>
        	<year>2004</year>
        	<year>2005</year>
        	<year>2006</year>
        	<holder>Joseph Mack</holder>
	</copyright>
	<abstract>
	<para>
Install, testing and running of a Linux Virtual Server with 2.2.x, 2.4.x, 2.6.x kernels
	</para>
	<para>
<emphasis role="bold">search the LVS documentation</emphasis>
	</para>
	<itemizedlist>
		<listitem>
<ulink url="http://www.austintek.com/LVS/htdig/search/search.html">
search the LVS documenation</ulink> with htdig.
		</listitem>
		<listitem>
<ulink url="http://www.linuxvirtualserver.org/mailing.html">
search the two mailing list archives</ulink>
		</listitem>
	</itemizedlist>
	</abstract>
	</articleinfo>
<section id="LVS-HOWTO.introduction" xreflabel="LVS Introduction">
<title>LVS: Introduction</title>
<para>
This LVS-HOWTO is posted to the LVS-HOWTO homepage,
<ulink url="http://www.austintek.com/LVS/LVS-HOWTO/">
http://www.austintek.com/LVS/LVS-HOWTO/
</ulink> about once a month (although I do miss occasional months).
</para>
<para>
Some of the material is from my own testing and I've tried to make
it into a coherent story.
Much of the material is from the lvs-users mailing list
and is listed chronologically 
(sometimes forward and sometimes backwards in time)
and will thus look like a blog.
</para>
	<section id="thanks">
	<title>Thanks</title>
	<para>
Contributions to this HOWTO came from the mailing list and are
attibuted to the poster (with e-mail address). Postings may have
been edited to fit into the flow of the HOWTO.
	</para>
	<para>
The LVS logo (Tux with 3 lighter shaded penguins behind him
representing a director and 3 realservers) is by Mike Douglas <emphasis>spike (at) bayside (dot) net</emphasis>
	</para>
	<para>
<ulink url="http://www.linuxvirtualserver.org">LVS homepage</ulink> is running on
a machine donated by Horms. (Until Jul 2002, we used a machine donated by Voxel).
	</para>
	<para>
<ulink url="http://www.linuxvirtualserver.org">LVS mailing list</ulink> is hosted by
Lars in Germany <emphasis>lmb (at) suse (dot) de</emphasis>
	</para>
	</section>
	<section id="about">
	<title>About the HOWTO</title>
		<section id="purpose"><title>Purpose</title>
		<para>
To enable you to understand how a Linux Virtual Server (LVS) works.
		</para>
		<para id="mini-HOWTO">
The
<ulink url="http://www.austintek.com/LVS/LVS-HOWTO/mini-HOWTO/LVS-mini-HOWTO.html">LVS-mini-HOWTO</ulink>
(http://www.austintek.com/LVS/LVS-HOWTO/mini-HOWTO/LVS-mini-HOWTO.html)
tells you how to setup and install an LVS without understanding how the LVS works.
		</para>
		<para>
The material here covers directors and realservers with 2.2, 2.4 and 2.6 kernels.
		</para>
		<note>
		<para>
The original material was written for 2.2.x kernels and ipchains. Not
all material has been updated for 2.4.x kernels and iptables.
		</para>
		</note>
		<para>
The layout of this HOWTO is almost flat -
you go to the section you want information on.
You aren't supposed to read it from start to finish.
Within any section, newer information may be combined
with older information that says different things.
I just don't have time to edit everything - I'll
be glad if you straighten me out.
The only information one level up is
		</para>
		<itemizedlist>
			<listitem>
	how LVS works
(from <link linkend="what_is_an_LVS">this HOWTO</link> or
from documentation on the website, <emphasis>e.g.</emphasis>
Wensong's early documents)
			</listitem>
			<listitem>
setting up an LVS in the
<link linkend="mini-HOWTO">LVS-mini-HOWTO.html</link>.
			</listitem>
		</itemizedlist>
		<para>
The code for 2.0.x kernels still works fine and was used on production
systems when 2.0.x kernels were current, but is not being developed further.
For 2.2 kernels, the Linux kernel networking code was rewritten,
producing for us <xref linkend="LVS-HOWTO.arp_problem"/>.
This changes the installation of LVS from a simple
process that can be done by almost anyone,
to a thought provoking, head scratching exercise,
which requires detailed understanding of the workings of LVS.
For 2.0 and 2.2, LVS is stand-alone code, based on ip_masquerading and
doesn't integrate well with other uses of ip_masquerading.
For 2.4 kernels, LVS was rewritten as a netfilter module to allow it to
fit into and be visible to other netfilter modules.
Unfortunately the fit isn't perfect,
but cooperation with netfilter does work in most cases.
(You will have trouble using your director as a firewall;
see the <xref linkend="LVS-HOWTO.filter_rules"/>.)
Being a netfilter module, the latency and throughput are slightly worse
for 2.4 LVS than for the 2.2 code.
However with modern CPUs being running at 800MHz,
the bottleneck now is network throughput rather than LVS throughput
(you only need a small number of realservers to saturate 100Mbps ethernet).
		</para>
		<para>
In general <command>ipvsadm</command> commands and services have not changed between kernels.
		</para>
		</section>
		<section id="source_xml">
		<title>HOWTO source is xml</title>
		<para>
The HOWTO was originally written in sgml. It is now xml.
The char '&amp;' found in C source code
has to be written as &amp;amp; in sgml.
If you swiped patches from the sgml rather than the html rendering,
you would get code that needed to be edited to fix the &amp;.
Now that the HOWTO is in xml, this munging is not needed.
Although I've tried to remove all munged ampersands,
I expect some will persist for a while.
Ampersands in URLs still have to be munged.
		</para>
		</section>
		<section id="e-mail">
		<title>e-mail addresses in the HOWTO are spam protected</title>
		<para>
Well we hope so anyhow.
		</para>
		<para>
An article on <ulink url="http://www.neilgunton.com/spambot_trap/">spambots</ulink>
describes robots which ignore the robots.txt file and scan for e-mail addresses
in readable files on websites.
The author suggests removing any 'mailto:' strings and spam protecting e-mail addresses,
by changing them from machine-readable to human-readable format.
If you have a better scheme than implemented here, (and I can do it with vi) let me know.
		</para>
		<para>
(May 2002): BTW, 160 people have contributed to the HOWTO
(as judged by unique e-mail addresses).
		</para>
		</section>
		<section id="links">
		<title>Links die frequently</title>
		<para>
There are links to 180 urls in this HOWTO (May 2002),
which came from postings to the LVS mailing list.
If people move/rename/delete/change their webpages/links once a year,
then I'm going to have to trackdown 15 websites each month.
If a site is gone and it isn't in google, I'm not going to be able to find it.
		</para>
		</section>
	</section>
	<section id="nomenclature">
	<title>Nomenclature/Abbreviations</title>
	<para>
If you use these terms when you mail us, we'll know what you're talking about.
	</para>
		<section id="preferred_names">
		<title>Preferred names</title>
		<itemizedlist>
			<listitem>
<emphasis>IPVS,ipvs,ip_vs</emphasis>
the code that patches the linux kernel on the <emphasis>director</emphasis>.
			</listitem>
			<listitem>
<emphasis>LVS, linux virtual server</emphasis>
This is the <emphasis>director</emphasis> + <emphasis>realservers</emphasis>.
Together these machines are the <emphasis>virtual server</emphasis>,
which appears as one machine to the <emphasis>client(s)</emphasis>.
			</listitem>
			<listitem>
<emphasis>director</emphasis>: the node that runs the <emphasis>ipvs</emphasis> code.
<emphasis>Clients</emphasis> <emphasis>connect</emphasis> to the <emphasis>director</emphasis>.
The <emphasis>director</emphasis> <emphasis>forwards</emphasis> packets to the realservers.
The <emphasis>director</emphasis> is nothing but an IP router with special rules
that make the <emphasis>LVS</emphasis> work.
			</listitem>
			<listitem>
<emphasis>realservers</emphasis>: the hosts that have the <emphasis>services</emphasis>.
The <emphasis>realservers</emphasis> handle the requests from the clients.
			</listitem>
			<listitem>
<emphasis>client</emphasis> the host or user level process that connects to the <emphasis>VIP</emphasis>
on the <emphasis>director</emphasis>
			</listitem>
			<listitem>
<emphasis>forwarding method</emphasis>
(currently <xref linkend="LVS-HOWTO.LVS-NAT"/>,
<xref linkend="LVS-HOWTO.LVS-DR"/>,
<xref linkend="LVS-HOWTO.LVS-Tun"/>).
The <emphasis>director</emphasis>
is a router with somewhat different rules for forwarding
packets than a normal router.
The <emphasis>forwarding method</emphasis>
determines how the <emphasis>director</emphasis>
sends packets from the <emphasis>client</emphasis>
to the <emphasis>realservers</emphasis>.
			</listitem>
			<listitem>
<emphasis>scheduling</emphasis> (<xref linkend="LVS-HOWTO.ipvsadm"/>) -
the algorithm the <emphasis>director</emphasis> uses to select a
<emphasis>realserver</emphasis> to service a new connection request
from a <emphasis>client</emphasis>.
			</listitem>
		</itemizedlist>
		</section>
		<section id="synonyms">
		<title>synonyms</title>
		<para>
Please use the first term in these lines. The other words are valid but
less precise (or are redundant).
		</para>
		<itemizedlist>
			<listitem>
<emphasis>director</emphasis>: load balancer, dispatcher, redirector.
			</listitem>
			<listitem>
<emphasis>realserver</emphasis>: servers, realservers, real-servers.
			</listitem>
			<listitem>
<emphasis>LVS</emphasis>: the whole cluster, the (linux) virtual server (LVS)
			</listitem>
		</itemizedlist>
		</section>
		<section id="virtual_services">
		<title>virtual services, scheduling groups</title>
		<para>
Here's the <command>ipvsadm</command> output of an LVS serving telnet and squid.
		</para>
<programlisting><![CDATA[
director:/etc/rc.d# ipvsadm
IP Virtual Server version 0.9.4 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  lvs.mack.net:squid rr
  -> rs1.mack.net:squid        Route   1      0          0
  -> rs2.mack.net:squid        Route   1      0          0
  -> rs3.mack.net:squid        Route   1      0          0
TCP  lvs.mack.net:telnet rr
  -> rs1.mack.net:telnet       Route   1      0          0
  -> rs2.mack.net:telnet       Route   1      0          0
]]></programlisting>
		<para>
In the above LVS, there are two
<emphasis>virtual services</emphasis>, telnet and squid.
There are also two <emphasis>virtual servers</emphasis>;
a <emphasis>virtual server</emphasis> for telnet (which has 2 realservers)
and a <emphasis>virtual server</emphasis> for squid (which has 3 realservers).
This is what the client sees; two services (and two servers).
		</para>
		<para>
Connections to each
<emphasis>virtual server</emphasis> are <emphasis>scheduled</emphasis> (here by "rr", round robin),
to the realservers which belong to the <emphasis>scheduling group</emphasis>.
Here the <emphasis>scheduling group</emphasis> for telnet is rs1,rs2.
The <emphasis>scheduling group</emphasis> for quid is rs1,rs2,rs3.
Connections to the telnet <emphasis>virtual server</emphasis> are scheduled independantly
of connections to the squid <emphasis>virtual server</emphasis>.
		</para>
		<para>
The above nomenclature can be extended for <xref linkend="LVS-HOWTO.fwmark"/>.
		</para>
		</section>
		<section id="scheduled_unit">
		<title>scheduling instance, scheduled unit, virtual connection</title>
		<para>
We don't have a good name for this. Suggestions welcome.
(We also don't talk much about this concept on the mailing list,
so we've done without a name).
		</para>
		<para>
The director needs to know how to schedule packets from the client to
the realservers.
The smallest unit for LVS is a tcpip connection,
<emphasis>i.e.</emphasis> all
packets that are part of a single tcpip session from a client
will be sent to the same realserver.
For a tcp virtual service, each tcp connection is scheduled separately,
with the first tcp connection going to one realserver,
and the next tcp connection going to the next realserver
assigned a connection from the scheduler.
The <emphasis>virtual connection</emphasis> is the same as the tcp connection.
		</para>
		<para>
For a <xref linkend="LVS-HOWTO.persistent_connection"/>
all tcp connections that are separated by less than the timeout period
are regarded as belonging to the same <emphasis>virtual connection</emphasis> and
are scheduled to the same realserver.
		</para>
		<para>
For udp, there is no such thing as a connection or session and
all packets from the client within a timeout period are scheduled to the
same realserver. (People aren't using LVS for udp services a whole lot).
The <emphasis>virtual connection</emphasis> then is all udp packets from a client
within a certain time period.
		</para>
		</section>
		<section id="multi-tier_servers">
		<title>backend (multi-tier) servers</title>
		<para>
The <emphasis>realservers</emphasis> sometimes are frontends
to other <emphasis>backend</emphasis> servers.
The <emphasis>client</emphasis> does not connect
to these <emphasis>backend</emphasis> servers
and they are not in the <command>ipvsadm</command> table.
		</para>
		<para>
<emphasis>e.g.</emphasis>
		</para>
		<itemizedlist>
			<listitem>
a <emphasis>realserver</emphasis> may run a web application.
The web application in turn connects to a database
on another <emphasis>backend</emphasis> server.
			</listitem>
			<listitem>
a webcaching <emphasis>realserver</emphasis> (<emphasis>e.g.</emphasis> a squid).
The squid connects to <emphasis>backend</emphasis> webserver(s).
			</listitem>
		</itemizedlist>
		<para>
These <emphasis>backend</emphasis> servers are setup separately from the LVS.
		</para>
		</section>
		<section id="server_ambiguous"><title>the term "the server" is ambiguous</title>
		<para>
People sometimes call the <emphasis>director</emphasis> or the <emphasis>realservers</emphasis>,
"the server".
Since the whole LVS appears as a server to the <emphasis>client</emphasis>
and since the <emphasis>realservers</emphasis> are also serving services,
the term "server" is ambiguous.
Do not use the term "the server" or "the lvs server" when talking about LVS.
Most often you are referring to the "director" or the "realservers".
Sometimes (<emphasis>e.g.</emphasis> when talking about throughput)
you are talking about the (whole) virtual server.
		</para>
		<para>
I use "realserver" as I despair of finding a reference to a "realserver"
in a webpage using the search keys "real" and "server".
Horms and I (for reasons that neither of us can remember) have been
pushing the term "real-server" for about a year, on the mailing list,
and no-one has adopted it. We're going back to "realserver".
		</para>
		</section>
		<section id="names_of_IPs">
		<title>names of IPs/networks in an LVS</title>
		<para id="lvs-diagram">
		</para>
<programlisting><![CDATA[
                        ________
                       |        |
                       | client | (local or on internet)
                       |________|
                          CIP
                           |
--                      (router)
                          DGW
                           | outside network
                           |
L                         VIP
i                      ____|_____
n                     |          | (director can have 1 or 2 NICs)
u                     | director |
x                     |__________|
                      DIP (and PIP)
V                          |
i                          | DRIP network
r         ----------------------------------
t         |                |               |
u         |                |               |
a        RIP1             RIP2            RIP3
l    _____________   _____________   _____________
    |             | |             | |             |
S   | realserver1 | | realserver2 | | realserver3 |
e   |_____________| |_____________| |_____________|
r
v
e
r
---
]]></programlisting>
<para>
The router has traditionally not been considered part of the LVS, because
often you do not have control over the router. However if you're a paying
customer, then the ISP will be glad to set up the router according to
your specifications. If you have access to the router, it can solve
<xref linkend="LVS-HOWTO.arp_problem"/> and can install filter rules.
</para>
<para>
Here are the names we use for the various IPs.
If you use them when asking questions on the mailing list,
we'll be able to answer your questions more easily.
</para>
<programlisting><![CDATA[
client IP     = CIP
virtual IP    = VIP - the IP on the director that the client connects to)
director IP   = DIP - the IP on the director in the DIP/RIP (DRIP) network
   (this is the realserver gateway for LVS-NAT)
realserver IP = RIP (and RIP1, RIP2...) the IP on the realserver
director GW   = DGW - the director's gw (only needed for LVS-NAT)
   (this can be the realserver gateway for LVS-DR and LVS-Tun)
]]></programlisting>
		<para>
The VIP and DIP are setup as secondary IPs,
(<emphasis>i.e.</emphasis>
there is another primary IP on that NIC),
so they can be moved to another duplicate director
following director failover.
For initial setup with a single director,
setting up the VIP and DIP as secondary IPs will make the
transition to a failover setup easier.
		</para>
		<para>
For a two director LVS (where directors failover),
the IPs on the <link linkend="drip">DRIP network</link> are
		</para>
<programlisting><![CDATA[
primary director IP	= PIP (the director which will be the master on bootup)
secondary director IP	= SIP (the director which will be the backup on bootup)
]]></programlisting>
		<para>
The DIP will be on the same NIC as PIP on bootup and will move to the
same NIC as SIP on director failover.
		</para>
		<para>
We don't seem to need a name for the primary IP on the outside of the director
- no-one ever talks about it.
		</para>
		<para>
If the clients are on the internet then the VIP will be a public (routable) IP.
The RIPs will usually not be routable
(<emphasis>i.e.</emphasis> they will be private IPs like 192.168.x.x/24).
If the realservers need to contact machines
on the internet for other reasons
(<emphasis>e.g.</emphasis> <xref linkend="LVS-HOWTO.3-Tier"/>)
then they will need routable IPs (which can be the RIPs, or can be another IP
on another NIC on another network).
		</para>
		<para>
We don't often need to explicitely name the networks in an LVS, but
here's some suggestsions
		</para>
		<itemizedlist>
			<listitem>
				<para id="drip">
<emphasis role="bold">DRIP network</emphasis>: the network containing the DIP
and RIPs. (OK you come up with a better name.)
				</para>
			</listitem>
			<listitem>
				<para>
<emphasis role="bold">network facing the internet</emphasis>
or the <emphasis role="bold">outside network</emphasis>: the network
on the director which receives packets from the outside world.
This shouldn't be called the VIP network,
as the VIP is also in the DRIP network (but not replying to arp calls)
on the realservers in LVS-DR and LVS-Tun.
				</para>
			</listitem>
		</itemizedlist>
		</section>
	</section>
	<section id="what_is_an_LVS">
	<title>What is an LVS? Can I use an LVS?</title>
	<para>
A Linux Virtual Server (LVS) is a cluster of servers
which appears to be one server to an outside client.
This apparent single server is called here a "virtual server".
The individual servers (realservers) are under
the control of a director (or load balancer),
which runs a Linux kernel patched to include the <emphasis>ipvs</emphasis> code.
The ipvs code running on the director is
<emphasis>the</emphasis> essential feature of LVS.
Other user level code is used to manage the LVS
(set rules for services handled, handle failover).
The director is basically a layer 4 router 
with a modified set of routing rules.
	</para>
	<para>
When a new connection is requested from a client to a service
provided by the LVS (eg httpd), the director will choose a realserver
for the client.
From then, all packets from the client
will go through the director to that particular realserver.
The association between the client and the realserver will
last for only the life of the tcp connection (or udp exchange).
For the next tcp connection, the director will choose a new
realserver (which may or may not be the same as the first realserver).
Thus a web browser connecting to an LVS serving a webpage consisting
of several hits (images, html page), may get each hit from a separate
realserver.
	</para>
	<para>
Since the director will send the client to an arbitary
realserver, the services must be either read only
(<emphasis>e.g.</emphasis> web services) or if read/write
(<emphasis>e.g.</emphasis> an on-line shopping cart)
some mechanism external to LVS must be provided for
propagating the writes to the other realservers on 
a timescale appropriate for the service 
(<emphasis>i.e.</emphasis> purchase of an item 
must decrement the stock on all other nodes before
the next client attempts to purchase the same item).
At best LVS is read mostly.
	</para>
	<para>
If you just want one of several nodes to be up
at any one time, and the other node(s) to become
active on failure of the primary node, then
you don't need LVS: you need a high availability
setup <emphasis>e.g.</emphasis> Linux-HA (heartbeat), 
vrrp or carp.
	</para>
	<para>
If you want independant servers at different locations,
then you want a geographically distributed server like
<link linkend="supersparrow">Supersparrow</link>.
	</para>
	<para>
Here are some <xref linkend="rrd_images"/>
	</para>
		<section id="what_is_a_VIP">
		<title>What is a VIP?</title>
		<para>
The director presents an IP called the Virtual IP (VIP) to clients.
(When using <xref linkend="LVS-HOWTO.fwmark"/>, VIPs are agregated into
groups of IPs, but the same principles apply as for a single IP).
When a client connects to the VIP, the director forwards the client's
packets to one particular realserver for the
duration of the client's connection to the LVS. This connection is chosen
and managed by the director. The realservers serve services
(eg ftp, http, dns, telnet, nntp, smtp) such as are found in
/etc/services or inetd.conf. The LVS presents one IP on the director
(the virtual IP, VIP) to clients.
		</para>
		<para>
Peter Martin <emphasis>p (dot) martin (at) ies (dot) uk (dot) com</emphasis> and John Cronin <emphasis>jsc3 (at) havoc (dot) gtf (dot) org</emphasis> 05 Jul 2001
		</para>
		<blockquote>
		<para>
The VIP is the address which you want to load balance i.e. the address of
your website. The VIP is usually an alias (<emphasis>e.g.</emphasis> eth0:1)
so that the VIP can be swapped between two directors if a fault is detected on one.
		</para>
		<para>
The VIP is the IP address
of the "service", not the IP address of any of the particular systems
used in providing the service (ie the director and the realservers).
		</para>
		<para>
The VIP be moved from one director to another backup director
if a fault is directed
(typically this is done by using
<filename>mon</filename> and <filename>heartbeat</filename>,
or something similar).
The director can have <link linkend="multiple_VIPs">multiple VIPs</link>.
Each VIP can have one or more services associated with it
<emphasis>e.g.</emphasis> you could have HTTP/HTTPS
balanced using one VIP, and FTP service (or whatever) balanced using
another VIP, and calls to these VIPs can be answered by the same or different
realservers.
		</para>
		<para>
Groups of VIPs and/or ports can be setup with <xref linkend="LVS-HOWTO.fwmark"/>.
		</para>
		<para>
The realservers have to be configured to work with the VIPs on the director
(this includes handling the <xref linkend="LVS-HOWTO.arp_problem"/>).
		</para>
		<para>
There can be
<xref linkend="LVS-HOWTO.persistent_connection"/> issues,
if you are using cookies or https,
or anything else that expects the realserver fulfilling the requests
to have some connection state information.
This is also addressed on the
<ulink url="http://www.linuxvirtualserver.org/docs/persistence.html">
LVS persistence page
</ulink>
		</para>
		</blockquote>
		</section>
		<section id="where_used">
		<title>Where do you use an LVS?</title>
		<itemizedlist>
			<listitem>
For higher throughput.
The cost of increasing throughput by adding
realservers in an LVS increases linearly,
whereas the cost of increased throughput by buying a larger single machine
increases faster than linearly
			</listitem>
			<listitem>
for redundancy. Individual machines can be switched out of the LVS,
upgraded and brought back on line without interuption of service to the clients.
Machines can move to a new site and brought on line one at a time while machines
are removed from the old site, without interruption of service to the clients.
</listitem><listitem> for adaptability. If the throughput is expected to change gradually (as a
business builds up), or quickly (for an event), the number of servers can be
increased (and then decreased) transparently to the clients.
			</listitem>
		</itemizedlist>
		</section>
		<section id="client_server_relationship">
		<title>Client/Server relationship is preserved in an LVS</title>
		<itemizedlist>
			<listitem>
Client sees only one IP address and believes it is connecting
to a single machine. IPs of all servers is mapped to one IP (the VIP).
While the client is connected to only one machine at a time,
however subsequent connections will be assigned to a new and likely
different machine.
			</listitem>
			<listitem>
servers at different IP addresses believe
they are contacted directly by the client.
			</listitem>
		</itemizedlist>
		</section>
		<section id="L4_switch">
		<title>LVS director is an L4 switch</title>
		<para>
In the computer beastiary, the director is a layer 4 (L4) switch.
The director makes decisions at the IP layer and just sees a stream
of packets going between the client and the realservers.
In particular an L4 switch makes decisions based on the IP information
in the headers of the packets.
		</para>
		<para id="supersparrow" xreflabel="Super Sparrow Project">
Here's a description of an L4 switch from
<ulink url="http://www.supersparrow.org/ss_paper/">Super Sparrow Global Load Balancer documentation</ulink>
		</para>
		<blockquote>
Layer 4 Switching: Determining the path of packets based on
information available at layer 4 of the OSI 7 layer protocol stack.
In the context of the Internet, this implies that the IP address
and port are available as is the underlying protocol, TCP/IP or UCP/IP.
This is used to effect load balancing by keeping an affinity
for a client to a particular server for the duration of a connection.
		</blockquote>
		<para>
This is all fine except
		</para>
		<para>
Nevo Hed <emphasis>nevo (at) aviancommunications (dot) com</emphasis> 13 Jun 2001
		</para>
		<blockquote>
The IP layer is L3.
		</blockquote>
		<para>
Alright, I lied.
TCPIP is a 4 layer protocol and these layers do not map well onto
the 7 layers of the OSI model.
(As far as I can tell the 7 layer OSI model is only used to torture
students in classes.)
It seems that everyone has agreed to pretend that tcpip
uses the OSI model and that tcpip devices like the LVS director
should therefore be named according to the OSI model.
Because of this, the name "L4 switch" really isn't correct,
but we all use it anyhow.
		</para>
		<para>
The director does not inspect the content of the packets and cannot
make decisions based on the content of the packets
(<emphasis>e.g.</emphasis> if the packet contains a <link linkend="cookie">cookie</link>,
the director doesn't know about it and doesn't care).
The director doesn't know anything about the application
generating the packets or what the application is doing.
Because the director does not inspect the content of the packets (layer 7, L7)
it is not capable of session management or providing
service based on packet content. L7 capability would be a useful
feature for LVS and perhaps this will be developed in the future
(preliminary ktcpvs code is out - May 2001 -
<xref linkend="LVS-HOWTO.L7_switch"/>).
		</para>
		<para>
The director is basically a router, with routing tables set up
for the LVS function.
These tables allow the director to forward packets to
realservers for services that are being LVS'ed.
If http (port 80) is a service that is being LVS'ed
then the director will forward those packets.
The director does not
have a socket listener on VIP:80 (i.e. netstat won't see a listener).
		</para>
		<para>
John Cronin <emphasis>jsc3 (at) havoc (dot) gtf (dot) org</emphasis> (19 Oct 2000)
calls these types of servers
(i.e. lots of little boxes appearing to be one machine) "RAILS"
(Redundant Arrays of Inexpensive Linux|Little|Lightweight|L* Servers).
Lorn Kay <emphasis>lorn_kay (at) hotmail (dot) com</emphasis> calls them RAICs (C=computer),
pronounced "rake".
		</para>
		</section>
		<section id="forward_packets">
		<title>LVS forwards packets to realservers</title>
		<para>
The director uses 3 different methods of forwarding.
		</para>
		<itemizedlist>
			<listitem>
LVS-NAT based on network address translation (NAT)
			</listitem>
			<listitem>
LVS-DR (direct routing) where the MAC addresses on the
packet are changed and the packet forwarded to the realserver
			</listitem>
			<listitem>
LVS-Tun (tunnelling) where the packet is IPIP encapsulated
and forwarded to the realserver.
			</listitem>
		</itemizedlist>
		<para>
Some modification of the realserver's ifconfig
and routing tables will be needed for LVS-DR and LVS-Tun forwarding.
For LVS-NAT the realservers only need a functioning tcpip stack (<emphasis>i.e.</emphasis>
the realserver can be a networked printer).
		</para>
		<para>
LVS works with all services tested so far (single and 2 port services)
except that LVS-DR and LVS-Tun cannot work with services that
initiate connects from the realservers (so far; identd and rsh).
		</para>
		<para>
The realservers can be indentical, presenting the same service
(eg http, ftp) working off file systems which are kept in sync
for content. This type of LVS increases the number of clients
able to be served. Or the realservers can be different, presenting a
range of services from machines with different services or
operating systems, enabling the virtual server to present a
total set of services not available on any one server. The
realservers can be local/remote, running Linux (any kernel)
or other OS's. Some methods for setting up an LVS have fast
packet handling (eg LVS-DR which is good for http and ftp)
while others are easier to setup (eg transparent proxy) but
have slower packet throughput. In the latter case, if the
service is CPU or I/O bound, the slower packet throughput
may not be a problem.
		</para>
		<para>
For any one service (eg httpd at port 80) all the realservers
must present identical content since the client could be connected
to any one of them and over many connections/reconnections, will
cycle through the realservers. Thus if the LVS is providing
access to a farm of web, database, file or mail servers, all
realservers must have identical files/content. You cannot split
up a database amongst the realservers and access pieces of it
with LVS.
		</para>
		<para>
The simplest LVS to setup involved clients doing read-only
fetches (<emphasis>e.g.</emphasis> a webfarm).
If the client is allowed to write to the LVS (<emphasis>e.g.</emphasis>
database, mail farm), then some method is required
so that data written on one realserver is
transferred to other realservers before the client disconnects
and reconnects again. This need not be all that fast (you
can tell them that their mail won't be updated for 10mins),
but the simplest (and most expensive) is for the mail farm
to have a common file system for all servers. For a database,
the realservers can be running database clients which connect
to a single backend database, or else the realservers can
be running independant database daemons which replicate their
data.
		</para>
		</section>
		<section id="any_linux">
		<title>LVS runs on Linux and FreeBSD directors</title>
		<para>
LVS was developed on Linux and historically uses a Linux director.
The Intel and Dec Alpha versions of LVS are known to work.
The LVS code doesn't have any Intel specific
instructions and is expected to work on any machine that runs Linux.
		</para>
		<para>
In Apr 2005, LVS was ported to FreeBSD by Li Wang
		</para>
		<para>
Li Wang <emphasis>dragonfly (at) linux-vs (dot) org</emphasis> 2005/04/16
		</para>
		<para>
The URL is:
<ulink url="http://dragon.linux-vs.org/~dragonfly/htm/lvs_freebsd.htm">FreeBSD port of LVS</ulink>
(http://dragon.linux-vs.org/~dragonfly/htm/lvs_freebsd.htm).
Here's a 
<ulink url="http://dragon.linux-vs.org/~dragonfly/software/doc/ipvs_freebsd/performance.html">
performance test on FreeBSD(version 0.4.0)</ulink>
(http://dragon.linux-vs.org/~dragonfly/software/doc/ipvs_freebsd/performance.html).
		</para>
		</section>
		<section id="lvs_different_for_different_kernels">
		<title>Code for LVS is different for each kernel series</title>
		<para>
There are differences in the coding for LVS for the 2.0.x, 2.2.x,
2.4.x and 2.6.x kernels.
Development of LVS on 2.0.36 kernels has stopped (May 99).
Code for 2.6.x kernels is relatively new.
		</para>
		<para>
The 2.0.x and 2.2.x code is based on the masquerading code. Even if you
don't explicitely use ipchains (eg with LVS-DR or LVS-Tun),
you will see masquerading entries with `ipchains -M -L` (or `netstat -M`).
		</para>
		<para>
Code for 2.4.x kernels was rewritten
to be compatible with the netfilter code (i.e. its entries will
show up in netfilter tables).
It is now production level code.
Because of incompatibilities with LVS-NAT for 2.4.x LVS was in
development mode (till about Jan 2001) for LVS-NAT.
		</para>
		</section>
		<section id="2.4_SMP_kernel">
		<title>kernels from 2.4.x series are SMP for kernel code</title>
		<para>
2.4.x kernels are SMP for kernel code as well as user space
code, while 2.2.x kernels are only SMP for user space code.
LVS is all kernel code. A dual CPU director running a 2.4.x
kernel should be able to push packets at twice the rate
of the same machine running a 2.2 kernel (if other resources
on the director don't become limiting).
(Also see the section on <xref linkend="FAQ:smp_helps"/>.)
		</para>
		</section>
		<section id="realserver_OS">
		<title>OS for realservers</title>
		<para>
You can have almost any OS on the realservers (all are
expected to work, but we haven't tried them all yet).
The realservers only need a tcpip stack -
a networked printer can be a realserver.
		</para>
		</section>
		<section id="ethernet">
		<title>LVS works on ethernet</title>
		<para>
LVS works on
<ulink url="http://www.ethermanage.com/ethernet/ethernet.html">ethernet</ulink>.
		</para>
		<para>
		There are some limitations on using
<link linkend="ATM">ATM</link>.
		</para>
		<para>
Firewire: (from the Beowulf mailing list - Donald Becket 5 Dec 2002):
The firewire transport layer (IEEE1394) does run
<ulink url="http://developer.apple.com/firewire/IP_over_FireWire.html">
IP over FireWire</ulink>.
However firewire is designed for fixed size repeated frames (video or
continuous disk block reads), but has overhead for other communication.
Throughput is 400Mbps but worst case latency is high (msec range).
		</para>
		<para>
Oracle has released GPL libraries for clustering Linux boxes over FireWire
(http://www.ultraviolet.org/mail-archives/beowulf.2002/2977.html, link dead Dec 2003).
		</para>
		</section>
		<section id="ipv6"><title>LVS works on IPV6</title><para>
Seiji Tsuchiike <emphasis>tsuchiike (at) yggr-drasill (dot) com</emphasis> 02 Jun 2002
		<blockquote>
We just implemented IPv6 to lvs.
We think that Basic Mechanism is same.
(http://www.yggr-drasill.com/LVS6/documents.html. link dead Dec 2003,
but Sep 2004 Horms says its alive).
		</blockquote>
		</para>
		</section>
		<section id="ipvs_continually_developed">
		<title>LVS is continually being developed</title>
		<para>
LVS is continually being developed and usually only the more recent kernel
and kernel patches are supported. Usually development is incremental,
but with the 2.2.14 kernels the entries in the /proc file system changed
and all subsequent 2.2.x versions were incompatible with previous versions.
		</para>
		</section>
		<section id="64_bit">
		<title>LVS is 64 bit</title>
		<para>
Kenny Chamber
		</para>
		<blockquote>
Has anybody here successfully setup lvs-director on sparc64 machine?
I need to know which distro is OK for this.
		</blockquote>
		<para>
Ratz 16 Dec 2004
		</para>
		<para>
Yes. Just recently.
Debian is fine, I reckon Gentoo would do as well.
		</para>
		<para>
INFO: It could be that your ipvsadm binary that comes to instrument the 
kernel tables for LVS is broken with regard to 64bit'ness. You then need 
to download the latest sources and recompile adding '-m64' to the 
CFLAGS. That's all, other than that it seems to work nicely.
		</para>
		<para>
Btw: I took Debian testing, probably not too wise but on the other hand 
I needed more up to date tools. I wouldn't know of too many other 
Distros that have up to date Sparc64 support. Suse used to have, but 
they dropped support a while ago unfortunately.
		</para>
		<para>
Justin Ossevoort <emphasis>justin (at) snt (dot) utwente (dot) nl</emphasis> 16 Dec 2004
		</para>
		<para>
Well our plain debian-sarge here did it just as painlessly as our x86
based machines. So as long as your distro has ipvs (and of course a
sparc tree ;)) support you're in the green.
		</para>
		<para>
liuah
		</para>
		<blockquote>
I want to know whether LVS can work with 64-bit boxes.
If I use LVS-DR, how can I apply the hidden patch to 64-bit linux,
using kernel is 2.4.18?
		</blockquote>
		<para>
ratz 29 Nov 2003
		</para>
		<para>
Yes.

The only problem I see is if either the counters or the hashtable
handling has some bug with 32/64-bit signedness and wrong shift
operators. Just let us know if you experience flakyness on your director ;).
The hidden patch for your kernel is:
http://www.ssi.bg/~ja/hidden-2.4.19pre5-1.diff
I hope you are aware of the fact that 2.4.18 is really buggy in many
ways. I know that some 64-bit archs have been lagging behind in the 2.4.x
tree but if I was you I would upgrade to a newer kernel.
		</para>
		<para>
Peter Mueller <emphasis>pmueller (at) sidestep (dot) com</emphasis> 29 Nov 2003
		</para>
		<para>
The one for straight 2.4.18 is http://www.ssi.bg/~ja/hidden-2.4.5-1.diff.
Since he said 2.4.18 I would suspect he's running Debian.  If you want a
Debian kernel with LVS+hidden use the
<ulink url="http://www.ultramonkey.org/">
Ultramonkey kernel</ulink>
(http://www.ultramonkey.org/").
		</para>
		<para>
liuah <emphasis>liuah (at) langchaobj (dot) com (dot) cn</emphasis> 02 Dec 2003
		</para>
		<para>
The hidden patch compiles and runs on our 64-bit servers successfully.
		</para>
		</section>
		<section id="other_documentation">
		<title>Other documentation</title>
		<para>
For more documentation, look at the LVS web site
(eg a talk I gave on how LVS works on
<ulink url="http://www.linuxvirtualserver.org/Joseph.Mack/linuxexpo99/linuxexpo2.html">2.0.36 kernel directors</ulink>)
		</para>
		<para>
Julian has written
<ulink url="http://www.ssi.bg/~ja/">Netparse</ulink>
for which we don't have a lot of documentation yet.
		</para>
		<para>
For those who want more understanding of netfilter/iptables etc, here
are some starting places. These topics are also covered in many
other places.
		</para>
		<itemizedlist>
			<listitem>
<ulink url="http://www.sunbeam.franken.de/projects/packetjourney/">
Harald Welte (of the netfilter team) description of what happens to a packet under 2.4
</ulink>
			</listitem>
			<listitem>
			<para>
<ulink url="http://www.sunbeam.franken.de/projects/conntrack+nat-HOWTO/">
Harald Welte (of the netfilter team) conntrack HOWTO</ulink>.
			</para>
			<para>
Conntrack is used in filter rules as a way of accepting "related" packets, <emphasis>e.g.</emphasis>
the data packets associated with an established ftp connection.
Regular filter rules written for these data packets
would accept ftp data packets (port 20) even if
there were not in response to a PORT call from an already
established ftp connection on port 21.
In this case the filter rules would accept packets that are part of a DoS attack.
			</para>
			<para>
Conntrack is CPU intensive and lowers throughput
(see <link linkend="conntrack">effect of conntrack on throughput</link>).
			</para>
			</listitem>
			<listitem>
<ulink url="http://www.netfilter.org">the docs/FAQs/HOWTOs on the netfilter site</ulink>

			</listitem>
			<listitem>
<ulink url="http://www.tldp.org/LDP/nag2/">Linux Network Administrators Guide</ulink>

			</listitem>
		</itemizedlist>
		</section>
		<section id="lvs_is_not_simple">
		<title>LVS is not simple to install, get going or keep running</title>
		<para>
This is not a utility where you run
<command>../configure &amp;&amp; make &amp;&amp; make check &amp;&amp; make install</command>,
put a few values in a <filename>*.conf</filename> file and you're done.
LVS rearranges the way IP works so that a router and server (here called director and realserver),
reply to a client's IP packets as if they were one machine.
You will spend many days, weeks, months figuring out how it works.
LVS is a lifestyle, not a utility.
		</para>
		<para>
That said, you should be able to get a simple LVS-NAT setup working in a few hours without
really understanding a whole lot about what's going one (see the
<link linkend="mini-HOWTO">LVS-mini-HOWTO</link>).
		</para>
		</section>
		<section id="lvs_failure">
		<title>LVS Failure</title>
		<para>
LVS supplies high throughput using multiple identically configured machines.
You would like to be able to swap out machines for planned maintenance and
to automatically handle node failure (high availability).
		</para>
		<para>
The LVS itself does not provide high availability.
As you will read here, the software layer that provides high availability
should be logically separate to the layer that it monitors.
The writing of software that attempts to determine whether a machine is
working, is somewhat of a black art.
There are several packages used to help provide high availability for LVS
and these are discussed in the <link linkend="LVS-HOWTO.failover">High Availability LVS</link> section.
		</para>
		<para>
While it is relatively easy to monitor the functionality of the realservers,
fail-out of directors is more difficult.
An even greater problem is handling failure of nodes which are holding state information.
		</para>
		</section>
	</section>
	<section id="minimal_knowledge">
	<title>Minimal knowledge required</title>
	<para>
The mailing list and HOWTOs cover information specific to LVS.
The rest you have to handle yourself.
All of us knew nothing about computers when we first started,
we learnt it, and you can too (we're not saying it's easy).
If you can't setup a simple LVS from the
<link linkend="mini-HOWTO">LVS-mini-HOWTO</link>,
without breaking into a major sweat
(or being able to tell us what's wrong with the instructions),
then you need to do some more homework.
(Also see 
<ulink url="http://www.austintek.com/LVS/LVS-HOWTO/mini-HOWTO/#doesnt_work">Help! My LVS doesn't work</ulink>.)
	</para>
	<blockquote>
	<para>
Ratz <emphasis>ratz (at) tac (dot) ch</emphasis>
	</para>
	<para>
To be able to setup/maintain an LVS, you need to be able to
	</para>

	<itemizedlist>
		<listitem>
know how to patch and compile a kernel
		</listitem>
		<listitem>
the basics of shell-scripting
		</listitem>
		<listitem>
have intermediate knowledge of TCP/IP
		</listitem>
		<listitem>
have read the man-page, the online-documentation and LVS-HOWTO (this document)
(and the
<link linkend="mini-HOWTO">LVS-mini-HOWTO</link>)
		</listitem>
		<listitem>
know basic system administration (<emphasis>e.g.</emphasis> iptables; syslog; find, compile,
install code from source files; use cpan to find perl modules).
		</listitem>
	</itemizedlist>
	</blockquote>
	</section>
	<section id="getting_technical_help">
	<title>Getting Technical Help</title>
	<para>
All of the people on this list are replying for free in 
their spare time. The best we can do on this list is to give 
solutions to technical problems on setting up and running 
LVS. I give about 15secs to a posting to decide if I've got 
something useful to say. The posting has to indicate that 
the person has analysed the problem to a stage where an 
answer exists. If _they_ can't describe the problem, 
there's no point in replying - they won't understand the answer.
	</para>
	<para>
Please don't e-mail me privately with general questions
(feel free to cc: me if you want).
The mailing list will archive your question
and the answer(s) which can be retrieved later.
Other people may have more interesting,
relevant or useful comments than I will.
If you are writing to me in the hopes of avoiding the humiliation
of publically showing your ignorance on the mailing list, it's not going to happen.
We've had too many good ideas from "ignorant" people to let this happen.
If your question has been answered many times before 
and it's in the HOWTO and the archives,
you'll be told to read the HOWTO, that's all.
	</para>
	<para>
To get technical help:
	</para>
	<itemizedlist>
		<listitem> 
Read the docs on the website, the HOWTOs, and
search the mailing list archives. 
The HOWTO (at the top) has a link to a search engine of all known LVS documentation. 
It will probably return several webpages.
You'll have to find the entry from there.
		</listitem>
		<listitem> The
<link linkend="mini-HOWTO">
LVS-mini-HOWTO</link>
shows you how to setup a simple 3 node (client, director, realserver)
LVS without you needing to understand a whole lot about how an LVS works.
		</listitem>
		<listitem>
after you've done a search of the docs, then post to the mailing list.
		</listitem>
		<listitem> updates/problems/bugs - post to the mailing list 
		</listitem>
	</itemizedlist>
	<para>
Jakub Suchy <emphasis>jakub (at) rtfm (dot) cz</emphasis> 13 Jan 2005
	</para>
	<para>
Please read:
<ulink url="http://www.catb.org/~esr/faqs/smart-questions.html">smart questions</ulink>
(http://www.catb.org/~esr/faqs/smart-questions.html)
before asking questions.
	</para>
	<para>
Please only post relevant lines of a debug dump.
If you post the whole dump, because you don't understand it,
then it will fill up the archive machine and everyone's mail box. 
If we need the whole debug, we'll ask for it and you can send it to us off-list.
	</para>
		<section id="problem_people_1">
		<title>Problem people 1</title>
		<para>
It's hard to believe, but we get postings like
		</para>
		<blockquote>
recompiling the kernel is hard (or I don't read HOWTOs),
can't you guys cut me some slack and just tell me what to do?
		</blockquote>
		<para>
I expect the people who post these statements don't read this HOWTO,
so I may be wasting my time, but - No.
The people on the mailing list answer questions for free,
and have other important things to do, like keeping up with /. 
and checking our e-mail.
When we're at home, we drink beer and watch Gilligan's Island re-runs.
		</para>
		</section>
		<section id="problem_people_2">
		<title>Problem people 2</title>
		<blockquote>
can anybody tell me how to setup a windows realserver?
thank you very much! I'm in a  hurry.
		</blockquote>
		<para>
<emphasis>robert (dot) gehr (at) web2cad (dot) de</emphasis>
		</para>
		<para>
I can't think of anyone who has set up lvs in a hurry :-)
		</para>
		</section>
		<section id="RedHat">
		<title>Problem People 3: People using RedHat LVS</title>
		<para>
RedHat have LVS in their standard distribution kernel.
This gives people the idea that they can setup
LVS from their standard RedHat distribution just by clicking on a few
buttons or running some scripts.
From reading the postings to the mailing list,
it's more difficult than doing it our way.
You still have to understand LVS and then afterwards,
you have to figure out what RedHat did to it.
One of the major wastes of time
and source of aggravation for me personally on the LVS mailing list,
is postings from people using RedHat LVS who assume that it's the same as LVS,
and who post as if they're using our setup methods.
Just saying that you're using a RedHat distribution doesn't tell us anything,
since you can setup LVS our way in RedHat.
Things you need to know before you post -
		</para>
		<itemizedlist>
			<listitem>
There are reasons for wanting to setup LVS in a standard RedHat distribution
(<emphasis>e.g.</emphasis> RedHat is "approved" in your location whereas "Linux" isn't).
			</listitem>
			<listitem>
There is information in this HOWTO (<xref linkend="pbs_nutshell"/>)
and in the various links from here which show you how to setup RedHat LVS.
			</listitem>
			<listitem>
We have a method of setting up LVS which works for all distributions (including RedHat).
We are not interested in learning, understanding, debugging, supporting or fixing
a setup method that only works for one distibution.
			</listitem>
			<listitem>
RedHat don't talk to us about what they do and while
they may monitor the LVS mailing list,
rarely (only about once a year, that I can tell)
do they reply to people having problems with RedHat LVS.
It appears that RedHat does not think their version of LVS worthy of much support
and I agree with them.
			</listitem>
			<listitem>
If you setup LVS the RedHat way,
you still need an understanding
of how an LVS works and is setup (just like everyone else),
before posting to the mailing list.
			</listitem>
		</itemizedlist>
		<para>
If you are setting up with RedHat and want help with it,
make sure that you describe what you've done,
that you're using the RedHat files and how you've set it up,
otherwise we'll assume that you're setting up using our methods.
		</para>
		</section>
		<section id="why_you_may_not_get_an_answer">
		<title>Why you may not get an answer</title>
		<itemizedlist>
			<listitem>
				<para>
no-one knows.
				</para> 
				<para>
The <xref linkend="LVS-NAT_ftp_bug"/> took a long time to figure out. 
Since no-one else had seen the problem, we didn't know at first if it was a problem with LVS. 
It wasn't till 6 months later when someone else had the same symptom,
and found that it only occured when the ftp helper module was loaded, 
that we could do something.
				</para>
				<para>
I once needed to do something with <filename>iproute2</filename> 
that I spent about 3 weeks trying to figure out. 
No-one on the list knew the answer. 
I had to post off-line to someone who could figure it out for me.
				</para>
			</listitem>
			<listitem>
				<para>
We may not have a useful answer.
				</para>
				<para>
If you post saying "I want to build an LVS with (list of hardware);
do you think it will work?", all we can say is "probably".
				</para>
				<para>
Often when questions like this come up, 
there are people who are happy to share their experiences, 
so there's no harm in posting such a question.
In general the people who've been working with LVS for years will expect 
you to have read the docs and know what LVS does before you post. 
In the time I alot for a reply, 
I don't have time to figure out whether in your case LVS is best for you 
- you should pay a consultant to do this if you can't do it yourself.
				</para>
			</listitem>
			<listitem>
				<para>
Your question may not be well posed.
				</para>
				<para>
We are reading the postings in our spare time. 
You will get at most 30secs of attention before we figure out whether 
we can help you, this will take a bit of thinking, or we can't help you.
				</para>
				<para>
If you have a long posting in which you haven't figured out which parts
are causing the problem and which parts are working, then we aren't
going to try to figure it out either. 
Post the minimum setup that will produce the problem.
				</para>
			</listitem>
			<listitem>
You haven't read the HOWTO.
			</listitem>
		</itemizedlist>
		</section>
		<section id="edit_posts">
		<title>Edit your posts! (top, bottom and in-line posting)</title>
		<para>
Please edit the posting you're replying to, leaving only the parts relevant to your reply.
We don't need to see material from previous posts irrelevant to the current posting,
and the disk archive doesn't either.
		</para>
		<para>
Reply in-line, <emphasis>i.e.</emphasis> following each statement by the poster.
Here's a posting on the subject from one of the kernel mailing lists.
		</para>
		<para>
Greg KH <emphasis>greg (at) kroah (dot) com</emphasis> 16 Nov 2005 
		</para>
<programlisting><![CDATA[
A: http://en.wikipedia.org/wiki/Top_post
Q: Were do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
A: No.
Q: Should I include quotations after my reply?
]]></programlisting>
		</section>
	</section>
	<section id="after_youve_got_help">
	<title>After you've Got Technical Help</title>
	<para>
In most cases when a problem is solved, there's enough info on the mailing list
to see how it worked and we can write it up here for the next people. 
Occasionally, we get a posting "I've worked it out. Thanks for the help."
When this happens we have no idea what the solution was 
and will have to reinvent it for the next person.
	</para>
	<para>
If you've got help from all the unpaid people on the mailing list who've given their
spare time to help you, 
when they could instead have been watching Gilligan's Island reruns, 
please write it up for the HOWTO. 
When I write to people asking for their solution
I don't want to hear that you're busy and have a job. 
We're busy, have jobs, kids, homework to do and tax forms to fill in 
and we stopped what we were doing to help you. 
Here's a template.
	</para>
	<itemizedlist>
		<listitem>
what you wanted to do
		</listitem>
		<listitem>
why/how it didn't work
		</listitem>
		<listitem>
what you needed to do to get it to work
		</listitem>
		<listitem>
how the solution works
		</listitem>
	</itemizedlist>
	</section>
	<section id="subscribing">
	<title>Mailing list: subscribing, unsubscribing, searching </title>
	<para>
Thanks to Hank Leininger for the mailing list archive which is searchable not
only by subject and author, but also by strings in the body.
Hank's resource has been of great help in assembling this HOWTO.
	</para>
	<para>
The <ulink url="http://www.linuxvirtualserver.org/mailing.html">mailing list</ulink>
is available for further questions.
A single mailing list handles developers, new
users and old users and has about 0-20 postings a day.
You don't have to join the mailing list to read the archives.
If you want to post questions, then you have to join.
If you aren't subscribed and you post (or you post from
an unsubscribed address),
you'll get a reply saying that your posting is
"awaiting moderator approval".
It isn't; because of the volume of spam,
we no longer review these messages - they're deleted.
	</para>
	</section>
	<section id="problem_report">
	<title>
	Mailing list: posting to
	</title>
	<para>
Please send e-mail with straight ascii (not html)
and turn line-wrap on (some mails come with each
paragraph on a single long line).
	</para>
	<para>
If you're stuck with posting from a Windows machine
or Lotus notes, or using Lookout, 
where each paragraph is sent as one line: 
	</para>
	<blockquote>
		<para>
Francois JEANMOUGIN <emphasis>Francois (dot) JEANMOUGIN (at) 123multimedia (dot) com</emphasis> 09 Jul 2004
		</para>
<programlisting><![CDATA[
System manager -> Global Settings -> Internet Message Format -> Default (or
the one used) -> Advanced -> word wrap 
]]></programlisting>
		<para>
like shown in
<ulink url="http://www.lemis.com/email/fixing-outlook.html">
fixing outlook</ulink>
(http://www.lemis.com/email/fixing-outlook.html)
especially in
<ulink url="http://www.lemis.com/email/exchsrvr-wordwrap.gif">
word wrap</ulink>
(http://www.lemis.com/email/exchsrvr-wordwrap.gif),
but this is a very old version of exchange.
		</para>
	</blockquote>
	<para>
Please don't turn on your vacation message, intended only for your work mates,
for messages from a list.
<emphasis>e.g.</emphasis>
	</para>
<programlisting><![CDATA[
I will be out of the office starting  07/30/2004 and will not return until 08/03/2004.
]]></programlisting>
	<para>
The LVS mailing list doesn't want to know.
	</para>
	<para>
Dan Moljar Aug 2004
	</para>
	<para>
For Lotus Notes:
The client is not configured correctly.
In the 'Out of Office' enable dialog under the 'Exceptions' tab, 
there is a check box for 'Do not reply to Internet Addresses'. 
Check it.
The server shouldn't do it to begin with, 
but you can make the client stop.
	</para>
	<para>
There's always new ideas and questions being posted on the mailing list.
We don't expect this to stop.
There are many complexities to LVS and we don't expect
new people to understand any more about LVS that we did when we started.
No-one is expected to know/understand everything
in the docs but your questions will be better received,
if you've done your homework,
if you have setup the test configurations here,
have at least perused this HOWTO (yes we know it's big),
and have looked at the
<ulink url="http://www.linuxvirtualserver.org/mailing.html">mail archives</ulink>.
We can't help you if you just tell us that you've read the documents and your LVS
doesn't work.
To you, all problems look the same ("it doesn't work").
To help you, we need more information.
We at least need the forwarding method,
the service(s) being forwarded, the number of networks
and the output of ipvsadm in the problem state.
	</para>
	<para>
Before you come up on the mailing list -
	</para>
	<itemizedlist>
		<listitem>
Read the LVS-HOWTO (this document) and the
<link linkend="mini-HOWTO">LVS-mini-HOWTO</link>
		</listitem>
		<listitem>
		<para>
Set up a simple LVS (3 nodes: client, director, realserver)
with LVS-DR or LVS-NAT forwarding,
with the service telnet using the instructions in the LVS-mini-HOWTO.
You should be able to do this starting from a
freshly downloaded kernel from ftp.kernel.org and the LVS patches
(ipvs and the hidden patch if you have 2.4.x realservers).
		</para>
		<para>
<emphasis role="bold">Don't</emphasis> 
setup first with http, with filter rules, with firewalls, with complicated
file systems (<emphasis>e.g.</emphasis> coda, nfs) or network accelators
- debug all these nifty things after you have LVS working with telnet
and with no filter rules.
		</para>
		<para>
<emphasis role="bold">Do</emphasis> use standard compilers (gcc-2.95.3), tools
and utilities (<command>ifconfig</command> or <filename>iproute2</filename>).
		</para>
		<para>
<emphasis role="bold">Do not</emphasis> use non-standard tools particular to a distribution
designed to capture market share (<emphasis>e.g.</emphasis> <command>ifup</command>).
		</para>
		</listitem>
		<listitem>
If you are using one of the packages that can be used with LVS
(<emphasis>e.g.</emphasis>
heartbeat from the Linux HA project http://www.henge.com/&#126;alanr/ha,
or piranha from Redhat),
again we may know what the problem is,
but they need the feedback that you can't get it to work, not us.
Many of us are on each others'
mailing lists and we try to help when we can,
but the best people to handle the problem are the developers for each package.
		</listitem>
		<listitem>
Consult the
<ulink url="http://marc.theaimsgroup.com/?l=linux-virtual-server&amp;r=1&amp;w=2">
LVS mailing list archives</ulink>.
		</listitem>
		<listitem>
Use our jargon as best you can.
The machine names will be client, director, realserver1, realserver2...
IPs are CIP, VIP, RIP, DIP.
If you do this,
we won't have to translate "susanne" and "annie" to their
functional names as we scan your posting.
		</listitem>
		<listitem>
we need to know your kernel
(<emphasis>e.g.</emphasis> 2.2.14)
and the ip_vs patch that was applied to it (eg 0.9.11),
whether you are using LVS-DR, LVS-NAT or LVS-Tun.
Tell us
		<itemizedlist>
			<listitem>
what you did
			</listitem>
			<listitem>
what you expected
			</listitem>
			<listitem>
what you got and why that's a problem
			</listitem>
		</itemizedlist>
		</listitem>
	</itemizedlist>
	<para>
If you don't understand your problem well,
here's a suggested submission format from Roberto Nibali
<emphasis>ratz (at) tac (dot) ch</emphasis>
	</para>
	<orderedlist>
		<listitem>
System information, such as kernel, tools and their versions.
		<para>
Example:
		</para>

<programlisting><![CDATA[
hog:~ # uname -a
Linux hog 2.2.18 #2 Sun Dec 24 15:27:49 CET 2000 i686 unknown

hog:~ # <command>ipvsadm</command> -L -n | head -1
IP Virtual Server version 1.0.2 (size=4096)

hog:~ # <command>ipvsadm</command> -h | head -1
<command>ipvsadm</command> v1.13 2000/12/17 (compiled with popt and IPVS v1.0.2)
]]></programlisting>
		</listitem>
		<listitem>
Short description and maybe sketch of what you intended to setup.
		<para>
Example for LVS-DR:
		</para>

<programlisting><![CDATA[
	o Using LVS-DR, gatewaying method.
	o Load balancing port 80 (http) non-persistent.
	o Network Setup:

                        ________
                       |        |
                       | client |
                       |________|
			   | CIP
                           |
			(router)
			   |
			   | GEP
                 (packetfilter, firewall)
                           | GIP
                           |       __________
                           |  DIP |          |
                           +------+ director |
                           |  VIP |__________|
                           |
         +-----------------+----------------+
         |                 |                |
     RIP1, VIP         RIP2, VIP        RIP3, VIP
    ____________      ____________    ____________
   |            |    |            |  |            |
   |realserver1 |    |realserver2 |  |realserver3 |
   |____________|    |____________|  |____________|


	CIP  = 212.23.34.83
	GEP  = 81.23.10.2	(external gateway, eth0)
	GIP  = 192.168.1.1	(internal gateway, eth1, masq or NAT)
	DIP  = 192.168.1.2	(eth0:1, or eth1:1)
	VIP1 = 192.168.1.110	(director: eth0:110, realserver: lo0:110)
	RIP1 = 192.168.1.11
	RIP2 = 192.168.1.12
	RIP3 = 192.168.1.13
	DGW  = 192.168.1.1	(GIP for all realserver)

	o ipvsadm -L -n

hog:~ # ipvsadm -L -n
IP Virtual Server version 1.0.2 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
TCP  192.168.1.10:80 wlc
  -> 192.168.1.13:80             Route   0      0          0
  -> 192.168.1.12:80             Route   0      0          0
  -> 192.168.1.11:80             Route   0      0          0
]]></programlisting>
		<para>
The output from ifconfig from all machines (abbreviated, just need the
IP, netmask etc), and the output from netstat -rn.
		</para>
		</listitem>

		<listitem>
What doesn't work.
Show some output from
<command>tcpdump</command>,
<command>ipchains</command>/<command>ip_tables</command>,
<command>ipvsadm</command> and
<filename>kernlog</filename>.
Later we may ask you for a more detailed configuration like routing table,
OS-version or interface setup on some machines used in your setup.
Tell us what you expected. Example:

<programlisting><![CDATA[
ipchains -L -M -n (2.2.x) or cat /proc/net/ip_conntrack (2.4.x)
echo 9 > /proc/sys/net/ipv4/vs/debug_level && tail -f /var/log/kernlog
tcpdump -n -e -i eth0 tcp port 80
route -n
netstat -an
ifconfig -a
]]></programlisting>
		<para>
<command>tcpdump</command> listings are difficult to read.
If you post one, please change the IPs to VIP, CIP, RIP1..n, DIP etc.
Since you'll likely be on a switched network, <command>tcpdump</command>
will only see packets to that NIC. Tell us which machine (director, realserver...)
and the NIC (if there are two NICs on the machine) that it was run on.
		</para>
		</listitem>
	</orderedlist>
	</section>
	<section id="bug_fixes">
	<title>Bug Fixes</title>
	<para>
It's wonderful to get an unsolicited bug fix.
Please let us know what it does and why it's better than the current file.
A new version of a file without any information about what it does,
or what it fixes isn't much use to us.
	</para>
	</section>
	<section id="other_solutions">
	<title>Other load balancing solutions, GPL, opensource and commercial</title>
		<section id="open_source_solutions">
		<title>Open Source and GPL solutions</title>
		<para>
from <emphasis>lvs (at) spiderhosting (dot) com</emphasis>
<ulink url="http://dmoz.org/Computers/Software/Internet/Site_Management/Load_Balancing/">a list of load balancers</ulink>
		</para>
		<para>
Brent Cook <emphasis>busterb (at) mail (dot) utexas (dot) edu</emphasis> 28 Mar 2002
		</para>
		<blockquote>
There's the http://www.bsdshell.net/ HighUpTime (HUT) projec (link dead Apr 2003).
It's FreeBSD.
		</blockquote>
		<para>
The HUT author, Sebastian Petit
<emphasis>spe (at) selectbourse (dot) net</emphasis> has joined the LVS mailing list.
		</para>
		<para>
For L7 Switching see the <link linkend="DRWS">DRWS project</link>.
		</para>
		<para>
BSD load balancing:
		</para>
		<para>
Roberto Nibali <emphasis>ratz (at) tac (dot) ch</emphasis> 05 Nov 2003
		</para>
		<para>
As already mentioned by others, LVS will not work on FreeBSD as director due to
the kernel part. Using FreeBSD on the RS is of course ok.
The BSD folks have not shown bigger interest in adopting the LVS idea or parts
of the code yet.
If you're interested in load balancing and HA Solutions under FreeBSD, you could
check out following links:
		</para>
<programlisting><![CDATA[
http://www.bsdshell.net/hut_fvrrpd.html
http://www.backhand.org/wackamole/
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/isp/2003-05/0026.html
http://redundancy.redundancy.org/fbsd_lb.html
]]></programlisting>

		<para>
Gavin Henry <emphasis>ghenry (at) suretecsystems (dot) com</emphasis> 06/13/2005 
		</para>
		<blockquote>
<ulink url="http://geminis.dyndns.org/wordpress/index.php/2005/06/12/loadbalancer-less-clusters-on-linux/">ClusterIP</ulink> by Harald Welte.
What is the list's view on it?
		</blockquote>
		<para>
Gavin Henry 
<emphasis>ghenry (at) suretecsystems (dot) com</emphasis> 13 Jun 2005
		</para>
		<para>
The man page for more recent versions of iptables says:
		</para>
		<blockquote>
CLUSTERIP: This module allows you to configure a simple
cluster of nodes that share a certain IP and MAC address
without an explicit load balancer in front of them
		</blockquote>
		<para>
Horms 
		</para>
		<para>
Been there, done that. Works, but is it neccessary?
<ulink url="http://www.ultramonkey.org/papers/active_active/">LVS with upto 16 directors active</ulink>
(http://www.ultramonkey.org/papers/active_active/)
		</para>
		</section>
		<section id="commercial_solutions">
		<title>List of Commercial Solutions</title>
		<para>
Cahya Wirawan <emphasis>cwirawan (at) email (dot) archlab (dot) tuwien (dot) ac (dot) at</emphasis> 19 Feb 2004
		</para>
		<blockquote>
I'm implementing proxy, smtp and webserver with LVS as local node,
and I have tested it and it's running fine, but because
someone from management section thinks that such an implementation
is easy (just run setup.exe and everything is installed and ready to use),
he pushed me to move the setup into production, and create another one
as soon as possible.
I want to tell him
that such an implementation is not a trivial thing and needs time to setup and
to test before we go into production.
I want to show him
a list of companies who have such complete solutions, so he can see
the cost. Then he can understand that high
availability and load balancing is not easy to setup,
and will cost alot of money if we buy a complete solution.
		</blockquote>
		<para>
Vendors just rub their hands with glee on finding management like this
- see my
<ulink url="http://www.austintek.com/book_reviews/the_ibm_way.html">
review of the book &quot;The IBM Way&quot;</ulink>
(http://www.austintek.com/book_reviews/the_ibm_way.html) for
how IBM handles the situation.
		</para>
		<para>
Peter Mueller
		</para>
		<para>
Prices at this level are negotiable.  Who knows what you could pay?
		</para>
		<itemizedlist>
			<listitem>
http://www.cisco.com/ - the old man on the LB-gig.
			</listitem>
			<listitem>
http://www.f5.com/f5products/bigip/LB520/  - the second old man in the LB
gig.
			</listitem>
			<listitem>
http://www.suse.com/us/business/products/server/ - Suse has always been a
big player in the Linux-HA world.
			</listitem>
			<listitem>
http://www.redhat.com/software/rhel/purchase/ - they have clustering based
on LVS, not sure about price.  At this point you have to buy enterprise
edition (or http://www.whiteboxlinux.com) to use the clustering software.
			</listitem>
			<listitem>
http://www.ibm.com/ - always an option...
			</listitem>
			<listitem>
http://www.dell.com/ - moving up in the datacenter world.  I see lots of
Dells now..
			</listitem>
			<listitem>
http://www.ebay.com/ - see how much the gear is worth on the open market.
			</listitem>
			<listitem>
				<para>
http://www.linuxvirtualserver.org/ - $0.
				</para>
				<para>
There's plenty of people on list
who can help you and your boss feel more comfortable with your setup.  I'm
sure if you posted something some people would be willing to help make you
sleep better at night.  BTW, you know about the http://www.ultramonkey.org/
and http://www.keepalived.org/ projects, right?
				</para>
			</listitem>
		</itemizedlist>
		</section>
		<section id="radware" xreflabel="Radware">
		<title>Radware</title>
		<para>
Joe Oct 2005:
From a presentation by 
<ulink url="http://www.radware.com/">Radware</ulink>)
(http://www.radware.com/)
given to <ulink url="http://www.ncsysadmin.org/">North Carolina Systems Administrators (NCSA)</ulink>
(http://www.ncsysadmin.org/) on 10 Oct 2005.
Unfortunately I was the guy getting the pizzas for the
meeting, so I missed most of the talk (which I wanted
to hear).
		</para>
		<para>
Radware is used by Ebay and Accuweather.
Radware has a NAT loadbalancing director that appears to 
function similarly to an LVS-NAT director. The servers can 
have private IPs.
		</para>
		<para>
Radware's loadbalancing director is only a small part of 
their offering. Radware have boxes that filter based on 
packet content (looking for viruses) that sit in the flow of 
packets (possibly before the director, possibly after - didn't 
find this out). They have boxes which just handle SYN 
floods. They use SYN cookies and do a statistical analysis 
of the packets, letting some through to see which machines 
reply to the SYN-ACKs. Radware has a gui to control the 
loadbalancer, which can do things like shutting down some of 
the backend servers at sometime in the future (<emphasis>e.g.</emphasis> at 10pm
later that night) for 
new connections, so that by 8am next morning these machine 
have few or no connections and can be taken offline for 
servicing. Much of their hardware is ASIC based.
		</para>
		<para>
Health checking seems to be done from the director, and 
checks are made through to 3rd-Tier components of the 
backend servers (<emphasis>e.g.</emphasis> database machines 
behind the webservers that the client doesn't directly 
connect to).
		</para>
		<para>
Each local NAT'ed load balancing setup is itself a member of 
a distributed DNS-based load balancer. So www.foo.net might 
have a loadbalanced set of servers in different sites eg 
London, New York, San Francisco and Tokyo. Each local setup 
has an authoritative nameserver for www.foo.net
		</para>
		<para>
The way is works is
		</para>
		<itemizedlist>
			<listitem>
client in Scotland asks for the IP of www.foo.net
			</listitem>
			<listitem>
the client's nameserver doesn't know the IP and asks a 
rootserver for the machine authoritative for foo.net.
			</listitem>
			<listitem>
The rootserver has a list of 4 authoritative nameservers 
for foo.net and selects the next nameserver by round robin. 
If the next one in its list is in New York, it tells the 
client's nameserver to go query the nameserver in New York.
			</listitem>
			<listitem>
The New York nameserver for foo.net measures the packet 
latency to the client's nameserver and then returns the VIP 
for www.foo.net associated with the New York 
installation of www.foo.net. The latency is propagated to 
the other foo.net nameservers (in Tokyo, London and San 
Francisco).
			</listitem>
			<listitem>
Sometime later after the client's nameserver has flushed 
the IP entry for www.foo.net from its cache, another (or the 
same) client using the same nameserver asks for the IP of 
www.foo.net again and this time the rootserver will 
possibly send the request to another of the sites (say 
London).  The London machine already knows the latency from 
New York to the client (without knowing where the client 
is), and sees that its latency to the client is lower than 
the latency from New York to the client, and returns the IP 
of its copy of www.foo.net. The London 
nameserver also updates the latency tables at the other 
sites (New York, San Francisco and Tokyo).
			</listitem>
			<listitem>
If the next nameserver request from the client site is 
sent to Tokyo, then the Tokyo machine updates the latency 
tables in all the other nameservers, and knowing that the 
latency is lowest to the London nameserver, returns the IP 
of www.foo.net in London.
			</listitem>
			<listitem>
In this way the four nameserver accumulate the latencies to 
all nameservers in the world. This works provided that the 
latencies don't change a lot with time of day (or 
throughput). Presumably you could store successive
latencies and pick the shortest as reflecting the
true network distance. The amount of memory required to do this must 
be small - there can't be more than a million nameservers, 
can there? 1 million 8 bit latencies is not much to store in 
memory.
			</listitem>
		</itemizedlist>
		<para>
Although I didn't get to ask how it works, if a client winds
up at a more distant site (network wise), then http redirects will send
the client to a closer site.
		</para>
		<para>
Radware SSL accelarators:
		</para>
		<para>
When I commented to the speaker that the main reason to use 
SSL accelarators is financial, <emphasis>i.e.</emphasis>
to only have one copy of the certificate, 
rather than one on each realserver, they said 
"it's also for certificate management". Presumably some 
sites have large numbers of certificates. (They didn't 
disagree with my statement.)
		</para>
		<para>
The SSL accelarators in the Radware design don't sit between 
the director and the realservers (or in front of the director 
<emphasis>i.e.</emphasis> between the client and the director), 
but sit at the same 
level as the other realservers. The https request is 
balanced by the director to an accelarator, which decrypts 
the packets and sends the decrypted packet back to the 
director for loadbalancing as http traffic. Since the 
director is a NAT balancer, the return http traffic from the 
http servers, goes back through the director, and then 
recursively back to the SSL accelarator then back to the 
director at https traffic and then back to the client.
		</para>
		<para>
Being able to have the SSL accelarator as a realserver in 
LVS would require the realservers to be a client of the 
director, something that we can do for LVS-NAT, but not for 
LVS-DR. This is not a capability that we've paid much attention
to for LVS. If you need a realserver to be in the path in both 
inward and outward directions (like an SSL accelarator) then 
you will have to use LVS-NAT.
		</para>
	<para> 
Francois JEANMOUGIN <emphasis>Francois (dot) JEANMOUGIN (at) 123multimedia (dot) com</emphasis> 12 Oct 2005 
	</para>
	<para>
Note that we removed our Radware appliance to use LVS instead. Load Balancing
using DNS is _evil_, especially with mobile internet and all those
misconfigured operator gateways.
Most mobile gateway are written in Java, and I'm probably the only
one who read the java.security file. Just have a look on this ugly stuff you
can find in it and the unbelievable silly explanation given:
	</para>
<programlisting><![CDATA[
# The Java-level namelookup cache policy for successful lookups:
#
# any negative value: caching forever
# any positive value: the number of seconds to cache an address for
# zero: do not cache
#
# default value is forever (FOREVER). For security reasons, this
# caching is made forever when a security manager is set.
#
# NOTE: setting this to anything other than the default value can have
#       serious security implications. Do not set it unless
#       you are sure you are not exposed to DNS spoofing attack.
#
#networkaddress.cache.ttl=-1
]]></programlisting>
	<para>
For security reasons! Guys! Well. So we removed radware. Note that we had
other problem with radware. The DNS cache of the clients is one, the response
time of the DNS was another. Several technical issues when you reach some
trafic limits was the last.
	</para>
	<para>
Henrik Holst
	</para>
	<blockquote>
still, geographic load balancing would be very nice to have and I
cannot figure out another way to do it than involve DNS round-robin.
	</blockquote>
	<para> 
Francois  
	</para>
	<para>
Round-Robin DNS could work if
	</para>
	<itemizedlist>
		<listitem>
You have enough clients
		</listitem>
		<listitem>
Clients are using DNS as expected
		</listitem>
		<listitem>
Clients are dealing with TTL
		</listitem>
		<listitem>
Client DNS caches or provider DNS are honouring DNS TTL
		</listitem>
		<listitem>
All your sites are always up and working (you can't use a DNS solution for
failover)
		</listitem>
	</itemizedlist>
	<para>
My clients are mobile phones, basically points 1 to 4 are not OK :). And I
have to deal with multiple sources for the same client (the transaction begin
in the gallery gateway and continues in the standard surf gateway, and I have
to use fwmarks to keep the session)...
We used RadWare to try to load-balance between our two peers. It clearly was
not working. Unfortunately, I don't have all the details.
	</para>

	<para>
Horms
	</para>
	<para>
If you want to distribute traffic between hosts
that have fast, reliable links, like a LAN, then LVS is a good option.
No, an excellent option.
If you want to distribute traffic between geographically separated
hosts, then you don't want something like LVS that channles packets
through a single location then to another. Something DNS based is
probably the way to go - though round robin is not nearly smart
enough for my liking.
In practice, if you do have geographically distributed sites,
then each site should probably be an LVS cluster. So essentially
you end up using two techniques to solve different parts of the
same problem.
	</para>
	<para>
I wrote quite a lot of this on supersparrow.org once upon a time,
its still there if people want to read/play/enhance/.
(links through <xref linkend="supersparrow"/>).
	</para>
		</section>
	</section>
	<section id="books">
	<title>Books on LVS</title>
	<para>
Karl Kopper has tackled this. 
Writing a book on a moving target like LVS is a difficult proposition
certainly more than I was prepared to tackle. 
	</para>
<programlisting><![CDATA[
The Linux Enterprise Cluster
Karl Kopper
Pub: No Starch Press
ISBN 1593270364
]]></programlisting>
	<para>
The book is available at your usual suppliers.
	</para>
	<para>
I'm loath to mention the names of internet booksellers
who require your e-mail address as part of your purchase,
so that they can spam you later. 
I've been buying my books by phone at a marginally higher price 
since realising their business practices.
However recently (Jul 2004) I've discovered disposable e-mail addresses 
<emphasis>e.g.</emphasis> the free service from 
<ulink url="http://www.jetable.org/">Jetable.org</ulink>
(http://www.jetable.org/).
They have a google-like (<emphasis>i.e.</emphasis> simple) interface.
You give them your e-mail address, 
the required lifetime of the address
(1-8days), and click. 
Up comes an e-mail address (test by sending a message to it)
that you can give to your internet vendor, 
and mail will be forwarded to you for the period selected. 
After that time, no more mail will get to you.
I've been using jetable since Jul 2004 (now Sep 2004)
and have not got any spam from Jetable or from internet vendors.
	</para>
	</section>
	<section id="LVS_in_the_news" xreflabel="LVS in the News">
	<title>LVS in the news</title>
	<para>
&quot;Wired&quot; Magazine in Jun 2004 has a small article about LVS, 
illustrating the multinational cooperative nature of GPL software development.
The page is 
<ulink url="http://www.wired.com/wired/archive/12.06/images/atlas_software.pdf">here</ulink>
(http://www.wired.com/wired/archive/12.06/images/atlas_software.pdf),
or a 
<ulink url="files/atlas_software.pdf">local copy of the article</ulink> on this server.
	</para>
	</section>
	<section id="related_info">
	<title>Software/Information/HOWTOs useful/related to LVS</title>
	<para>
<ulink url="http://www.ultramonkey.org/">Ultra Monkey</ulink>
is LVS and HA combined.
	</para>
	<para>
tong <emphasis>tong (at) csusb (dot) net</emphasis>
25 Jun 2003
	</para>
	<para>
Here's a step-by-step
<ulink url="http://www.cula.net/cluster/">
guide for setting up an LVS system with heartbeat</ulink>
(http://www.cula.net/cluster).
	</para>
	<note>
This guide was published a year ago and we've only just heard about it.
The author has never popped up on the mailing list to say hello.
	</note>
	<para>
from <emphasis>lvs (at) spiderhosting (dot) com</emphasis>
<ulink url="http://www.supersparrow.org/">Super Sparrow Global Load Balancing</ulink>
using BGP routing information.
	</para>
	<para>
Ratz is documenting the 
<ulink url="http://www.drugphish.ch/~ratz/IPVS/index.html">
2.6 headers and calls with doxygen</ulink>
(http://www.drugphish.ch/~ratz/IPVS/index.html)
whenever he has reason to fiddle with a piece of code 
(<emphasis>i.e.</emphasis> the documentation isn't exhaustive, at least yet). 
	</para>
	<para>
From ratz, there's a write up on load imbalance with persistence and sticky bits at our friends
at <ulink url="http://www.microsoft.com/technet/prodtechnol/windows2000serv/deploy/confeat/nlbovw.asp">M$</ulink>.
	</para>
	<para>
From ratz, Zero copy patches to the kernel to speed up network throughput,
<ulink url="ftp://ftp.kernel.org/pub/linux/kernel/people/davem">Dave Miller's patches</ulink>,
<ulink url="http://surriel.com/patches/">Rik van Riel's vm-patches</ulink> and
more of Rick van Riel's patches at
http://www.linux-mm.org/ (link dead Dec 2003).
The Zero
copy patches may not work with LVS and may not work with netfilter either 
(from <emphasis>john (at) antefacto (dot) com</emphasis>).
	</para>
	<para>
From Michael Brown <emphasis>michael_e_brown (at) dell (dot) com</emphasis>, the
<ulink url="http://www.redhat.com/software/">TUX kernel level webserver</ulink>.
	</para>
	<para>
Dustin Puryear <emphasis>dustin (at) puryear-it (dot) com</emphasis>
gave a talk on LVS at LISA 2003.
The tutorial, 
is avaialble at:
<ulink url="http://www.puryear-it.com/publications.htm#6">
LVS: Load Balancing and High Availability for Free</ulink>
(http://www.puryear-it.com/publications.htm#6).
	</para>
	</section>
</section>
<section id="LVS-HOWTO.install">
<title>LVS: Install, Configure, Setup</title>
	<section id="from_source">
	<title>Installing from Source Code</title>
	<para>
Doing this from source code is now described in the
<ulink url="http://www.austintek.com/LVS/LVS-HOWTO/mini-HOWTO/LVS-mini-HOWTO.html">LVS-mini-HOWTO</ulink>.
Two methods of setup are described
	</para>
	<itemizedlist>
		<listitem>
Setup from the command line.
This is fine to understand what's going on,
and if you only want to have a single type of setup.
For LVSs which you're reconfiguring a lot, it's tedious and mistake prone.
If it doesn't work, you will spend some time figuring out why.
		</listitem>
		<listitem>
From a configure script which sets up an LVS with a single director.
This script is fine for initial setups: it's mistake proof 
(will give you enough information about failures to figure
out what might be wrong) and I used it for all my testing of LVS.
Since it's not easily expandable to handle director failover
and other configuration tools handle this now,
the configure script is not being developed anymore.
For production, where you need failover directors, you
should use other setup tools or save your hand-built setup as a script
(<emphasis>e.g.</emphasis> with <command>ipvsadm-sav</command>).
		</listitem>
	</itemizedlist>
	</section>
	<section id="setup_ultramonkey">
	<title>Ultra Monkey</title>
	<para>
<ulink url="http://www.ultramonkey.org">Ultra Monkey</ulink> is a packaged
set of binaries for LVS, including Linux-HA for director failover and
ldirectord for realserver failover.
It's written by Horms, one of the LVS developers.
Ultra Monkey was used on many of the server setups sold by VA Linux
and presumably made lots of money for them.
Ultra Monkey has been around since 2000 and is mature and stable.
Questions about Ultra Monkey are answered on the LVS mailing list.
Ultra Monkey is mentioned in many places in the LVS-HOWTO.
	</para>
	</section>
	<section id="setup_keepalived">
	<title>Keepalived</title>
	<para>
<ulink url="http://keepalived.sourceforge.net">Keepalived</ulink>
is written by Alexandre Cassen, and is based on vrrpd for director failover.
Health checking for realservers is included.
It has a lengthy but logical conf file and sets up an LVS for you.
Alexandre released code for this in late 2001.
There is a keepalived mailing list and Alexandre also monitors the LVS mailing list
(May 2004, most of the postings have moved to the keepalived mailing list).
The LVS-HOWTO has some information about
<ulink url="LVS-HOWTO.failover.html#keepalived_vrrpd">Keepalived</ulink>.
	</para>
	</section>
	<section id="soekris">
       	<title>Alternate hardware: Soekris (and embedded hardware)</title>
	<para>
Clint Byrum <emphasis>cbyrum (at) spamaps (dot) org</emphasis> 27 Sep 2004
	</para>
	<blockquote>
		<para>
I'd like to setup a two node Heartbeat/LVS load balancer using Soekris
Net4801 machines. These have a 266Mhz Geode CPU, 3 Ethernet, and 128MB
of RAM. The OS (probably LEAF) would live on a CF disk. If these are
overkill, I'd also consider a Net4501, which has a 133Mhz CPU, 64MB RAM,
and 3 ethernet.
		</para>
		<para>
I'd need to balance about 300 HTTP requests per second, totaling about
150kB/sec, between two servers. I'm doing this now with the servers
themselves (big dual P4 3.02 Ghz servers with lots and lots of RAM).
This is proving problematic as failover and ARP hiding are just a major
pain. I'd rather have a dedicated LVS setup.
		</para>
		<para>
1) anybody else doing this?
		</para>
		<para>
2) IIRC, using the DR method, CPU usage is not a real problem because
reply traffic doesn't go through the LVS boxes, but there is some RAM
overhead per connection. How much traffic do you guys think these should
be able to handle? 
		</para>
	</blockquote>
	<para>		
Ratz 28 Sep 2004 
	</para>
	<para>
The Net4801 machines are horribly slow but for your purpose enough. 
The limiting factor on those 
boxes are almost always the cache sizes. I've waded through too many 
processor sheets of those Geode derivates to give your specific details 
on your processor but I would be surprised if it had more than 16kb 
i/d-cache each.
	</para>
	<blockquote>
16k unified cache. :-/
	</blockquote>
	<para>
Make sure that your I/O rate is as low as possible or the first thing to 
blow is your CF disk. I've worked with hundreds of those little boxes in 
all shapes, sizes and configurations. The biggest common mode failures 
were CF disk due to temperature problems and I/O pressure (MTTF was 23 
days); other problems only showed up in really bad NICs locking up half 
of the time.
	</para>
	<blockquote>
I haven't ever had an actual CF card blow on me. LEAF is made to live on
readonly media.. so its not like it will be written to a lot.
	</blockquote>
	<para>
Sorry, blow is exaggerated, I mean they simply fail because they only 
have limited write capacity on the cells.
	</para>
	<para>
RO doesn't mean that there's no I/O going to your disk as you correctly 
noted. The problem is that if you plan on using them 24/7 I suggest you 
monitor your block I/O on your RO partitions using the values from 
/proc/partitions or the wonderful iostat tool. Then extrapolate about 4 
hours worth of samples, check your CF vendor specification on how many 
writes it can endure and see how long you can expect the thing to run.
	</para>
	<para>
I have to add that thermal issues were adding to our high failure rates. 
We wanted to ship those little nifty boxes to every branch of a big 
customer to do a big VPN network. Unfortunately the customer is in the 
automobile industry and this means that those boxes were put in the 
stranges places imaginable in garages sometimes causing major heat 
congestion. Also as it is usual in this sector of industry people are 
used to reliable hardware and so they don't care if at the end of a 
working day they simply shut down the power of the whole garage. 
Needless to say that this adds up to the reduced lifetime of a CF.
	</para>
	<para>
I then did a reliability analysis using the MGL (multiple greek letter, 
derived from the beta-factor model) model to calculate the average risk 
in terms of failure*consequence and we had to refrain from using those 
little nifty things. The costs of repair (detection of failure -> 
replacement of product) at a customer would exceed the income our 
service provided through a mesh of those boxes.
	</para>
	<blockquote>
If these are
overkill, I'd also consider a Net4501, which has a 133Mhz CPU, 64MB RAM,
and 3 ethernet.
	</blockquote>
	<para>
I'd go with the former ones, just to be sure ;).
	</para>
	<blockquote>
Forgive me for being frank, but it sounds like you wouldn't go with
either of them.
	</blockquote>
	<para>
I don't know your business case so it's very difficult to give you a 
definite answer. I only give you an (somewhat intimidating) experience 
report, someone might just as well give you a much better report.
	</para>
	<blockquote>
I'd need to balance about 300 HTTP requests per second, totaling about
150kB/sec, between two servers.
	</blockquote>
	<para>
So one can assume a typical request to your website is 512 Bytes, which 
is rather quite high. But not really an issue for LVS-DR.
	</para>
	<blockquote>
I didn't clarify that. The 150kB/sec is outgoing. This isn't for all of
the website, just the static images/html/css.
	</blockquote>
	<blockquote>
I'm doing this now with the servers
themselves (big dual P4 3.02 Ghz servers with lots and lots of RAM).
This is proving problematic as failover and ARP hiding are just a major
pain. I'd rather have a dedicated LVS setup.
	</blockquote>
	<para>
I'd have to agree to this.
	</para>
	<blockquote>
1) anybody else doing this?
	</blockquote>
	<para>
Maybe. Stupid questions: How often did you have to failover and how 
often did it work out of the box?
	</para>
	<blockquote>
Maybe once every 2 or 3 months I'd need to do some maintenance and
switch to the backup. Every time there was some problem with noarp not
coming up or some weird routing issue with the IPs. Complexity bad. :)
	</blockquote>
	<para>
So frankly speaking: your HA solution didn't work as expected ;).
	</para>
	<blockquote>
2) IIRC, using the DR method, CPU usage is not a real problem because
reply traffic doesn't go through the LVS boxes, but there is some RAM
overhead per connection. How much traffic do you guys think these should
be able to handle? 
	</blockquote>
	<para>
This is very difficult to say since these boxes impose limits also 
through their inefficiant PCI busses, their rather broken NICs and the 
dramatically reduced cache. Also it would be interesting to know if 
you're planning on using persistency on your setup.
	</para>
	<blockquote>
Persistency is not a requirement. Note that most of the time a client
opens a connection once, and keeps it up as long as they're browsing
with keepalives.
	</blockquote>
	<para>
Yes, provided most clients use HTTP/1.1. But since on an application 
level you don't need persistency.
	</para>
	<para>
But to give you a number to start with, I would say those boxes should 
be able (given your constraints) to sustain 5Mbit/s of traffic with 
about 2000pps (~350 Bytes/packet) and only consume 30 Mbyte of your 
precious RAM when running without persistency. This is if every packet 
of your 2000pps is a new client requesting a new connection to the LVS 
and will be inserted by the template at an average of 1 Minute.
	</para>
	<para>
As mentioned previously, you HW configuration is very hard to compare to 
actual benchmarks, thus take those numbers with a grain of salt, please.
	</para>
	<blockquote>
Thats not encouraging. I need something fairly cheap.. otherwise I might
as well go down the commercial load balancer route. 
	</blockquote>
	<para>
Well, I have given you number which are (at a second look) rather low 
estimates ;). Technically, your system should be able to deliver 
25000pps (yes, 25k) at a 50Mbit/s rate. You would then, if every packet 
was a new client, consume about all the memory of your system :). So 
somewhere in between those two numbers I would place the performance of 
your machine.
	</para>
	<para>
Bubba Parker <emphasis>sysadmin (at) citynetwireless (dot) net</emphasis> 27 Sep 2004
	</para>
	<para>
In my tests, the Soekris net4501, 4511, and 4521 all were able to route almost 20Mbps at wire-speed.
I would suspect the 4801 to be in excess of 50Mbps, 
but remember, your Soekris board has 3 nics, 
but what they don't tell you is that they all share the same interrupt, 
so performance degredation is exponential with many packets per second.
	</para>
	<para>
Ratz 28 Sep 2004
	</para>
	<para>
For all Geode based boards I've received more technical documentation 
than I was ever prepared to dive in. Most of the time you get a very 
accurate depiction of your hardware including south and north bridges 
and there you can see that the interrupt lines are hardwired and require 
a interrupt sharing.
	</para>
	<para>
However this is not a problem since there's not a lot of devices on the 
bus anyway that would occupy it and if you're really unhappy about the 
bus speed, use setpci to reduce latency for the NIC's IRQs.
	</para>
	<para>
Newer kernels have excellent handling for shared IRQs btw.
	</para>
	<para>
Did you measure exponential degradation? I know you get a pretty steep performance 
reduction once you push the pps too high but I newer saw exponential 
behaviour.
	</para>


	<para>
Peter Mueller 2004-09-27 
	</para>
	<blockquote>
What about not using these Soekris's and just using those two beefy servers?
e.g.,  http://www.ultramonkey.org/2.0.1/topologies/ha-overview.html or
 http://www.ultramonkey.org/2.0.1/topologies/sl-ha-lb-overview.html
	</blockquote>
	<para>
Clint Byrum 27 Sep 2004
	</para>
	<para>
Thats what I'm doing now. The setup works, but its complexity causes
issues. Bringing up IPs over here, moving them from eth0 to lo over
there, running noarpctl on that box. Its all very hard to keep track of.
Its much simpler to just have two boxes running LVS, and not worry about
whats on the servers.
	</para>
	<para>
Simple things are generally easier to fix if they break. It took me
quite a while to find a simple typo in a script on my current setup,
because it was very non-obvious at what layer things were failing.
	</para>
	</section>
	<section id="turnbull">
       	<title>LVS on a CD: Malcolm Turnbull's ISO files</title>
       	<para>
Malcolm Turnbull <emphasis>Malcolm (dot) Turnbull (at) crocus (dot) co (dot) uk</emphasis>
03 Jun 2003, has released a Bootable ISO image of his Loadbalancer.org appliance software.
The link was at
http://www.loadbalancer.org/modules.php?name=Downloads&amp;d_op=viewdownload&amp;cid=2
but is now dead (Dec 2003).
Checking the website (Apr 2004) I find that the code is available as a
30 day demo
(http://www.loadbalancer.org/download.html, link dead Feb 2005).
	</para>
	<para>
Here's the original blurb from Malcolm
	</para>
	<blockquote>
       		<para>
The basic idea is creating an easy to use layer 4 switch appliance to
compete with Coyote Point Equalizer/ CISCO local director...
All my source code is GPL, but the ISO distribution contains
files that are non-GPL to protect the work and allow vendors to licence
the software.
The ISO requires a license before you can legally use it in production.
       		</para>
       		<para>
Burn it to CD and then use it to boot a spare server with
pentium/celeron + ATAPI CD + 64MB RAM + 1 or 2 NICs+20GB HD
       		</para>
<programlisting><![CDATA[
root password is : loadbalancer
ip address is : 10.0.0.21/255.255.0.0
web based login is : loadbalancer
web based password is : loadbalancer
]]></programlisting>
       		<para>
Default setup is DR so just plug it straight into the same hub as your
web servers and have a play..
Download the manuals for configuration info...
       		</para>
	</blockquote>
	</section>
</section>
<section id="LVS-HOWTO.ipvsadm" xreflabel="ipvsadm and schedulers">
<title>LVS: Ipvsadm and Schedulers</title>
<para>
<command>ipvsadm</command> is the user code interface to LVS. 
The scheduler is the part of the ipvs
kernel code which decides which realserver will get the next new connection.
</para>
<para>
There are patches for ipvsadm
</para>
<itemizedlist>
	<listitem>
<link linkend="Padraig">machine readable error codes for ipvsadm</link>
	</listitem>
	<listitem>
<link linkend="stateless_ipvsadm">stateless entry of <command>ipvsadm</command> commands</link>
	</listitem>
	<listitem>
		<para>
Mar 2004. There appears to have been introduced a bug in the wrr code. 
Presumably this will be fixed sometime in the main code, and presumably
older versions of <filename>ipvs</filename> still work (but I don't 
know how far back you need to go, presumably to the 2.4 kernels).
Here are some postings on the matter and links to a patch.
		</para>
		<para>
Jan Kasprzak <emphasis>kas (at) fi (dot) muni (dot) cz</emphasis> 2005/03/25
		</para>
		<blockquote>
			<para>
port unreachable after RS removal:
			</para>
			<para>
   I use IPVS with direct routing and wrr scheduler. The problem is
that for some configurations I get "icmp port unreachable" when one of the
real servers fails and is removed from the ip_vs tables. 
The smallest case where I can
replicate the problem is the following:
			</para>
<programlisting><![CDATA[
ipvs# ipvsadm -A -t virtual.service:http -s wrr
ipvs# ipvsadm -a -t virtual.service:http -r realserver1:http -w 100
ipvs# ipvsadm -a -t virtual.service:http -r realserver2:http -w 1000

client$ wget -O - http://virtual.service/
[works as expected]

ipvs# ipvsadm -d -t virtual.service:http -r realserver2

client$ wget -O - http://virtual.service/
--14:46:29--  http://virtual.service/
           => `-'
Resolving virtual.service... 1.2.3.4
Connecting to virtual.service[1.2.3.4]:80... failed: Connection refused.
]]></programlisting>
			<para>
   I have verified by tcpdump that no traffic is sent to realserver2
after it is removed from the virtual.service pool. The ICMP "tcp port
unreachable" is sent by the ipvs director.
This appears to be a problem in the wrr scheduler. With wlc or rr
it works as expected.
The director is Fedora Core 3 with vanilla 2.6.11.3 kernel,
but I have been experiencing this for a longer time.
			</para>
		</blockquote>
		<para>
Sent by: lvs-users-bounces@LinuxVirtualServer.org 2005/03/26
		</para>
		<para>	
This is exactly the problem I described in my previous mails, 
and for which a patch is available from Wensong and/or Horms.
Search the mailinglist archive for 'overload flag not
resetting' which was my initial (wrong) diagnosis.
See
		</para>
<programlisting><![CDATA[
http://marc.theaimsgroup.com/?l=linux-virtual-server&m=110604584821192&w=2

and the (inital) patch:
http://marc.theaimsgroup.com/?l=linux-virtual-server&m=110749794000222&w=2
]]></programlisting>
	</listitem>
</itemizedlist>
	<section id="using_ipvsadm">
	<title>Using ipvsadm</title>
	<para>
You use <command>ipvsadm</command> from the command line (or in rc files) to setup: -
	</para>
	<itemizedlist>
		<listitem>
services/servers that the director directs
(<emphasis>e.g.</emphasis> http goes to all realservers,
while ftp goes only to one of the realservers).
		</listitem>
		<listitem>
			<para>
weighting given to each realserver - useful if some servers
are faster than others.
			</para>
			<para>
Horms 30 Nov 2004
			</para>
			<para>
The weights are integers, but sometimes they are assigned to an
atomic_t, so they can only be 24bits <emphasis>i.e.</emphasis>
values so 0 to (2^24-1) should work.
			</para>
		</listitem>
		<listitem>
		<link linkend="scheduler">scheduling algorithm</link>
		</listitem>
	</itemizedlist>
	<para>
You use can also use <command>ipvsadm</command> to
	</para>
	<itemizedlist>
		<listitem>
		add services: add a service with weight &gt;0
		</listitem>
		<listitem>
		shutdown (or quiesce) services: set the weight to 0.
		<para>
This allows current connections to continue,
untill they disconnect or expire,
but will not allow new connections.
When there are no connections remaining,
you can bring down the service/realserver.
		</para>
		</listitem>
		<listitem>
delete services: this stops traffic for the service (the connection will hang),
but the entry in the connection table is not deleted till it times out.
This allows deletion, followed shortly thereafter by adding back the service,
to not affect established (but quiescent) connections.
		</listitem>
		<listitem>
			<para>
		Once you have a working LVS, save the 
<command>ipsvadm</command> settings with <command>ipvsadm-sav</command>
<programlisting><![CDATA[
$ipvsadm-sav > ipvsadm.sav
]]></programlisting>
			</para>
			<para>
		and then after reboot, 
restore the <command>ipvsadm</command> settings, with ipvsadm-restore
<programlisting><![CDATA[
$ipvsadm-restore < ipvsadm.sav
]]></programlisting>
			</para>
			<para>
Both of these commands can be part of an <command>ipvsadm</command> init script.
			</para>
		</listitem>
		<listitem>
		list version of ip_vs (here 0.9.4, with a hash table size of 4096)
<programlisting><![CDATA[
director:/etc/lvs# ipvsadm
IP Virtual Server version 0.9.4 (size=4096)
]]></programlisting>
		</listitem>
		<listitem>
		list version of <command>ipvsadm</command> (here 1.20)
<programlisting><![CDATA[
director:/etc/lvs# ipvsadm --version
director:/etc/lvs# ipvsadm v1.20 2001/09/18 (compiled with popt and IPVS v0.9.4)
]]></programlisting>
		</listitem>
</itemizedlist>
	</section>
	<section id="sysctl documentation">
	<title>sysctl documentation</title>
	<para>
the <filename>sysctl</filename> for ipvs will be in 
<filename>Documentation/networking/ipvs-sysctl.txt</filename> for 2.6.18 (hopefully).
It is derived from http://www.linuxvirtualserver.org/docs/sysctl.html v1.4.
	</para>
	</section>
	<section id="ipvs_kernel_match">
	<title>Compile a version of ipvsadm that matches your ipvs</title>
	<para>
Compile and install <command>ipvsadm</command> on the
director using the supplied Makefile. You can optionally compile ipvsadm
with popt libraries, which allows <command>ipvsadm</command> to handle more complicated
arguments on the command line.
If your <filename>libpopt.a</filename> is too old, your <command>ipvsadm</command> will segv.
(I'm using the dynamic libpopt and don't have this problem).
	</para><para>
Since you compile <filename>ipvs</filename> and <command>ipvsadm</command> independantly and
you cannot compile <command>ipvsadm</command> until you have patched the kernel headers,
a common mistake is to compile the kernel and reboot, forgetting to
compile/install ipvsadm.
	</para><para>
Unfortunately there is only rudimentary version detection code into ipvs/ipvsadm.
If you have a mismatched ipvs/<command>ipvsadm</command> pair,
many times there won't be problems, as any particular
version of <command>ipvsadm</command> will work with a wide range of patched kernels.
Usually with 2.2.x kernels,
if the ipvs/<command>ipvsadm</command> versions mismatch, you'll get weird but non-obvious
errors about not being able to install your LVS. Other possibilities are
that the output of <command>ipvsadm</command> -L will have IP's that are clearly not IPs (or
not the IP's you put in) and ports that are all wrong. It will look something
like this
	</para>
<programlisting><![CDATA[
[root@infra /root]# ipvsadm
IP Virtual Server version 1.0.4 (size=3D4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A864D8:0050 rr
  -> 01000000:0000      Masq    0      0          0
]]></programlisting>
	<para>
rather than
	</para>
<programlisting><![CDATA[
director:/etc/lvs# ipvsadm
IP Virtual Server version 0.9.4 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  lvs2.mack.net:ssh rr
  -> RS2.mack.net:ssh             Route   1      0          0
]]></programlisting>
	<para>
There was a change in the /proc file system for
ipvs about 2.2.14 which caused problems
for anyone with a mismatched ipvsadm/ipvs.
The <command>ipvsadm</command> from different kernel series (2.2/2.4) do not
recognise the ipvs kernel patches from the other series (they appear to
not be patched for ipvs).
	</para><para>
The later 2.2.x ipvsadms know the minimum version of ipvs that they'll run on,
and will complain about a mismatch.
They don't know the maximum version
(which will be produced presumably some time in the future)
that they will run on.
This protects you against the unlikely event of installing a new 2.2.x version of
director:/etc/lvs# ipvsadm on an older version of ipvs, but will not protect you against the
more likely scenerio where you forget to compile <command>ipvsadm</command> after building your kernel.
The <command>ipvsadm</command> maintainers are aware of the problem.
Fixing it will break the current code and they're waiting
for the next code revision which breaks backward compatibility.
	</para><para>
If you didn't even apply the kernel patches for ipvs, then ipvsadm
will complain about missing modules and exit
(<emphasis>i.e.</emphasis> you can't even do `<command>ipvsadm -h</command>`).
	</para>
		<section id="other_compile_problems">
		<title>Other compile problems</title>
		<para>
Ty Beede <emphasis>tybeede (at) metrolist (dot) net</emphasis>
		</para>
		<blockquote>
		<para>
Ty Beede <emphasis>tybeede (at) metrolist (dot) net</emphasis>
on a slackware 4.0 machine I went to compile <command>ipvsadm</command> and it gave
me an error indicating that the iphdr type was undefined and
it didn't like that when it saw
		</para>
		<para>
Ty Beede <emphasis>tybeede (at) metrolist (dot) net</emphasis>
to <filename>ip_fw.h</filename> I added
		</para>
<programlisting><![CDATA[
#include <linux/ip.h>
]]></programlisting>
		<para>
Ty Beede <emphasis>tybeede (at) metrolist (dot) net</emphasis>
in ipvsadm.c, which is where the iphdr
#structure is defined and everything went ok
		</para>
		</blockquote>
		<para>
Doug Bagley <emphasis>doug (at) deja (dot) com</emphasis>
		</para>
		<para>
The reason that it fails "out of the box" is because fwp_iph's
type definition (struct iphdr) was
		</para>
<programlisting><![CDATA[
#ifdef'd out in <linux/ip_fw.h>
]]></programlisting>
		<para>
(and not included anywhere else) since the symbol __KERNEL_ was
undefined.
		</para>
<programlisting><![CDATA[
Including <linux/ip.h> before <linux/ip_fw.h>
]]></programlisting>
		<para>
in the .c file did the trick.
		</para>
		</section>
	</section>
	<section id="realservers_in_etc_hosts">
	<title>put realservers in /etc/hosts</title>
	<para>
(from a note by Horms 26 Jul 2002)
	</para>
	<para>
<filename>ipvsadm</filename> by default outputs the <emphasis>names</emphasis> 
of the realservers rather than the IPs.
The director then needs name resolution.
If you don't have it, 
<command>ipvsadm</command> will take a long time (upto a minute) to return,
as it waits for name resolution to timeout.
The only IPs that the director needs to resolve are of the realservers.
DNS is slow. To prevent the director from needing DNS,
put the names of the realservers in <filename>/etc/hosts</filename>.
This lookup is quicker than DNS and you won't need
to open a route from the director to a nameserver.
	</para>
	<para>
Or you could use `<command>ipvsadm -n</command>` which outputs the IPs
of the realservers instead.
	</para>
	</section>
	<section id="scheduler">
	<title>RR and LC schedulers</title>
	<para>
On receiving a connect request from a client, the director
assigns a realserver to the client based on a &quot;schedule&quot;.
The scheduler type is set with ipvsadm.
The schedulers available are
	</para>
	<itemizedlist>
		<listitem>
		round robin (rr), weighted round robin (wrr) - new
connections are assigned to each realserver in turn
		</listitem>
		<listitem>
		<para>
		least connected (lc), weighted least connection (wlc) - new
connections go to realserver with the least number of connections.
This is not neccessarily the least busy realserver but is a step in
that direction.
		</para>
		<para>
		<note>
Doug Bagley <emphasis>doug (at) deja (dot) com</emphasis>
points out that *lc schedulers will not work properly 
if a particular realserver is used in two different LVSs.
		</note>
		</para>
		</listitem>
		<listitem>
		<xref linkend="LVS-HOWTO.persistent_connection"/>
		</listitem>
		<listitem>
		LBLC: a persistent memory algorythm
		</listitem>
		<listitem>
		DH: destination hash
		</listitem>
		<listitem>
		SH: source hash
		</listitem>
	</itemizedlist>
	<para>
The original schedulers are rr, and lc (and their weighted versions).
Any of these will do for a test setup. In particular,
round robin will cycle connections
to each realserver in turn, allowing you to check that all
realservers are functioning in the LVS.
The rr,wrr,lc,wlc schedulers should all work similarly when
the director is directing identical realservers with identical services.
The lc scheduler will better handle situations where machines
are brought down and up again
(see <link linkend="thundering_herd">thundering herd problem</link>).
If the realservers are offering different services and some have clients
connected for a long time while others are connected for a short time,
or some are compute bound, while others are network bound,
then none of the schedulers will do a good job of distributing
the load between the realservers.
LVS doesn't have any load monitoring of the realservers.
Figuring out a way of doing this that will work for a range of different types
of services isn't simple (see <link linkend="agent">load and failure monitoring</link>).
	</para>
	</section>
	<section id="netmask_for_VIP">
	<title>Netmask for VIP</title>
	<para>
You setup the RIPs, DIP and other networking with whatever netmask you
choose. For the VIP
	</para>
	<itemizedlist>
		<listitem>
For LVS-DR, LVS-Tun: netmask for VIP on director, realservers must be /32.
		</listitem>
		<listitem>
For LVS-NAT: the netmask can be /32 or the netmask of the RIPs, DIP.
		</listitem>
	</itemizedlist>
	<para>
You will need to setup the routing for the VIP to match the netmask.
For more details, see the chapters for each forwarding method.
	</para>
	<para>
Horms 12 Aug 2004 
	</para>
	<para>
The real story is that the netmask works a little differently
on lo to other interfaces. On lo the interface will answer to
_all_ addresses covered by the netmask. This is how 127.0.0.1/8 on
lo ends up answering 127.0.0.0-127.255.255.255. So if
you add 172.16.4.222/16 to eth0 then it will answer 172.16.4.222 and
only 172.16.4.222. But if you add the same thing to lo then it 
will answer 172.16.0.0-172.16.255.255. So you need to use
172.16.4.222/32 instead.
	</para>
	<para>
To clarify -
	</para>
<programlisting><![CDATA[
ifconfig eth0:0 192.168.10.10 netmask 255.255.255.0 broadcast 192.168.10.255 up
   -> Add 192.168.10.10 to eth0

ifconfig lo:0 192.168.10.10 netmask 255.255.255.0 broadcast 192.168.10.255 up
   -> Add 192.168.10.0 - 192.168.10.255 to lo

ifconfig lo:0 192.168.10.0 netmask 255.255.255.0 broadcast 192.168.10.255 up
   -> Same as above, add 192.168.10.0 - 192.168.10.255 to lo

ifconfig lo:0 192.168.10.10 netmask 255.255.255.255 broadcast 192.168.10.10 up
   -> Add 192.168.10.10 to lo
]]></programlisting>

	<para>
Malcolm Turnbull <emphasis>malcolm (at) loadbalancer (dot) org</emphasis> 2005/04/21
	</para>
	<para>
On all platforms apart from windows you want 255.255.255.255 for the 
loopback.
On windows you can get away with 255.255.255.0 IF you use a priority 254 
80% of the time.
255.255.255.255 can be used if you mod the registry...
But we've found that 255.0.0.0 will work better 99% of the time because 
windows by default uses the smallest subnet first for routing
and a class  A will never be used instead of a class C.
	</para>
	</section>
	<section id="DH">
	<title>LBLC, DH schedulers</title>
	<para>
The LBLC code (by Wensong) and the DH scheduler
(by Wensong, inspired by code submitted by Thomas Proell
<emphasis>proellt (at) gmx (dot) de</emphasis>)
are designed for web caching realservers
(<emphasis>e.g.</emphasis> squids).
For normal LVS services (eg ftp, http), the content
offered by each realserver is the same and it doesn't
matter which realserver the client is connected to.
For a web cache, after the first fetch has been made,
the web caches have different content.
As more pages are fetched, the contents of the web caches will diverge.
Since the web caches will be setup as peers,
they can communicate by ICP (internet caching protocol)
to find the cache(s) with the required page.
This is faster than fetching the page from the original webserver.
However, it would be better after the first fetch of a page
from http://www.foo.com/*, for all subsequent clients wanting a
page from http://www.foo.com/ to be connected to that realserver.
	</para>
	<para>
The original method for handling this was to make
connections to the realservers persistent,
so that all fetches from a client went to the same realserver.
	</para>
	<para>
The -dh (destination hash) algorythm makes a hash from the target IP
and all requests to that IP will be sent to the same realserver.
This means that content from a URL will not be retrieved
multiple times from the remote server.
The realservers (eg squids in this case)
will each be retreiving content from different URLs.
	</para>
	<para>
Wensong Zhang <emphasis>wensong (at) gnuchina (dot) org</emphasis> 16 Feb 2001
	</para>
	<para>
Please see "man ipvsadm" for short description of DH and SH
schedulers. I think some examples to use those two schedulers.
	</para>
	<para>
Example:  cache cluster shared by several load balancers.
	</para>
<programlisting><![CDATA[
		Internet
		|
                |------cache array
                |
		|-----------------------------
		   |                |
		   DH               DH
		   |                |
		 Access            Access
                 Network1          Network2
]]></programlisting>
	<para>
The DH scheduler can keep the two load balancer redirect requests
destined for the same IP address to the same cache server. If the server
is dead or overloaded, the load balancer can use cache_bypass feature to
send requests to the original server directly. (Make sure that the cache
servers are added in the two load balancers in the same order)
	</para>
	<para>
Diego Woitasen 12 Aug 2003
	</para>
	<blockquote>
The scheduling algorithms that use dest IP for selecting
the realserver to use (like DH, LBLC, LBLCR) is only aplicable to
transparent proxy, this being the only aplication where the dest ip
could be variable.
	</blockquote>
	<para>
Wensong Zhang <emphasis>wensong (at) linux-vs (dot) org</emphasis> 12 Aug 2003
	</para>
	<para>
Yes, you are almost right. LBLC and LBLCR are written for transparent
proxy clusters only. DH can be used for transparent proxy cluster and can
be used in other clusters needing static mapping.
	</para>
	<para>
	<note>
Here's follows a set of exchanges between a Chinese person and Wensong,
that were in English, that I didn't follow at all. Apparently it was
clear to Wensong.
	</note>
	</para>
	<blockquote>
If lblc uses dh, then is lblc = dh + lc?
	</blockquote>
	<para>
Wensong Zhang 09 Mar 2004
	</para>
	<para>
Maybe lblc = dh + wlc.
	</para>
<programlisting><![CDATA[
/*
 * The lblc algorithm is as follows (pseudo code):
 *
 *       if cachenode[dest_ip] is null then
 *               n, cachenode[dest_ip] <- {weighted least-conn node};
 *       else
 *               n <- cachenode[dest_ip];
 *               if (n is dead) OR
 *                  (n.conns>n.weight AND
 *                   there is a node m with m.conns<m.weight/2) then
 *                 n, cachenode[dest_ip] <- {weighted least-conn node};
 *
 *       return n;
 *
 */
]]></programlisting>

	<para>
	The difference between lblc and lblcr is that cachenode[dest_ip] in
lblc is a server, and cachenode[dest_ip] in lblcr is a server set.
	</para>
	<blockquote>
In lblc the server has overloaded and lvs use wlc and allocate a server in
half load of the server,
Allocate the weighted least-connection server to IP address.
Is this means after allocation for ip address, it will not return to past
server ?
	</blockquote>
	<para>
No, it will not in most cases. There is only one possible situation that
the current map expires after it is not used for six minutes, and the past
server is the one with least connections when next access to the ip
address comes.
	</para>
		<section id="scheduling_squids">
		<title>scheduling <link linkend="squids">squids</link></title>
		<para>
The usual problem with squids not using a cache friendly scheduler
is that fetches are slow. In this case the website is sending hits
to several different RIPs. Some websites detect this and won't
even serve you the pages.
		</para>
		<para>
Palmer J.D.F. <emphasis>J (dot) D (dot) F (dot) Palmer (at) Swansea (dot) ac (dot) uk</emphasis> 18 Mar 2002/
		</para>
		<blockquote>
		<para>
I tried https and online banking sites (<emphasis>e.g.</emphasis> www.hsbc.co.uk).
It seems that this site and undoubtedly many other secure sites don't like
to see connections split across several IP addresses as happens with my
cluster.
Different parts of the pages are requested by different realservers, and
hence different IP addresses.
		</para>
		<para>
It gives an error saying...
"...For your security, we have disconnected you from internet banking due to
a period of inactivity..."
		</para>
		<para>
I have had caching issues with HSBC before, they seem to be a bit more
stringent than other sites.
If I send the requests through one of the squids on it's own it works fine,
so I can only assume it's because it is seeing fragmented requests, maybe
there is a keepalive component that is requested.
How do I combat this?  Is this what persistence does or is there a way of
making the realservers appear to all have the same IP address?
		</para>
		</blockquote>
		<para>
Joe
		</para>
		<para>
change -rr (or whatever you're running) to -dh.
		</para>
		<para>
Lars
		</para>
		<para>
Use a different scheduler, like lblc or lblcr.
		</para>
		<para>
Harry Yen <emphasis>hyen1 (at) yahoo (dot) com</emphasis> 16 April 2002
		</para>
		<para>
What is the purpose of using LVS with Squid to a https site?
HTTPs based material typically is not cachable.
I don't understand why you need Squid at all.
		</para>
		<para>
Once a request reaches a Squid and incurs a cache miss, the forwarded
request will have Squid IP as the source address.  So you need to find a
way to make sure all connections from the same client IP to go to the
same Squid farm. Then when they incur cache misses, they will wind up
via LVS persistency to the same real sever.
		</para>
		<blockquote>
			<para>
The reason https is sent to the squids is because it's much easier to send
all browser traffic to the squids and then let them handle it.
The only way I seemed to be able to get this to work (IE access the bank
site) is to set a persistence (360 seconds), and using lblc scheduling.
The current output of <command>ipvsadm</command> is this... I am a tad concerned at the
apparent lack of load balancing.
			</para>
<programlisting><![CDATA[
TCP  wwwcache-vip.swan.ac.uk:squi lblc persistent 360
  -> squidfarm1.swan.ac.uk:squid  Route   1      202        1045
  -> squidfarm2.swan.ac.uk:squid  Route   1      14         8
]]></programlisting>
			<para>
HSBC seems to be a bit more
stringent than other sites. If I send the requests through one of the
squids on it's own it works fine, so I can only assume it's because it is
seeing fragmented requests, maybe there is a keepalive component that is
requested.
How do I combat this? Is this what persistence does or is there a way
of making the realservers appear to all have the same IP address?
I have sorted it by using persistence, couldn't get any of the dedicated
squid schedulers to work properly. I'm currently running wlc, and 360s
persistance.  Seems to be holding up really well.  Still watching it with
eagle eyes though.
			</para>
		</blockquote>
		<para>
The -dh scheduler was written expressly to handle squids.
Jezz tried it and didn't get it to work satisfactorily but found that
persistence worked. We don't understand this yet.
		</para>
		<para>
Jakub Suchy <emphasis>jakub (at) rtfm (dot) cz</emphasis> 2005/02/23
		</para>
		<para>
round-robin algorithm is not usable for squid.
Some servers (banks etc.) check clients ip address and terminates it's 
connection if it changes.
When you use the source-hashing algorithm, 
IPVS checks the client against its 
local table and forwards connection always to same squid real server, 
so the client always accesses the web through same squid.
source-hashing can become unbalanced when you have few clients 
and one of them use squid more frequently than others. 
With more clients, it's statistically balanced.
		</para>
		</section>
	</section>
	<section id="Henrik">
	<title>LVS with mark tracking: fwmark patches for multiple firewalls/gateways</title>
	<para>
If the LVS is protected by multiple firewall boxes and each
firewall is doing connection tracking, then packets arriving
and leaving the LVS from the same connection will need
to pass through the same firewall box or else they won't be
seen to be part of the same connection and will be dropped.
An initial attempt to handle the firewall problem was
sent in by Henrik Nordstrom, who is involved with developing
<ulink url="http://squid.sourceforge.net/hno/">web caches (squids)</ulink>.
	</para>
	<para>
This code isn't a scheduler, but it's in here awaiting further developements of code
from Julian because it addresses similar problems to the <link linkend="SH-scheduler">
SH scheduler</link> in the next section.
	</para>
	<para>
Julian 13 Jan 2002
	</para>
	<blockquote>
	Unfortunately Henrik's patch breaks the LVS fwmark code.
Multiple gateway setups can be solved with routing and a solution is planned for LVS.
Until then it would be best to contact
Henrick, <emphasis>hno (at) marasystems (dot) com</emphasis> for his patch.
	</blockquote>
	<para>
Here's Henrick's 
<ulink url="files/ipvs-0.2.3-mark-track-v2.patch">patch</ulink> and here's some history.
	</para>
	<para>
Henrik Nordstrom <emphasis>hno (at) marasystems (dot) com</emphasis> 13 Jan 2002
	</para>
	<blockquote>
	<para>
My use of the MARK is for routing purposes of return traffic only, not at
all related to the scheduling onto the farm.
This to solve complex routing problems arising in borders between networks
where it is impractical to know full routing of all clients.
One example of what I do is like this:
	</para>
	<para>
I have a box connected to three networks (firewall, including LVS-NAT load
balancing capabilities for published services)
	</para>
	<itemizedlist>
		<listitem>
a - Internet
		</listitem>
		<listitem>
b - DMZ, where the farm members are
		</listitem>
		<listitem>
c - Large intranet
		</listitem>
	</itemizedlist>
	<para>
For simplicity both Internet and intranet users connect to the same
LVS IP addresses.
Both networks 'a' and 'c' is complex, and maintaining a complete and
correct routing table covering one of the networks (i.e. the 'c' network
in the above) is on the border to impossible and error prone as the use of
addresses change over time.
	</para>
	<para>
To simplify routing decisions I simply simply want return traffic to be
routed back the same way as from where the request was received. This
covers 99.99% of all routing needed in such situation regardless of the
complexity of the networks on the two (or more) sides without the need of
any explicit routing entries. To do this I MARK the session when received
using netfilter, giving it a routing mark indicating which path the
session was received from. My small patch modifies LVS to memorize this
mark in the LVS session, and then restore it on return traffic received
FROM the realservers. This allows me to route the return traffic from the
farm members to the correct client connection using iproute fwmark based
routing rules.
	</para>
	<para>
As farm distribution algorithms I use different ones depending on the type
of application. The MARK I only use for routing of return traffic.
I also have a similar patch for Netfilter connection tracking (and NAT),
for the same purpose of routing return traffic. If interested search for
CONNMARK in the netfilter-devel archives.
The two combined allows me to make multihomed boxes who do not need to
know the networks on any of the sides in detail, besides it's own IP
addresses and suitable gateways to reach further into the networks.
	</para>
	<para>
Another use of the connection MARK memory feature is a device connected to
multiple customer networks with overlapping IP addresses, for example two
customers both using 192.168.1.X addresses. In such case making a standard
routing table becomes impossible as the customers are not uniquely
identified by their IP addresses. The MARK memory however deals with such
routing at ease since it do not care about the detailed addressing as long
as it possible to identify the two customer paths somehow. i.e. interface
originally received on, source MAC of the router who sent us the request,
or anything uniquely identifying the request as coming from a specific
path.
	</para>
	<para>
The two problems above (not wanting to known the IP routing, or not being
able to IP route) are not mutually exclusive. If you have one then the
other is quite likely to occur.
	</para>
	</blockquote>
	<para>
Here's Henrik's announcement and the replies.
	</para>
	<para>
Henrik Nordstrom 14 Feb 2001
	</para>
	<blockquote>
		<para>
Here is a small patch to make LVS keep the MARK,
and have return traffic inherit the mark.
		</para>
		<para>
We use this for routing purposes on a multihomed LVS server, to have
return traffic routed back the same way as from where it was received.
What we do is that we set the mark in the iptables mangle chain
depending on source interface, and in the routing table use this mark to
have return traffic routed back in the same (opposite) direction.
		</para>
		<para>
The patch also moves the priority of LVS INPUT hook back to infront of
iptables filter hook, this to be able to filter the traffic not picked
up by LVS but matchin it's service definitions. We are not
(yet) interested of filtering traffic to the virtual servers, but very
interested in filtering what traffic reaches the Linux LVS-box itself.
		</para>
	</blockquote>
	<para>
Julian - who uses NFC_ALTERED ?
	</para>
	<blockquote>
Netfilter. The packet is accepted by the hook but altered (mark changed).
	</blockquote>
	<para>
Julian -
Give us an example (with dummy addresses) for setup that require
such fwmark assignments.
	</para>
	<blockquote>
		<para>
For a start you need a LVS setup with more than one real interface receiving
client traffic for this to be of any use. Some clients (due to routing
outside the LVS server) comes in on one interface, other clients on another
interface. In this setup you might not want to have a equally complex routing
table on the actual LVS server itself.
		</para>
		<para>
Regarding iptables/ipvs I currently "only" have three main issues.
		</para>
		<itemizedlist>
			<listitem>
As the "INPUT" traffic bypasses most normal routes, the iptables conntrack
will get quite confused by return traffic..
			</listitem>
			<listitem>
	Sessions will be tracked twice. Both by iptables conntrack and by IPVS.
			</listitem>
			<listitem>
There is no obvious choice if IPVS LOCAL_IN sould be placed before or after
iptables filter hook. Having it after enables the use of many fancy iptables
options, but instead requires one to have rules in iptables for allowing ipvs
traffic, and any mismatches (either in rulesets or IPVS operation) will cause the
packets to actually hit the IP interface of the LVS server which in most cases is
not what was intended.
			</listitem>
		</itemizedlist>
	</blockquote>
	</section>
	<section id="SH-scheduler" xreflabel="-SH scheduler">
	<title>Wensong's SH scheduler</title>
	<para>
Scheduling is based on the IP of the client.
Other than the few comments here, no-one has used the -sh scheduler.
The -sh (source hash) scheduler was originally intended 
for directors with multiple firewalls,
with the balancing based on hashes of the MAC address of the firewall. 
However with the scheduling based on the client IP, 
it would solve some of the problems that currently require persistence 
(<emphasis>i.e.</emphasis> having a client always go to the same realserver).
	</para>
	<para>
Here's Wensong's announcement:
	</para>
	<para>
Wensong Zhang <emphasis>wensong (at) gnuchina (dot) org</emphasis> 16 Feb 2001
	</para>
	<para>
Please see "man ipvsadm" for short description