12. LVS: LVS clients on Realservers

This HOWTO is a little disorganised here. Read the section on non-lvs clients on realservers too.

12.1. Do you really need LVS clients on the realserver in a 3-Tier setup?

Thomas Champagne 10 Apr 2007

There are two services on each server: Apache and MySQL. Each service has its own IP and a VIP address. The problem: accessing the services via the VIP from a remote client (outside the cluster) is OK, but when the client is inside the cluster, it always connects to the local machine.

People coming to this mailing list are always trying to balance the 3rd tier (in your case, mysql). If this were easy to do, that would be one thing, but with the current design of LVS it's next to impossible.

The first connection (here to apache) is balanced, so the connections to the 3rd tier (here to mysql) will be (at least reasonably) balanced too. So you have the balanced apache on your realserver connect to the local mysql.

To have a valid realserver, both apache and mysql have to be up. Maybe people then think that, running two services, there's twice the chance of the realserver going down, so that for the same hardware their 99% uptime realserver is now a 98% uptime realserver, and that they have to be prepared for apache_1 to connect to mysql_2. That would be true if the only failures on the machine were the daemons dying, and if they died independently.

I don't run a production internet site, so I don't have any numbers on failures in those situations, but it's not often that daemons just die or stop answering for no reason at all. Most failures seem to be disks and fans dying, memory chips going bad resulting in corrupt files being written, loss of network connectivity to the outside world (the backhoe problem) and, surprisingly, routers dying. Rarely does the daemon die, in which case requiring two daemons for a functioning realserver may not change the downtime a whole lot. There are many other daemons running on the realserver which are part of unix and which are required for a running machine, so you actually need maybe 10-20 daemons for a functioning realserver; an extra one (mysql) isn't going to make a whole lot of difference.

But let's say a functional realserver will have twice the downtime because it requires two functioning daemons. Well, that's high availability life when you have a service that requires multiple daemons: you have to fail out the realserver when either service goes down. That's all.

In the last exchange I had on this subject, the person didn't have any technical reason why they needed to balance the 3rd tier; they just wanted it. So I haven't been convinced that you must have a balanced 3rd tier.

12.2. Realserver as LVS client in LVS-NAT

The LVS-mini-HOWTO states that the lvs client cannot be on the realservers, i.e. that you need an outside client. This restriction can be relaxed under some conditions.

12.2.1. Jacob Reif's solution

This came from a posting by Jacob Reif Jacob (dot) Rief (at) Tiscover (dot) com 25 Apr 2003.

It is common to run multiple websites (Jacob has 100s) on the same IP, using name-based http to differentiate the websites. Sometimes web designers use an include function to pull content from one website into another by means of server-side includes (see http://www.php.net/manual/en/function.require.php), using http subrequests. The include requires a client process running on the webserver to make a request to a different website on the same IP. If the website is running on an LVS, the realservers need to be able to make a request to the VIP. For LVS-DR and LVS-Tun this is no problem: the realserver has the VIP (and the services presented on that IP), so requests by http clients running on the realserver to the VIP will be answered locally.

For LVS-NAT, the services are all running on the RIP (remember, there is no IP with the VIP on realservers for LVS-NAT). Here's what happens when the client on the realserver requests a page at VIP:80

realserver_1 makes a request to VIP:80, which goes to the director. The director demasquerades (rewrites) dst_addr from VIP to RIP_2. realserver_2 then services the request and fires off a reply packet with src_addr=RIP_2, dst_addr=RIP_1. This goes to realserver_1 directly (rather than being masqueraded through the director), but realserver_1 refuses the packet because it expected a reply from VIP and not from RIP_2.

           +-------------+
           |     VIP     |
           |  director   |
           +-------------+
            ^           |
            |           |req
            |req        v
  +-------------+     +-------------+
  |   RIP_1     |<--- |   RIP_2     |
  |  Realserver | ans |  Realserver |
  |  = client   | wer |  = server   |
  +-------------+     +-------------+

Here are the current attempts at solutions to the problem, or you can go straight to Jacob's solution.

  • Using the /etc/hosts solution of Ted Pavlic for indexing, doesn't work as there are 100s of domain-names registered (rather than just one) onto the same IP-address.
  • Julian's solution removes the local routing (as done for one network LVS-NAT) and forces every packet to pass through the director. The director then masquerades (rewrites) src_addr=RIP_2 to VIP, and realserver_1 accepts the request. This puts extra netload onto the director. (A sketch of the route change follows the diagram below.)

               +-------------+
               |    <vip>    |
               |  director   |
               +-------------+
                |^         |^
             ans||      req||ans
                v|req      v|
      +-------------+     +-------------+
      |  <rip1>     |     |  <rip2>     |
      |  Realserver |     |  Realserver |
      |  = client   |     |  = server   |
      +-------------+     +-------------+
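
    Here's a minimal sketch of Julian's route change (my own illustration, not from the posting), assuming the RIPs live in 192.168.1.0/24 on eth0 and the director's inside address (DIP) is 192.168.1.9. Deleting the connected route forces realserver-to-realserver packets through the director, but you must keep a host route to the DIP so the default route still resolves.

    realserver# ip route del 192.168.1.0/24 dev eth0
    realserver# ip route add 192.168.1.9 dev eth0
    realserver# ip route add default via 192.168.1.9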
    

Jacob's solution: The solution proposed here does not put that extra load onto the director. However each realserver always contacts itself (which isn't a problem). Put the following entry into each realserver. Now the realservers can access the httpd on RIP as if it were on VIP.

realserver#  iptables -t nat -A OUTPUT -p tcp -d $VIP --dport 80 -j DNAT --to ${RIP}:80
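
Jacob's rule covers port 80 only. If the same VIP also presents https (my assumption, not part of Jacob's posting), the redirection can be repeated per port:

realserver#  iptables -t nat -A OUTPUT -p tcp -d $VIP --dport 443 -j DNAT --to ${RIP}:443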

12.2.2. Carlos Lozano's solution

Carlos Lozano clozano (at) andago (dot) com 02 Jul 2004

We have a machine that must be both a client and director. The two problems to solve are

  • ipvs doesn't handle loopback packets
  • the return packets are handled by ip_vs_in, and not by ip_vs_out.

I have written an ip_vs_core.c.diff (http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/files/ip_vs_core.c.diff) patch for 2.4.26 using IPVS-NAT. It works correctly in my test case. The scheme is:

External client ---> IPVS:443 --> Local:443 ---> IPVS:80 ---> RealServer

The problem happens when Local:443 goes to localIPVS:80, because the packet is discarded by the following lines in ip_vs_core.c:

if (skb->pkt_type != PACKET_HOST || skb->dev == &loopback_dev) {
        IP_VS_DBG(12, "packet type=%d proto=%d daddr=%d.%d.%d.%d ignored\n",
                  skb->pkt_type,
                  iph->protocol,
                  NIPQUAD(iph->daddr));
        return NF_ACCEPT;
}

Ratz

Why do you need this? Seems like a replication of mod_proxy/mod_rewrite. Your patch obviously makes it work but I wonder if such a functionality is really needed.

We are using it as an SSL accelerator. The first ipvs (443) sends the request to localhost:443 or to a different director, and the second ipvs (80) distributes the traffic to the realservers.

Ext. client --> IPVS:443 --> Local:443 --> IPVS:80 --> RealServer1
                         |-> Director2:443         |-> RealServer2

In the first case it is an "external machine client+director" scheme, but in the second case it is "client+director on the same machine". This part of the patch only handles the outgoing packet; the return is handled by the second part of the patch (which is really a bad hack).

For a mini-HOWTO on using this patch see https_on_localnode. Matt Venn has tested it: it works using the local IP of the director, but not 127.0.0.1.

12.2.3. Graeme Fowler's proposals, Rob Wilson's help and Judd Bourgeois' modification

Note
Graeme came up with the original idea, Rob Wilson proposed a solution that didn't quite work, Graeme fixed it and then Judd saw an easier solution for the case of only one VIP. I've somewhat mashed the history in my write-up (sorry).

Graeme Fowler is looking for a solution for realservers that can't use iptables

Graeme Fowler graeme (at) graemef (dot) net 11 Feb 2005

After a long day spent tracing packets through the LVS and netfilter trail whilst trying to do cleverness with policy routing using the iproute2 package, I can condense quite a lot of reading (and trial, mainly followed by error!) down as follows:

  1. FastNAT, as provided by the iproute2 package, is incompatible with the netfilter conntrack module. As most LVS-NAT systems are also doing masquerading or SNAT for outbound connections from the realservers, the conntrack module is loaded automagically - thus FastNAT via policy routing simply won't work.
  2. Try as you might to do SNAT, it has to be done in the 'nat POSTROUTING' chain - and the packets being processed via LVS don't traverse this chain, because they're hooked right out of the nat POSTROUTING table and are processed by ip_vs_port_routing instead, which then plonks them back on the wire magically without further processing. So SNAT won't work either.
  3. Using fwmarks seems inconclusive, because ultimately (in my case at least) I want to SNAT the packets in some way, and point (2) above precludes that.
  4. "Internal" VIPs. This one just came to me so please feel free to try it, I'm away from my development lab and it might prove to be a complete lemon anyway! Here's the idea: on the director, for every "external" VIP configuration which faces the clients (say VIP1) another VIP - iVIP1 - is also configured with identical realservers but attached to the _internal_ interface. The principle difference is that this VIP uses LVS-DR, because - for obvious reasons - the realservers can respond directly to each other. The only complicated bit is setting up a netfilter rule to do DNAT as the packets arrive - trap all packets destined for VIP1 and DNAT them to iVIP1. Ensure VIP1 is a loopback alias on your realservers as per normal DR configuration, and in theory at least the realservers should then be able to talk to each other as clients of a VIP.

Conclusions: mixing policy routing and LVS sounds like a great idea, and probably is if you're using LVS-DR or LVS-TUN. Just with LVS-NAT, it's a no-go (for me at the moment, anyway).

Graeme Fowler graeme (at) graemef (dot) net 2005/03/11

Solved... was Re: LVS-NAT: realserver as client (new thread, same subject!)

I've solved it - in as far as a proof of concept goes in testing. It's yet to be used under load though; however I can't see any specific problems ahead once I move it into production.

The solution of type "4" above involves a "classic" LVS-NAT cluster as follows. Nomenclature after DIP/RIP/VIP classification is "e" for external (ie. public address space), "i" for internal (ie. RFC1918 address space) and numbers to delimit machines.

Director: External NIC eth0 - DIPe, VIP1e
          Internal NIC eth1 - DIPi

Realserver 1: Internal NIC eth1 - RIP1

Realserver 2: Internal NIC eth1 - RIP2

In normal (or "classic" as referred to above) LVS-NAT, the director has a virtual server configured on VIP1e to NAT requests into RIP1 and RIP2. Under these circumstances, as discussed in great length in several threads in Jan/Feb (and many times before), a request from a realserver to a VIP will not work, because:

src         dst
RIP1 SYN -> VIP1e
RIP1 SYN -> RIP2  (or RIP1, doesn't matter)
RIP2 ACK -> RIP1

at this point the connection never completes because the ACK comes from an unexpected source (RIP2 rather than VIP1e), so RIP1 drops the packet and continues sending SYN packets until the application times out. We need a way to "catch" this part of the connection and make sure that the packets don't get dropped. As it turns out, the hypothesis I put forward a month ago works well (rather to my surprise!), and involves both netfilter (iptables) to mangle the "client" packets with an fwmark, and the use of LVS-DR to process them.

What I now have (simplified somewhat, this assumes a single service is being load balanced in a very small cluster):

Director: External NIC eth0 - DIPe, VIP1e
          Internal NIC eth1 - DIPi

Realserver 1: Internal NIC eth1 - RIP1
              Loopback adapter lo:0 - VIP1e

Realserver 2: Internal NIC eth1 - RIP2
              Loopback adapter lo:0 - VIP1e

Then on the director:

/sbin/iptables -t mangle -I PREROUTING -p tcp -i eth1 \
   -s $RIP_NETWORK_PREFIX -d $VIP1e --dport $PORT \
   -j MARK --set-mark $MARKVALUE

and we need a corresponding entry in the LVS tables for this. I'm using keepalived to manage it; yours may be different, but in a nutshell you need a virtual server on $MARKVALUE rather than an IP, using LVS-DR, pointing back to RIP1 and RIP2. Instead of me spamming configs, here's the ipvsadm -Ln output:

director# ipvsadm -Ln
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port   Forward Weight ActiveConn InActConn

FWM  92 wlc
  -> $RIP1:$PORT           Route  100    0          0
  -> $RIP2:$PORT           Route  100    0          0

(empty connection table right now)
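
If you're not using keepalived, roughly equivalent ipvsadm commands (a sketch based on the output above, using mark value 92 and the same $RIP1, $RIP2 and $PORT) would be:

director# ipvsadm -A -f 92 -s wlc
director# ipvsadm -a -f 92 -r $RIP1:$PORT -g -w 100
director# ipvsadm -a -f 92 -r $RIP2:$PORT -g -w 100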

...and believe it or not, that's it. Obviously the more VIPs you have, the more complex it gets but it's all about repeating the appropriate config with different RIP/VIP/mark values.

For ease of use I make the hexadecimal mark value match the last octet of the IP address on the VIP; it makes for easier reading when tracking stats and so on.

I've not addressed any random ARP problems yet because they haven't occurred in testing. One major bonus is that if a connection is attempted from (ooh, let's say, without giving too much away) a server-side include on a virtual host on a realserver to another virtual host on the same VIP, it'll get handled locally as long as Apache (in my case) is configured appropriately.

An interesting, and useful, side-effect of this scheme is that when a realserver wants to connect to a VIP which it is handling, it'll connect to itself - which reduces greatly the amount of traffic traversing the RS -> Director -> RS network and means that the amount of actual load-balancing is reduced too.

Rob Wilson rewilson () gmail ! com 2005-08-09

We have an LVS server for testing which is handling 2 VIPs through LVS-NAT (using keepalived). Each of the VIPs currently points to 1 real server - it's a one realserver LVS - just in testing phase at the moment. Both real-servers are on the same internal network.

VIP1 -> Realserver1 
VIP2 -> Realserver2 

We'd now like Realserver2 to be able to connect to Realserver1 via VIP1. I was able to accomplish this following the solution provided by Graeme Fowler: http://www.in-addr.de/pipermail/lvs-users/2005-March/013517.html. However, external connections to VIP1 no longer work while that solution is in place. Dropping the lo:0 interface assigned to VIP1 on Realserver1 fixes this, but then stops Realserver2 from connecting.

Graeme Fowler graeme () graemef ! net 2005-08-10

Are you doing your testing from clients on the same LAN as the VIP, by any chance? Have you set the netmask on the lo:0 VIP address on the realservers to 255.255.255.255? I can see that making it a /24 mask - 255.255.255.0 - might result in the realservers thinking that the client is actually local to them, thus dropping the packets.
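
For reference, here's the usual way to bring up the VIP on loopback with a host netmask (a sketch; substitute your VIP for $VIP1e):

realserver# ifconfig lo:0 $VIP1e netmask 255.255.255.255 up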

Rob Wilson rewilson () gmail ! com 2005-08-10

That's exactly it. I was hoping it was something daft I misconfigured, so.. wish granted :) It works perfectly now. Thanks for your help (and coming up with the idea in the first place!).

Judd Bourgeois simishag (at) gmail (dot) com 19 Jan 2006

I am running LVS-NAT, where the director has two NICs (and two networks). The VIP is on the inside of the director (in the RIP network) (Joe - this functions as a two network LVS-NAT). Some of my web sites proxy to "themselves" within a page (proxy, PRPC, includes, etc.). The symptom is that the proxy functionality breaks. The real server does a DNS lookup for the remote site, gets back the VIP, and hangs waiting for a response.

Previously I solved this problem by putting the site names and 127.0.0.1 in /etc/hosts (as mentioned in this section and in indexing), but after reading the FAQ more carefully tonight, I solved it by simply adding the VIP as a dummy interface on all of the realservers. This appears to be addressed in Graeme's solution, but he runs an extra iptables command on the director. Is this really necessary? Won't any packets originating on the real servers and destined for the VIP be handled by the dummy interface on the real server, without being put on the wire?

It all appears to work fine and has the added nice effect of forcing each realserver to proxy to itself when necessary.
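
A minimal sketch of Judd's dummy-interface approach (assuming the dummy module is available; $VIP stands for whichever VIP the realserver serves):

realserver# modprobe dummy
realserver# ip addr add $VIP/32 dev dummy0
realserver# ip link set dummy0 up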

Graeme Fowler graeme (at) graemef (dot) net 1/20/06

What you've suggested is the "single VIP" case of the above idea. It worked for me, it seems to have worked for Rob Wilson, so casting aside the fact that you might have multiple VIPs frontending multiple realserver clusters (as is my case) I can't see any reason why you shouldn't just go for it.

Judd Bourgeois simishag (at) gmail (dot) com 20 Jan 2006

Right. In fact, after reading your solution again, I think yours is the more useful general case, where there may be an arbitrary number of VIPs, RIPs, and groupings of real servers (which I don't need right now, but I've realized I will need down the road). I have some Alteons that call these "real server groups"; I'm not sure what the LVS equivalent is, but here's a short illustration.

Assume 1 director, 3 VIPs, 4 RIPs on 4 real servers. Assume we have real server groups (RG) RG1 (RIP1-2), RG2 (RIP3-4), RG3 (RIP1-4). VIP1 goes to RG1, VIP2 goes to RG2, VIP3 goes to RG3.

In my solution, servers in RG1 can simply put VIP1 and VIP3 on dummy interfaces, but for proxy requests they will only be able to talk to themselves. They will not be able to talk to VIP2. All servers should be able to talk to VIP3. Your solution solves this by using fwmark.

This is a fairly common problem with NAT in general that I have to deal with a lot. Basically, the NAT box will not apply NAT rules for traffic originating and terminating on the NAT box. I recall that one workaround for this is to use the OUTPUT chain, I can't find the rules at present but it seemed to work ok.

Ratz 21 Jan 2006

There is no LVS equivalent of "real server groups". But I think Alteon (Nortel) only has this feature for administrative reasons, so you can assign a group by its identifier to a VIP. What I would love to see with LVS is the VSR approach and a proper, working implementation of VRRP or CARP. I've just recently set up a 2208 switch using one VSR and 2 VIRs, doing failover when either the link or the DGW is not reachable anymore. The sexy thing about this setup is that you don't need to fiddle around with arp problems and you don't need NAT, so balancing schedulers can get meaningful L7 information. Alteon's groups are just an administrative layer with an identifier. We could add such a layer in ipvsadm and the IPVS code, however what benefit do you see in such an approach?

One problem I see with the Alteon approach is that if you add an RS to a group, as far as I know it can only belong to one RG. This is a bit suboptimal if you want to use RSs as spillover servers on top of their normal functionality. Regarding your example, I'd like to say that RG1 is a spillover group for RG3. You can specify (IIRC) a spare server for each RG in Alteon OS, however not cross-RG wise. Correct me if I'm wrong, please.

Judd

Graeme's solution solves this by using fwmark.

Yes, fwmark solves almost all problems

Graeme Fowler graeme (at) graemef (dot) net 21 Jan 2006

Judd doesn't need fwmark, because in a single VIP LVS-NAT, with that VIP assigned locally on the realservers on a dummy interface (or loopback alias), the realservers will always answer requests for the VIP locally.

In a two-VIP case (the simplest multiple), if you have two "groups" [0] of realservers, then the director becomes involved by virtue of it being the default gateway for the realservers. At the point the director gets involved you need some way of determining which interface your traffic is on, and segregation via fwmark seems the most elegant way to achieve this (given the known and predictable failure of realservers as clients in LVS-NAT). I know I struggled for months before realising that I could, in effect, combine the use of NAT via an external interface for my real clients, and DR via an internal interface for my "realservers as clients".

[0] I use the word groups in quotes and advisedly, since it appears that Alteon use that in their setup terminology from previous posts.

12.3. Realserver as LVS client in LVS-DR

The topic came up with a posting about an LVS of httpd which generated mail (presumably a webmail LVS). The poster reasonably wanted the e-mail to be balanced by the same director. The problem is that the mail is being sent to the VIP (on the director) from a machine (the realserver) which also has the VIP, so the mail will be accepted locally on the realserver rather than being sent to the director to be load balanced. If you don't attempt to load balance the mail requests, then given enough requests, statistically (over a long enough period) the http traffic will be balanced and the mail coming from each realserver will be approximately balanced. This posting started an off-line discussion with Horms and Ludo about ways to have clients on the director and on the realservers. The outcome was an idea by Ludo, which no-one has got to work yet (Horms tried something similar a while ago and couldn't get it to work either), and a proposal by Julian, which seems likely to work.

Dan kasper37 (at) speakeasy (dot) net 1 Oct 2005

Is there a way to connect from one of the real servers hosting web to the VIP:smtp service? The problem is that telnet to VIP:smtp from one of the web real servers is going to try to connect to smtp locally. I'm actually talking about any virtual service in general. Here's what we've got so far (brace yourself):

# ip route add x.x.x.70 dev eth1 table local tos 4 scope link src y.y.y.16
# iptables -A PREROUTING -t mangle -p tcp --dport 25 -j TOS --set-tos 4
# ip route ls table all| grep x.x.x.70
x.x.x.70 tos reliability via y.y.y.16 dev eth1  table local  scope link  src y.y.y.16
local x.x.x.70 dev lo  table local  proto kernel  scope host  src y.y.y.70

These commands are run on the realserver (for the sake of brevity I only included the commands for one realserver, but imagine them being run on all realservers with the correct RIPs substituted for y.y.y.16). With these rules, packets are being output to the network as hoped, but the problem is that the /source/ address is x.x.x.70 instead of the realserver's RIP. If there were a way to force the kernel to send the request from the realserver's RIP, this might actually work. Any ideas?

Ludo Stellingwerff ludo (at) protactive (dot) nl Oct 2 2005

Dan, try it like this: (without your routing table hacks)


#iptables -A PREROUTING -t nat -i lo -p tcp -d <local_ip> --dport 25 -j ROUTE --oif eth0

I assume you have fixed the ARP problems. Therefore the above ROUTE target should work. If it doesn't I'll have to think of a solution using the "ip rule" command in combination with firewall marking.

Joe (off-list)

Can either of you think of a situation where it would be useful to have the director also be a client?

Ludo

Besides testing purposes, I can only think of two reasons:

  • Flexibility - for all those situations we can't guess because of lack of imagination
  • In my line of work: When you combine the LVS-director with a proxy server.

Most companies (including my own) seem to want to integrate all difficult routing problems on the company's gateway/router/firewall. (My job is making this possible in a user-friendly manner: interfaces.) One of the things many firewalls do is provide proxy services to the internal users. If you integrate LVS director services on this firewall, proxy users should be able to access these services too. Thus the director functions as a client.

Joe

The case of allowing the realserver to be a client seems more useful. The posting today on lvs-users was of a 3-Tier site where the LVS'ed httpd sends mail. The poster wants to LVS the smtp too, but the realservers connect to the local VIP:smtp.

Right, this seems even more common.

Is there any routing that's done down in the depths of LOCAL_IN? What happens to a packet destined for a local IP? Does it appear in the routing diagram, or does it just never get out? Can you fwmark a packet to dst=LOCAL_IP:smtp and get it out somehow?

Ludo

For email this question depends completely on the way the client software presents its email. There are generally two possibilities:

  • Using a smtp-client
  • Using a local postdrop (Presenting mail to the local MTA without using network traffic)

Most local services will use a local postdrop; these can't be pulled out of the local machine easily. But client software using the SMTP protocol will normally connect to localhost:25.

To answer your question: traffic to the localhost is always done via the loopback network device (dev lo). This can be fwmarked and rerouted without too much of a problem.

#iptables -A PREROUTING -t nat -i lo -p tcp -d <local_ip> --dport 25 -j ROUTE --oif eth0

local_ip is the VIP, not the RIP. I'm using it as the -d (destination) of the connection. The part behind the -j might need some more thinking/testing, but this should work. If ROUTE doesn't work, I can think of some more complex solutions to get the packet to the director, through fwmarking and Policy Routing.

Let's follow a packet:

  • SMTP-client (on Realserver) wants to send an email
  • SMTP-client sends a SYN-packet to VIP:25.

    The src_addr of this smtp-client should be the RIP of the realserver that handled the original http request. Maybe you'll need to enforce that the src_addr of this realserver's packet is the RIP:

    #iptables -A POSTROUTING -o lo -d <VIP> -p tcp --dport 25 -j SNAT --to-source <RIP>
    
  • This packet gets sent through the loopback device. (Because the VIP is local to the realserver when using LVS/DR)
  • The PREROUTING nat rule above matches, stealing the packet from the normal routing.
  • The packet is sent directly through the output function of dev eth0.

I'm not sure if this and the next step work correctly. If not, I'll have to go to the Policy Routing solution.

  • This output driver asks for the MAC address of the VIP through ARP requests.
  • The realserver doesn't answer, because the ARP problem is solved.
  • The director does answer, so the packet is sent to the director, which balances the packet.

The packets then are

  • RIP -> VIP:smtp
  • The packet goes to the VIP on the director and then you'll get a reply packet from the MTA: VIP:smtp -> RIP

Normally you don't want the RIP to be routable on the internet, but in this example the VIP host(s) do know where this RIP is, because it is in the same local network. Let's make this a complete example:

                    Internet
                        |
                Gateway: 1.2.3.1
                        |
                        |
       /----------------+--------------------\
       |                |                    |
       |          Director (LVS/DR)          |
       |            VIP:1.2.3.4              |
       |           LAN1:192.168.1.1          |
       |                                     |
       |                                     |
Realserver1: 192.168.1.2       Realserver2: 192.168.1.3

To make this setup work the realservers use the loopback trick to prevent arp problems: The VIP is on their loopback devices. The RIP is on device eth0 of the realservers. Just a normal LVS/DR setup will do.

Now you want Realserver1 to connect to the balanced service smtp(port 25) on the director. Using the following two iptables rules should do the trick:

#iptables -A POSTROUTING -o lo -d 1.2.3.4 -p tcp --dport 25 -j SNAT --to-source 192.168.1.2
#iptables -A PREROUTING -i lo -d 1.2.3.4 -p tcp --dport 25 -j ROUTE --oif eth0

Or you might want a more generic solution for the realserver's connections to the director:

#iptables -A POSTROUTING -o lo -d 1.2.3.4 -j SNAT --to-source 192.168.1.2
#iptables -A PREROUTING -i lo -s 192.168.1.2 -d 1.2.3.4 -j ROUTE --oif eth0

And for Realserver2:

#iptables -A POSTROUTING -o lo -d 1.2.3.4 -j SNAT --to-source 192.168.1.3
#iptables -A PREROUTING -i lo -s 192.168.1.3 -d 1.2.3.4 -j ROUTE --oif eth0

Any client on the realservers that wants to connect to the VIP's services will now work.

To test, you can try "telnet 1.2.3.4 25" from one of the realservers. The MTA should react with something like:

220 realserver1.example.com ESMTP Postfix  <--- MTA's greeting banner
HELO example.com                           <--- command you give the MTA
250 realserver1.example.com                <--- MTA's answer

Joe: Let's say there's no way to do it with iptables. Is it possible to write a piece of code that does what we want outside of iproute2/iptables?

Ratz: Basically, what you want is to trick a RS's FIB into handling a mark'd packet with scope local as though its realm were scope global, then route it out on the interface and wait for it to come back with src mac == mac of RIP, dest mac == mac of RIP, src IP = RIP, dest IP = VIP. My 5 minutes of thinking on the problem suggest that it's unsolvable with conventional methods without causing major breakage in the FIB of the routing cache.

Julian Anastasov ja (at) ssi (dot) bg 3 Nov 2005

One can try the "loop" flag (send-to-self) feature at routing level: http://www.ssi.bg/~ja/#loop. There is a text file that explains its usage.

This patch changes the way packets to local IPs are routed. The trick is that it is done only for outgoing routes. If applied to the RS, it can establish a connection from RIP to VIP, on the assumption that the packets are looped via crossover cable or hub. In our case they will pass through the director and will come back to the RS with daddr=VIP. At least, this is the theory, only for the DR method; not tested. The incoming connection is served as usual; the loop patch allows it to come with saddr=RIP. One can try it after making sure the director will not drop the packet due to rp_filter checks, if a packet from the RIP comes in on the wrong interface. If the director has only a single interface then there is no problem. The RS can look like this (VIP and RIP on different eth devices):

eth0: VIP (for DR the ARP problem should be solved with solutions that work for eth devices); for traffic from director to RS (RIP->VIP)
eth1: RIP; for outgoing traffic (RIP->VIP)

Such RS boxes will have loop=1 on eth0 and eth1, and should be protected by a firewall because there is a risk that they accept unwanted UDP packets from RIP to VIP from the world. This should be easy if reverse path checks are done only in the border firewall and not in the director and RSs.
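
Assuming the patch exposes the send-to-self flag as a per-interface sysctl named "loop" (an assumption on my part; check the text file shipped with the patch for the exact name), enabling loop=1 on both devices would look like:

realserver# echo 1 > /proc/sys/net/ipv4/conf/eth0/loop
realserver# echo 1 > /proc/sys/net/ipv4/conf/eth1/loop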

I'm not pushing it for inclusion as it is one big hack but Anton Blanchard is very active in this:

http://marc.theaimsgroup.com/?l=linux-netdev&m=106209315512638&w=2
http://marc.theaimsgroup.com/?l=linux-netdev&m=110109969803330&w=2

and once DaveM said he would review it, so he knows about it.

Ratz: What about medium_id issues? Does it work with bonding interfaces?

Maybe; it should work. medium_id does not play here; loop works just after fib_lookup and accepts packets from a local source IP. For other traffic, loop=1 does not modify behaviour.

12.4. Markus's thoughts

Markus Hofer hofmarkus (at) gmail (dot) com 26 Aug 2011

Solutions are:

  1. Change the hosts entry for the services to point to another realserver.

    Negative: the problem is that if you have a lot of services with different DNS names, you have to insert every new service into every realserver (or run a little DNS server in the realserver net), which isn't nice.

  2. Julian's solution removes the local routing (as done for one network LVS-NAT) and forces every packet to pass through the director. The director then masquerades (rewrites) src_addr=RIP_2 to VIP and realserver_1 accepts the request. This puts extra netload onto the director.

              +-------------+
              |   <vip>     |
              |  director   |
              +-------------+
               |^         |^
            ans||      req||ans
               v|req      v|
     +-------------+     +-------------+
     | <rip1>      |     | <rip2>      |
     |  Realserver |     |  Realserver |
     |  = client   |     |  = server   |
     +-------------+     +-------------+
    

    Look at: http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.lvs_clients_on_realservers.html

    Negative:

    • All traffic goes over the loadbalancer (director):
    • every backup
    • every rsync
    • every ssh, scp
    • I couldn't log on via SSH from one realserver to another; I had to add an internal service on the director for this.
  3. Make NAT on realserver: Look at: http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.lvs_clients_on_realservers.html

    Jacob's solution: The solution proposed here does not put that extra load onto the director. However each realserver always contacts itself (which isn't a problem). Put the following entry into each realserver. Now the realservers can access the httpd on RIP as if it were on VIP.

    realserver#  iptables -t nat -A OUTPUT -p tcp -d $VIP --dport 80 -j DNAT --to ${RIP}:80
    

    Negative:

    • you insert the loadbalancer's (director's) logic into the realserver,
    • you must do it for every service, and
    • for every different IP
  4. Is it not possible to insert an iptables rule on the director for "all traffic from the realserver net --> (to) the realserver net (from one realserver to another realserver)", so that this traffic gets NATed? The traffic then goes back from realserver_2 to the director and then to realserver_1. Like this:

        IPTABLES:         iptables -t nat -A POSTROUTING -s 192.168.0.0/255.255.255.0 -p tcp -j SNAT --to-source 192.168.200.5
        (SNAT is only valid in the nat table's POSTROUTING chain, not PREROUTING.)
    

    
                +-------------+
                |   <vip>     |	192.168.200.15 (service meteo.example.com)
                |  director   |	192.168.200.5  (VIP)
                +-------------+
                 |^         |^
              ans||      req||ans
                 v|req      v|
       +-------------+     +-------------+
       | <rip1>      |     | <rip2>      |
       |  Realserver |     |  Realserver |
       |  = client   |     |  = server   |
       +-------------+     +-------------+
        192.168.0.10        192.168.0.20
    

    Negative: nothing (or at least I can't find any)

    Positive:

    • only one entry to change the settings
    • only the VIP traffic goes from realserver_1 <--> VIP <--> realserver_2