This problem comes up for failover of large numbers of VIPs.
You need this if you are setting up a director without a VIP. You don't need a VIP on a director if you're forwarding packets by fwmark, or if machines are accepting packets by transparent proxy. (All the simple LVS setups use VIPs on the director. You can skip this chapter if you aren't using fwmark or transparent proxy.)
LVS rearranges the tcpip connection that normally requires exchanges between 2 machines (client<->server) so that (for LVS-DR) the packets are moved in a circle between 3 machines (client->director->realserver->client). In the case of firewall mark (fwmark) or Transparent Proxy (Horms' method), the packets from the client are sent to and accepted by a machine (the director) which doesn't have the IP of the dst_addr. The routing and tcpip connections must not break the tcpip RFCs and the client must still think it is connected to one machine.
When setting up an LVS on a VIP, normal routing mechanisms will deliver the packet from the client (with dst_addr=VIP) to the director. Once the packet has arrived on the director, it will be accepted locally, as the director has the VIP on one of its ethernet devices. IPVS picks up the packet from there and forwards it to the realservers.
When the director is set up to forward fwmark'ed packets by LVS, it (usually) will not have an IP that matches the dst_addr of the client's packets. The client doesn't know that the LVS is set up with fwmarks and sends requests to one (or several) VIP:ports. The fwmark on the director could be put onto packets that consist of an arbitrary grouping of IP:ports. The VIP:port that the client is connecting to is actually on the realservers (which are not replying to arp requests), but there is no VIP on the director, just some iptables rules marking the packet and ipvs forwarding the marked packets. Without a VIP on the director, normal routing mechanisms will not send the packets to the director. How does a packet get to the director (or realserver), which is accepting packets for the VIP, if the host doesn't have the VIP on an eth device to respond to arp requests? You have to intervene to make sure the packets get to the director.
One solution is to configure the router by hand to forward packets with dst_addr=VIP to the director. These rules rely on forwarding the packet from the router (or test client) to a known MAC address on the director.
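As a rough sketch of such a hand-configured rule, assume the router (or test client) is a Linux box with iproute2 and that the director has an ordinary IP (the DIP) on the same network. The addresses are placeholders and the function name is made up for illustration; neither comes from the original text. The script only prints the command unless DRY_RUN=0 is set, since running it for real needs root on the router.

```shell
#!/bin/sh
# Sketch only: VIP and DIP are placeholder addresses (assumptions).
VIP="${VIP:-192.168.2.110}"   # address the clients connect to
DIP="${DIP:-192.168.1.9}"     # director's real IP on the router's LAN

# Print the command by default; execute it only when DRY_RUN=0 (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On the router: a host route sending packets for the VIP via the DIP.
# The router finds the director's MAC address by a normal arp for the DIP,
# so the director never has to answer arp requests for the VIP.
route_vip_to_director() {
    vip=$1
    dip=$2
    run ip route add "$vip/32" via "$dip"
}

route_vip_to_director "$VIP" "$DIP"
```

With the default settings this prints the `ip route` command it would run. The route has to be changed by hand (or by a script) if the director's identity changes.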
On the realservers in a LVS-DR LVS the routing is already handled for you. In LVS-DR when the director wants to send a packet to a realserver, it looks up the MAC address for the RIP and sends the packet with addresses CIP->VIP by linklayer to the MAC address of the realserver. Once the packet arrives at the realserver, processes listening to VIP:port pick up the packet.
These routing methods do not work well when director failover is required, as the router tables need to be programmatically updated about the identity of the new director. This requires code and you may not have access to the router tables.
Note: The original plan for vrrpd had a virtual MAC address assigned to an interface. This MAC address could then be moved to the new director on failover. Unfortunately Alexandre found this too difficult to code up due to lack of access to information about the hardware and the plan has not been implemented.
Normally, once a packet arrives on the director, it is either delivered locally (if the dst_addr is on the director) or else routed somewhere (after consulting routing tables). In the case of a director forwarding by fwmarks, there is neither a VIP for local delivery, nor routing rules for forwarding packets with dst_addr=VIP. If you send a packet to a director configured this way, it will send arp requests "who has VIP, tell director".
Julian 09 Dec 2003
Currently, IPVS requires the traffic to VIPs to be locally delivered (LOCAL_IN). It means the VIPs should be local IPs or you have to use ip rules selecting table(s) with local route(s). IIRC, transparent proxy does not work for IPVS in standard 2.4 kernels as it is in DNAT form. This is a local route:
ip route add local 0/0 dev lo table XXX
Realservers can have fwmark rules to process packets with dst_addr=VIP. With LVS-DR the routing problem is already handled; the director sends the packets to the interface with the MAC address of the RIP (LVS-Tun also routes packets to the realservers).
Several methods are available to enable local delivery of a packet which has arrived on a machine which does not have the IP of the dst_addr.
If you are using a single VIP (or a small number of VIPs), you can put these IPs on the director on an ethernet device or on lo.
If a range of addresses is required, an alias can be set to accept a network of addresses, without assigning an IP to the device.
Horms horms (at) vergenet (dot) net 10 Apr 2000
ifconfig lo:0 192.168.1.0 netmask 255.255.255.0
You can now ping anything in the 192.168.1.0/24 network from the console.
Note: You can't ping any of those IPs from a remote host (i.e. after adding a route on the remote host to this network). If you put this network onto an eth0 alias (rather than an lo alias), it won't reply to pings from the console - presumably the ping replies in the lo:0 case are coming from 127.0.0.1.
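For readers who no longer have ifconfig, the lo alias should have a close iproute2 equivalent. The sketch below assumes the same 192.168.1.0/24 network as Horms' example; the function name is hypothetical. It prints the command unless DRY_RUN=0 is set (running it for real needs root).

```shell
#!/bin/sh
# Sketch only: iproute2 form of the lo:0 alias, using the example network.

# Print the command by default; execute it only when DRY_RUN=0 (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Put a whole network on the loopback device under the label lo:0.
alias_network_on_lo() {
    net=$1      # network in CIDR form, e.g. 192.168.1.0/24
    run ip addr add "$net" dev lo label lo:0
}

alias_network_on_lo 192.168.1.0/24
```

After running it with DRY_RUN=0, `ping -c 1 192.168.1.57` (any address in the network) from the console is a quick check that the addresses are accepted locally.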
For another example of routing to an interface without an IP, see routing to realservers from director in LVS-DR.
Here's the iproute2 method of getting all packets delivered locally. This is currently the preferred method of arranging for packets to be accepted locally.
Julian 7 Jul 2002
The local routes are used for transparent proxy, for example:
The recipe is
ip rule add prio 100 fwmark 1 table 100
ip route add local 0/0 dev lo table 100
Note Joe: you might (should?) be able to route packets to the VIP this way, allowing packets for an IP which is not on the machine to be accepted by LVS.
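Putting the pieces together, a director with no VIP might be set up roughly as follows. This is a sketch under assumptions, not a tested configuration: the addresses, the port (80), the rr scheduler and the choice of LVS-DR forwarding (-g) are all placeholders, and the function name is made up. The mark value 1 and table 100 are taken from Julian's recipe. The script prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: addresses, port and scheduler are placeholders (assumptions).
VIP="${VIP:-192.168.2.110}"   # address the clients connect to
RIP="${RIP:-192.168.1.11}"    # one realserver
MARK=1                        # fwmark value, as in the recipe
TABLE=100                     # routing table number, as in the recipe

# Print each command by default; execute only when DRY_RUN=0 (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_fwmark_director() {
    vip=$1
    rip=$2
    # 1. Mark packets for VIP:80 before the routing decision is made.
    run iptables -t mangle -A PREROUTING -d "$vip" -p tcp --dport 80 \
        -j MARK --set-mark "$MARK"
    # 2. Send marked packets to a private routing table ...
    run ip rule add prio 100 fwmark "$MARK" table "$TABLE"
    # 3. ... whose only route delivers everything locally (LOCAL_IN).
    run ip route add local 0/0 dev lo table "$TABLE"
    # 4. Define the virtual service on the mark and add a realserver.
    run ipvsadm -A -f "$MARK" -s rr
    run ipvsadm -a -f "$MARK" -r "$rip" -g
}

setup_fwmark_director "$VIP" "$RIP"
```

The packets still have to reach the director in the first place, for instance by a route on the router; the rules here only make the director accept them once they arrive.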
If you have just 1 VIP, you don't need these rules; you can just set up the VIP on the director in the normal manner:
ifconfig eth0:1 220.127.116.11 netmask 255.255.255.255
In many ways, having an IP on the director for no other reason than so that it can accept the packet, is a step backwards.
Joe 9 Jul 2002: is it possible/reasonable/sensible for ipvs, when forwarding fwmarks, to pick the marked packet up from the PREROUTING chain (or wherever it is)? Currently ipvs needs an IP or a functional equivalent of ipchains redirect to be able to get the packet.
Julian: Yes, it is useful for some cases. Such feature is in the bottom of our TODOs but requires many changes, including breaking the routing func prototypes. So, for now the answer is no :)
(For 2.2.x kernels) you can use Transparent Proxy (Horms' method) to accept packets for the VIP.
Transparent proxy code was written for squids to allow them to accept packets destined for remote addresses. Only the transparent proxy code for kernel 2.2.x works for LVS. With this code the packet arrives with dst_addr=VIP and is picked up by ipvs. With the 2.4.x kernels, the packet arrives with dst_addr=127.0.0.1. This is fine for squids, but ipvs ignores the packet. It's unfortunate that by the time the netfilter people were informed that this was a problem for us, the new code was too entrenched and they didn't want to change it.
Transparent proxy in the standard 2.4.x kernels, where most future development of LVS will be, does not work for LVS. RedHat have patched their kernel so this is fixed (Mike McLean mikec (at) redhat (dot) com 4 Dec 2002).
Note: Balazs has patched the 2.4 kernel to restore the original transparent functionality.
Use a fwmark service and be rid of your VIP on an interface altogether.
Joe 26 Aug 2003
getting rid of the VIP altogether is still a bit of a problem isn't it? there are solutions that apparently came originally from Julian (urls in the archive were then listed).
http://marc.theaimsgroup.com/?l=linux-virtual-server&m=106020019020431&w=2 pretty straightforward and basically the way fwmarks work if you are using them for more than one IP address, which was the reason fwmarks were originally added to the LVS code.
The route commands are needed because ipvs is called after routing takes place. I think that in the case of fwmarks it would be best to move the code to the prerouting stage to avoid the need for this. I.e. hook ip_vs_in into NF_IP_PRE_ROUTING instead of NF_IP_LOCAL_IN.
what will this get us? Will we still need the route command? Are you going to implement it, or are you just thinking out loud?
Yes, it will remove the requirement for traffic to be local. I made the change - it is one line - and very briefly tested it. It seemed to work quite well. But it is a change that will most likely have side effects so it warrants further thought and investigation.
Such a move can allow IPVS not to require local delivery. There will be some issues with properly identifying the direction of the packets but it is possible to implement. The problem is that we are stuck with the netfilter hooks. If we move out of the hooks or if we add some changes to the kernel we can do everything including proper routing for inout packets (working with multiple ISPs), avoiding the LOCAL_IN->LOCAL_OUT problems that start to appear with 2.6, etc. Maybe we will need a ROUTING hook. IIRC, fwmark is present in PRE_ROUTING but such a move can create some compatibility problems; are we all ready for this?
In http://marc.theaimsgroup.com/?l=linux-virtual-server&m=106020171022117&w=2 the packets are delivered locally because of the "local" in
ip route add local 0/0 dev lo table 100
Again, this isn't really the way it was supposed to work AFAIR.
if/since Julian's routing method works, why do we need transparent proxy (if we ever did)?
The people have alternatives, all these methods differ in some way and can be selected according to their behaviour. IPVS is liberal for the local delivery method.
Note the main things:
the local delivery does not depend on fwmark, i.e. you can safely route locally without using fwmarks, e.g.
# select incoming packets by DADDR
ip rule add prio 100 to VIP1 table 100
ip rule add prio 100 to VIP2 iif eth0 table 100
ip rule add prio 100 to VIP3 iif eth1 table 100
# and deliver them locally
ip route add local 0/0 dev ANY_DEVICE table 100
Note ANY_DEVICE: it does not matter until you try to play with the device state, lo is preferred as it is always up
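With many VIPs the same rules can be generated in a loop. The sketch below assumes a hypothetical list of VIPs and uses lo as the device, following the note on ANY_DEVICE; the function name is made up for illustration. It prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: the VIP list is a placeholder (assumption).
TABLE=100   # routing table number, as in Julian's example

# Print each command by default; execute only when DRY_RUN=0 (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One "to VIP" rule per address, then a single local route for the table.
accept_vips_locally() {
    for vip in "$@"; do
        run ip rule add prio 100 to "$vip" table "$TABLE"
    done
    run ip route add local 0/0 dev lo table "$TABLE"
}

accept_vips_locally 192.168.2.110 192.168.2.111 192.168.2.112
```

Once applied, `ip rule show` and `ip route show table 100` list what was installed. On failover the backup director would need to run the same function.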
This routing method does not handle ICMP errors, because it is assumed the VIPs are not configured as local IPs. The current kernels have checks for local source IP when generating packets (icmp_send uses ip_route_output which has such checks starting from 2.4) and if they are not configured as local IPs, then the reply is not generated. So, there are some corner cases where the kernel does not like our local delivery methods. If that is considered a problem, better the VIP to be configured locally.
Note (Joe): If you're being secure about your LVS, you aren't going to have a route from the VIP on the director back to the outside world (see default gw for director with LVS-DR/LVS-Tun) and you won't be sending back icmp traffic anyhow.
As for the original subject, the LVS directors can not be realservers, clients and backup servers at the same time for the same virtual service. The VIP must be announced only from one director. If the backup director has the VIP configured then it cannot communicate with other hosts from the cluster. Also, the backup server must not create ARP problem if the VIP is configured there.
Can you set up a squid then with this routing method, without using transparent proxy?
I thought the people know about/use such alternative:
may be someone has experience with this method and he can provide actual settings. It is useful in setups where the packet header must be preserved. IIRC, 2.4 TP breaks this rule.
Is this routing method a generalised way of accepting packets on the director when using fwmark with LVS?
I was wondering that on the way home last night. I would suspect so. It has the potential to cover a lot of issues in a manner that is supported by stock kernels. That would be nice. But then again those issues may disappear if LVS was moved to prerouting.
For Ludo's reinJect forwarder the director needs to accept packets with dst_addr=0/0
I don't force the director to accept the packets as local packets. I modified the kernel to send all forwarded traffic into LVS. Thus I force the director to accept them as forwarded packets! This is the purpose of the "LVS" iptables target. No need for other fancy tricks, just match the traffic in iptables and use "-j LVS" as the target. This patch provides an iptables target called "LVS" which calls the entry function of LVS. Thus you can match the traffic to the VIP on the FORWARD hook of iptables:
# iptables -A FORWARD -t mangle -d <VIP> -m state --state NEW -j MARK --set-mark 1
# iptables -A FORWARD -t mangle -m mark --mark 1 -j LVS
This last line mimics the behaviour of packets going to the director directly, it will call the LVS functions as if the packet was on the local-delivery path.
Q. Some daemons listen only to specific IPs. What IP is the telnet/httpd listening to when it accepts a connection by transparent proxy?
A. It depends where you are when you make the connect request (this is for 2.2.x kernels).
You are on the console of a host and add x.x.x.111:http by transparent proxy and setup the httpd to listen to x.x.x.111:80. You cannot ping x.x.x.111. To connect to x.x.x.111:http you need to
# route add -host 192.168.1.111 lo
(adding a route to eth0 does not work).
If you go to an outside machine, you still cannot ping x.x.x.111 and you cannot connect to x.x.x.111:http unless you make the target box the default gw or add a host route to x.x.x.111.
If you now go back to the console of the transparent proxy machine and change the httpd to listen to 127.0.0.1:http (and not to x.x.x.111:http) you can still connect to x.x.x.111:http even though nothing is listening to that IP:port (linux tcpip does local delivery to local IPs). (You can also connect to 127.0.0.1:http, but this is not concerned with transparent proxy.)
Returning to the outside machine, you cannot connect to x.x.x.111:http.
The connections from the outside machine model connections to the director with the VIP by transparent proxy, while the connections from the console model the realserver which has a packet delivered from the director. On the realserver you could have your services listening to 127.0.0.1 rather than the VIP. You may run into DNS problems (see Running indexing programs) if the process listening to 127.0.0.1 doesn't know that it's answering to lvs.domain.org.
For other examples of routing and accepting packets without IPs, look at the section on default gw for LVS-DR.