27. LVS: Transparent Bridging

Here's a summary of bridging as it relates to LVS directors, written with help from Julian and Joe Cooper joe (at) swelltech (dot) com, Apr 2002. We haven't done anything with transparent bridging in LVS yet, but the subject comes up on the mailing list often enough to warrant some information in the HOWTO.

The director sits between 2 networks, the realserver network and the outside world. Bridging has been proposed several times on the list for the director as a way of getting packets between the realservers and the outside world. Initially I thought that bridging could be used to send packets through the director to 0/0 from a realserver in LVS-DR, thus solving the problem now solved by the martian modification.

A bridge is a layer-2 device for connecting 2 physically separate networks. Being a layer-2 device, a bridge only looks at the MAC addresses on a packet. The bridge doesn't look at the IPs and has no information about routing at the IP level. Here's a 2 NIC bridge connecting 2 networks.

     network A
     -------------------------------
                  |
                  |
                  | eth0
            -------------
           |             |
           |    bridge   |
           |             |
            -------------
                  | eth1
                  |
                  |
     --------------------------------
     network B

In one implementation (the transparent bridge) the bridge learns the network location of hosts (i.e. which NIC the host is attached to) by inspecting the source MAC addresses of packets.
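The learning step can be sketched in a few lines of Python. This is only a toy model, not the Linux bridging code: the class name, the port names and the MAC addresses are made up for illustration, and broadcast handling and table ageing are left out.

```python
class LearningBridge:
    """Toy model of a transparent (learning) bridge."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # source MAC -> port the MAC was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is forwarded to."""
        # Learn: the sender is reachable through the port the frame arrived on.
        self.table[src_mac] = in_port
        out_port = self.table.get(dst_mac)
        if out_port is None:
            # Destination not learned yet: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the segment the frame came from: do not forward.
            return []
        return [out_port]


bridge = LearningBridge(["eth0", "eth1"])
# Host A1 (network A) sends to B1 (network B); B1 is unknown, so the frame is flooded.
first = bridge.receive("eth0", "aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:01")
# B1 replies; A1 was learned on eth0, so the frame goes only to eth0.
reply = bridge.receive("eth1", "bb:bb:bb:bb:bb:01", "aa:aa:aa:aa:aa:01")
# A2 talks to A1; both are on network A, so nothing crosses the bridge.
local = bridge.receive("eth0", "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01")
print(first, reply, local)
```

The third call is the traffic separation case: once both hosts are known to be on the same port, their frames stay off the other network.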

Since the bridge only inspects the MAC addresses on packets, the IPs on the hosts in network A and network B can belong to the same or to different IP networks/netmasks. A bridge can be used to separate traffic. If all hosts are on 192.168.1.0/24 but most of the packets pass between 2 hosts, these two hosts can be put on one of the networks and the rest on the other network. At the same time the bridge connects the separate networks without adding route table entries on the hosts in the two networks. So bridging allows connection of different physical networks without requiring route entries (needed if a router had been used instead), but keeps traffic off networks that don't need to hear it. Not being a router, the bridge is not seen by traceroute.

About transparent bridging from the howstuffworks site

http://www.howstuffworks.com/lan-switch4.htm

and from cisco (including an explanation of the spanning tree algorithm)

http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/transbdg.htm

About bridging from the internet encyclopedia at freesoft

http://www.freesoft.org/CIE/Topics/30.htm (site down 14 Sep 2004)

About transparent proxy with bridging from the Transparent Proxy with Linux and Squid mini-HOWTO

http://www.tldp.org/HOWTO/mini/TransparentProxy-7.html

In Linux, the first bridges were implemented with proxy-arp; this is called pseudo-bridging. Proxy-arp works only for IPv4.

http://www.tldp.org/HOWTO/Adv-Routing-HOWTO-16.html (from the Advanced Routing HOWTO).

Here's some more on proxy-arp.

http://www.sjdjweis.com/linux/proxyarp/

With proxy-arp running on the setup above, eth1 would be configured to reply to arp requests for an IP on network A, allowing packets from network B to be sent to that host on network A.

With proxy-arp, packets are sent through the TCP/IP stack and routing tables on the bridge host. These packets can be filtered (iptables/ipchains), but martian packets will also be recognised and dropped.
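As a rough illustration of the proxy-arp decision, here is a simplified Python model. It assumes the pseudo-bridge answers an ARP request with its own MAC when its routing table says the target IP is reached through a different interface from the one the request arrived on. The route entries and function names are hypothetical, and the real kernel applies further checks that are not modelled here.

```python
import ipaddress

# Hypothetical routing table of the pseudo-bridge host: (network, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/25"), "eth0"),    # network A
    (ipaddress.ip_network("192.168.1.128/25"), "eth1"),  # network B
]


def route_interface(ip):
    """Return the interface the host would use to reach ip, or None if no route."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, dev) for net, dev in ROUTES if addr in net]
    if not matches:
        return None
    # Longest prefix wins, as in a real routing table.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def proxy_arp_replies(request_dev, target_ip):
    """True if the host would answer the ARP request with its own MAC."""
    out_dev = route_interface(target_ip)
    return out_dev is not None and out_dev != request_dev


# A host on network B (behind eth1) asks for a host on network A: answered by proxy.
print(proxy_arp_replies("eth1", "192.168.1.10"))
# The target is on the same side as the requester: the target answers for itself.
print(proxy_arp_replies("eth1", "192.168.1.200"))
```

Because the requester ends up sending its packets to the pseudo-bridge's MAC, those packets are delivered to the IP layer there, which is why filtering and martian checks apply.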

Kate (aka John Looney) asked if bridging could be used to put a director in front of a functioning server, to make an LVS, without breaking service to the clients accessing that server.

John P. Looney 18 Apr 2002

The director is configured to listen for the IPs of the realservers at the internal side of the network bridge, and pass them on, with some intelligence.

     RS1    RS2    RS3
     |       |      |
     \--------------/
             |
         DR/Bridge
             |
          router
             |
          Clients

  RS1=original webserver (ip 293.2.2.1/26)
  RS2=new webserver (ip 293.2.2.2/26)
  RS3=new webserver (ip 293.2.2.3/26)
  DR =Director (ip 293.2.2.4/26)
  router (ip 293.2.2.63/26 and 293.2.3.63/26)

Note the router is not on the same physical network as the realservers, but is on the same logical network.

So, when a connection comes in for RS1 (293.2.2.1), the router sends out an ARP request. The DR answers with its own MAC, and transparently forwards all connections on to RS1/RS2/RS3 depending on its load balancing algorithm. The realservers still have their gateway set to be the router.

Julian Anastasov ja (at) ssi (dot) bg 18 Apr 2002

The ARP request is not answered with the director's MAC when bridging is used.

When using the bridging code you have to stick to the following rules:

  1. it is Layer 2, i.e. the decisions to do something with the packet are based entirely on the link layer protocol info (unless you patch the code, of course)

  2. at link layer level we have broadcast/multicast/unicast addresses

  3. all received non-unicast frames are passed to the IP stack and to all other bridge ports

  4. all unicast frames are passed to the IP stack or to the appropriate bridge port according to the destination link layer address

  5. Linux IP does not accept packets destined to foreign lladdr (for ethernet, link layer address == MAC).

  6. Linux does not send ICMP replies for frames not destined to our lladdr (one of the reasons you don't see ICMP errors in response to UDP broadcasts, for example, when there is no listener)

  7. Linux TCP accepts frames destined only to our lladdr

  8. Linux does not forward packets not destined to our lladdr
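Rules 3 to 5 can be read as a small decision function. The Python sketch below is only a model of the list above, not the kernel code. The function names, port names and MAC addresses are invented for illustration, and flooding of unicast frames with an unknown destination is an added assumption that the rules do not state.

```python
def is_unicast(mac):
    # The least significant bit of the first octet is set for broadcast/multicast.
    return (int(mac.split(":")[0], 16) & 1) == 0


def dispose(frame_dst, in_port, own_macs, fdb, ports):
    """Return (to_ip_stack, out_ports) for a frame received on in_port.

    own_macs: link layer addresses of the bridge host itself
    fdb:      learned forwarding table, MAC -> port
    """
    others = [p for p in ports if p != in_port]
    if not is_unicast(frame_dst):
        # Rule 3: non-unicast frames go to the IP stack and to all other ports.
        return True, others
    if frame_dst in own_macs:
        # Rule 4: unicast frames for our own lladdr are passed to the IP stack.
        return True, []
    # Rules 4 and 5: a foreign lladdr is bridged and never handed to the IP stack.
    out = fdb.get(frame_dst)
    if out is None:
        return False, others  # assumption: unknown unicast is flooded
    return False, ([] if out == in_port else [out])


ports = ["eth0", "eth1"]                 # eth0: uplink router, eth1: realservers
own = {"00:00:00:00:00:0d"}              # the director's own MAC (made up)
fdb = {"00:00:00:00:00:63": "eth0",      # uplink router
       "00:00:00:00:00:01": "eth1"}      # a realserver

# A broadcast ARP request from the router reaches the stack and the realservers.
print(dispose("ff:ff:ff:ff:ff:ff", "eth0", own, fdb, ports))
# A realserver reply addressed to the router's MAC is bridged, not routed.
print(dispose("00:00:00:00:00:63", "eth1", own, fdb, ports))
```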

To put this in the LVS space (we assume the Director is using bridging and is between the uplink router and the realservers):

- according to (3) we can't stop the broadcast ARPs from reaching the realservers, nor the realservers from replying to the requestor (the router), i.e. we can't avoid the ARP problem for the DR and TUN methods.

- according to (4) the realservers can reach the router directly without disturbing the director's rp_filter. As a result, bridging helps the director pass the LVS-DR replies from the realservers to the uplink router

- the uplink router sees one LAN because the bridging code passes all frames preserving the original source link layer addresses

We can say that the default bridging behavior is not the desired one in all cases. There are some useful modes we could ask of the bridging code. For example, in one mode we could grab all IP packets (even packets destined to foreign lladdrs), feed them to the upper layers, and rely on proper routing rules for filtering, etc. The bonus is that you don't need to place your IPs, routes, etc. on the bridging interfaces, you don't need to implement firewalling specifically designed for the bridged ports, and so on.

Joe Cooper joe (at) swelltech (dot) com

CONFIG_NET_DIVERT is the IP packet diverter that allows one to configure selective redirects from a bridged interface, so that the traffic can then be REDIRECTED or whatever by the iptables rules. Benoit Locher wrote it and his homepage about the project is here: http://diverter.sourceforge.net/

It is a part of the official Linux trees (2.2.19+ and 2.4.10+) these days, so no patching is necessary, but you do need the divert-utils package to configure it if you're going to use it. It makes the Linux bridging code a lot cooler than your ordinary bridge.

Julian

There is layer 2 software under the CONFIG_BRIDGE option (the solution currently being discussed)

http://bridge.sourceforge.net

With bridging, the realservers can send packets to the uplink router through the director's layer 2 bridge. So, yes, the packets are handled by the director but do not reach routing. The trick is that if the packets are destined to the director's MAC (which is always true for proxy ARP) then in both solutions the IP packet reaches routing. So, the director's IP should not be used as the gateway. But the director can run Linux bridging and stay between the realserver(s) and the client(s)/uplink router. In this case the realservers don't know that, when talking to the uplink router's MAC, their packets go through the director's layer 2.

If the DIP is used as the GW in the realservers then even with bridging you have to use the forward_shared flag. If the uplink router's IP is used as the GW then we can run bridging on the director if we want to split the segment.

Here is the trick: the realservers resolve their GW IP with ARP and later send the packets to the resulting MAC. If the GW IP is one of the director's IPs then they receive the director's MAC and the traffic reaches routing.

So, if we want to put the director physically between the uplink router and the realservers and use the DR or TUN methods without the forward_shared flag, we can do it by using bridging and by using the uplink router's IP as the GW in the realservers. The only thing that bridging gives us is that we can use the uplink router's IP as the GW in the realservers. Bridging connects the two network segments.

The critical difference then is the gw configured on the realservers. If it is an IP/MAC on the director, then the director's routing tables will see the packet. If the gw is the router on the outside of the director, then the packet will be bridged without seeing the director's routing tables. Presumably this will solve the martian problem. (No-one has tried this out yet).
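The gateway choice can be pictured with a short Python sketch. It is a model of the argument above, not a tested configuration: the MAC addresses and the function name are made up, and the IPs are the deliberately invalid example addresses from John's diagram, used here only as dictionary keys.

```python
# ARP table as seen from a realserver: gateway IP -> MAC it resolves to.
# With bridging, the router's own MAC is visible through the director.
ARP_TABLE = {
    "293.2.2.4": "00:00:00:00:00:0d",   # DR (director), hypothetical MAC
    "293.2.2.63": "00:00:00:00:00:63",  # uplink router, hypothetical MAC
}
DIRECTOR_MACS = {"00:00:00:00:00:0d"}


def path_through_director(gateway_ip):
    """Return 'routed' or 'bridged' for a realserver reply sent via gateway_ip."""
    dst_mac = ARP_TABLE[gateway_ip]
    # Frames addressed to the director's own MAC are handed to its IP layer and
    # routing tables; frames for any other MAC are only bridged at layer 2.
    return "routed" if dst_mac in DIRECTOR_MACS else "bridged"


print(path_through_director("293.2.2.4"))   # gw is the director
print(path_through_director("293.2.2.63"))  # gw is the uplink router
```

In the first case the director's routing code sees the reply. In the second case the reply only crosses the director at layer 2, which is the untested setup suggested above.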