47. LVS: Geographically distributed load balancing

What people want is to be able to determine the closest realserver (by some internet metric) to any particular client. This section collects ideas on geographically distributed serving; not all of it is LVS.

LVS-Tun allows your realservers to be on different networks from your director, i.e. anywhere, including on different continents.

47.1. Determining Location from the IP

Blocks of IPs are assigned to ISPs and shouldn't move much. In principle the locations of the ISPs are known. It should be possible to produce a mapping of IP to city/country.
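As a rough illustration, the freely downloadable MaxMind GeoLite databases mentioned below can be queried from the shell with geoiplookup (from the GeoIP tools). A minimal sketch, assuming the package and a city database file are installed; the address and database path are just examples:

# country-level lookup using the default GeoIP country database
geoiplookup 192.0.2.1

# city-level lookup against a GeoLite City database file (path may differ)
geoiplookup -f /usr/share/GeoIP/GeoLiteCity.dat 192.0.2.1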

Malcolm lists (at) loadbalancer (dot) org 07 Apr 2007

ultradns.com (and others too, I'm sure) have a managed service that uses a combination of BGP and Global DNS to map IP to location. They can do it pretty well as they have high level DNS servers, and it's a very tricky thing to do well. Some people would even argue it's impossible: Why DNS based Global Server Load Balancing (GSLB) Doesn't Work (http://www.tenereillo.com/GSLBPageOfShame.htm).

David Black dave (at) jamsoft (dot) com 09 Apr 2007

For work, I implemented an expanded variation of this (http://www.caraytech.com/geodns/) and, using this database (http://www.maxmind.com/app/geolitecity), get pretty accurate IP-to-US-state mappings, in addition to country.

Not perfect - nothing like this is - but definitely a functional optimization. My applications so far are to direct clients to the "nearest" web server cluster and VPN gateway for the purpose of minimizing latency. I stuff the names to be geo-balanced into a subdomain served only by a group of nameservers running the above code and database.

Stefan Schmidt zaphodb--lvs-users (at) zaphods (dot) net 9 May 2007

I remember seeing a RIPE presentation (http://www.ripe.net/ripe/meetings/ripe-41/presentations/routing-opperman/index.html) on that topic, but it seems the project's website (http://www.bgpdns.org/) has already vanished from the net. If I remember correctly, part of it was a patch for djb's tinydns server. The presentation mentions supersparrow btw. As far as geographic loadbalancing via DNS (http://www.micro-gravity.com/wiki/index.php?page=GeoDns http://wiki.blitzed.org/DNS_balancing) goes: Wikipedia is using the geo backend for PowerDNS (http://doc.powerdns.com/) and seems happy with it (http://www.nedworks.org/~mark/presentations/hd2006/).

47.2. Supersparrow

Horms has written the Super Sparrow Project. Super Sparrow works differently from, and is incompatible with, LVS. Super Sparrow uses zebra to fetch BGP4 routing information, from which you can determine the number of AS hops between client and server. Documentation on BGP is hard to find (the early zebra docs had none). Horms suggests "BGP Routing Part 1" (http://www.netaxs.com/~freedman/bgp/bgp.html) by Avi Freedman of Akamai. It's somewhat Cisco centric and there is no part 2 yet, but it is applicable to zebra. This site disappeared in Jul 2002 (look for cached versions), but Avi Freedman has his own webpage with some BGP links and a note that he's writing a book on BGP.

Note
2004: The documentation for zebra and dynamic routing is much better - see Dynamic Routing to handle loss of routing in directors.

horms 8 May 2007

I'm very interested in doing more work in this area, but I'm very distracted by other things.

Horms 30 Aug 2004

I have some code which creates a small routing daemon, that gets all its data from a list of routes you provide at run time. It's at ssrs (http://cvs.sourceforge.net/viewcvs.py/supersparrow/supersparrow/ssrs/). I also have some code to help generate the list of routes from BGP dumps from (Cisco) routers at inet_map (http://cvs.sourceforge.net/viewcvs.py/vanessa/scratch/inet_map/). And I have a patch to Bind 9 to add supersparrow support (http://www.supersparrow.org/download/wip/bind9/). All this needs a bit of polish as, apart from hacking on it for my own personal use, I haven't done any work on it for a while. It does work, and is actually used for www.linuxvirtualserver.org.

Fortunately the format of the /etc/ssrs.routes file is simple. Each line has a prefix followed by an AS number e.g.

212.124.87.0/24  1234
213.216.32.0/19  1235
195.245.215.0/24 7
195.47.255.0/24  7
217.76.64.0/20   1234
193.236.127.0/24 1234

All the prefixes are stored in a red-black tree and the most specific prefix for the request takes precedence. If you have

212.124.87.0/24  1234
212.124.87.1/32  7

Then if you look up the AS for 212.124.87.1 you will get 7. If you look up the AS for 212.124.87.2 you will get 1234.

You can telnet to the ssrsd daemon, it will ask you for a password but doesn't actually check it, so just put in whatever you like - I should probably fix that up too :)

Josh Marshall Aug 09, 2004

I've been looking into implementing something like supersparrow to get high availability / fastest connection for our web servers. We have some servers in Australia, some in the USA and some in Holland. I'm interested in the DNS method of getting the closest server for the connecting client, so that we don't have to do http redirects or have multiple webnames configured. That's a bit further along.

I'm wondering if I need to have a bgp daemon with a public AS number to be able to get the information needed to determine the best path for the client. I have done some tests and read loads of documentation but am not sure how to get the information without having a public AS number. The supersparrow documentation describes what appears to be an internal solution so doesn't show whether this is possible or not.

Horms 14 Oct 2004

The way that supersparrow was designed is that you have access to BGP information for each site that you have servers located at. You do not need a public AS number to get this information, however you do need _read only_ access to your provider's BGP information. Unfortunately this can be difficult to get your hands on.

I guess I'm also wondering whether I should be looking at supersparrow - I know that the software was written a few years ago, but given the idea behind it and the amount of processing it needs to do, I can imagine it doesn't need to be actively maintained.

Yes, it does have that appearance. But I am actually in the process of sprucing it up a lot. Most of what I have so far is in the cvs repository http://sourceforge.net/cvs/?group_id=10726, http://www.vergenet.net/linux/vanessa/cvs.shtml. About the only thing of note still missing is the bind 9 support patch http://www.supersparrow.org/download/wip/bind9/ . But please feel free to play with what is there.

If anyone has any advice as to what I can do, to get the best path information with (or without) bgp without having a public AS number I'd really appreciate it.

I have been toying with a few ideas to cope with not being able to get access to BGP at colocation sites. One of the ideas that I had was to provide a static list of networks and what site they should map to. I implemented this as ssrsd, which is in the CVS tree now. ssrsd understands that, for instance, 10.0.0.0/25 is part of 10.0.0.0/18 and will choose the site listed for 10.0.0.0/25 over the one for 10.0.0.0/18. Of course you still have to create the list somehow, and at this stage it isn't at all dynamic. But it can work quite well.

Many people have thought about geographically distributed LVSs. For historical completeness, here are some of their musings.

47.3. sharing/separate routers

Michael Sparks zathras (at) epsilon3 (dot) mcc (dot) ac (dot) uk 2000-03-08

I'm curious about the physical architecture of a cluster of servers where "the realservers have their own route to the client" (like in LVS-DR and LVS-Tun). How have people achieved this in real life? Does each realserver actually have its own dedicated router and Internet connection? Do you set up groups of realservers where each group shares one line?

Nissim

You can do it that way, or the realservers can share resources. We've got 3 LVS based clusters, based around LVS-Tun. The reason for this is that one of the clusters is at a different location (about 200 miles from where I'm sitting), and this allows us to configure all the realservers in the same way thus:

tunl0:1 - IP of LVS balanced cluster1
tunl0:2 - IP of LVS balanced cluster2
tunl0:3 - IP of LVS balanced cluster3 (remote)

The only machines that end up being configured differently are the directors.
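A minimal sketch of how such tunl0 aliases might be brought up on a Linux realserver (the VIP addresses here are placeholders, and you still have to handle the usual ARP problem for VIPs on realservers):

# load the IPIP module and bring up the base tunnel device
modprobe ipip
ifconfig tunl0 0.0.0.0 up

# one alias per LVS cluster VIP (example addresses)
ifconfig tunl0:1 192.168.10.1 netmask 255.255.255.255 up
ifconfig tunl0:2 192.168.10.2 netmask 255.255.255.255 up
ifconfig tunl0:3 192.168.10.3 netmask 255.255.255.255 up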

So whilst machines are nominally in one of the three clusters, if (say) the remote cluster is overloaded, it can take advantage of the extra machines in the other two clusters, which then reply directly back to the client - and vice versa.

In that situation a client in (say) Edinburgh, could request an object via the director at Manchester, and if the machines are overloaded there, have the request forwarded to London, which then requests the object via a network completely separate from the director's and returns the object to the client.

The UK National cache is likely to be introducing another node at another location in the country at some point in the near future, which will be very useful. (The key advantage is that at each location we gain X more Mbit/s of bandwidth to utilise, making service better for users.)

47.4. Other uses of BGP4 with LVS

Note
The thread here is a bit of a logical mess. The original postings are not in either archive, so I can't straighten it out anymore.

Joe:

how do I get BGP info from a BGP router to the director?

Lars Marowsky-Bree lmb (at) teuto (dot) net 23 Jul 1999

If you telnet to the BGP4 port (port 179) of the router running BGP4

#telnet router bgp

and do a

"sh ip route www.yahoo.com"

for example, you will get something like this

Routing entry for 204.71.192.0/20, supernet
  Known via "bgp 8925", distance 20, metric 0
  Tag 1270, type external
  Last update from 139.4.18.2 1w5d ago
  Routing Descriptor Blocks:
  * 139.4.18.2, from 139.4.18.2, 1w5d ago
      Route metric is 0, traffic share count is 1
      AS Hops 4

This address is 4 AS hops away from me. You can also find out this information using SNMP if I recall correctly.

The coolest idea would be to actually run a routing daemon on the cluster manager (like gated or Zebra, see www.zebra.org); then we wouldn't even need to telnet to the router but could run fully self contained using an IBGP feed. Zebra is quite modular and could possibly be made to integrate more tightly with the dispatcher...
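For example (a sketch, not from the original thread), with a bgpd from the zebra/quagga suite peering with your router, you can ask it over its vty for the best route to a client's address and read the AS path from the output:

telnet localhost 2605        # bgpd's vty, port 2605 by default (prompts for the vty password)
show ip bgp 204.71.192.1     # shows the best route and its AS path; count the ASes for the hop count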

Joe

It must have been your other mail where you said that this was simple but not everyone knew about it. I just found out why. My Cisco TCP/IP routing book has 2 pages on BGP. They told me to find a Cisco engineer to "discuss my needs" with if I wanted to know more about BGP.

There is actually some sort of nice introduction hidden on www.cisco.com, search for BGP4. "Internet Routing Architecture" from Cisco Press covers everything you might want to know about BGP4.

_All_ routers participating in global Internet routing hold a full view of the routing table, reflecting their view of the network. They know how many ASes (autonomous systems) are in between them and any reachable destination on the network. If they have multiple choices (i.e. multiple connections, so-called multi-homed providers), they select the one with the shortest AS path and install it into their routing table.

Now, one sets up a dispatcher which has BGP4 peerings with all participating clusters. Since the dispatcher only installs the best routes to all targets in its routing table, it is a simple lookup to see which cluster is closest to the client.
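On a Linux dispatcher, assuming the routing daemon installs its best routes into the kernel routing table with each cluster as a next hop, that lookup could be as simple as (a sketch, with a documentation address standing in for the client):

# which route (and therefore which cluster's next hop) wins for this client?
ip route get 203.0.113.7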

If a cluster fails, the peering with the dispatcher is shutdown and the backup routes (the views learned from other clusters) take over.

This is actually very nice and well tested technology, since it is what makes the internet work.

It requires cooperation on the part of the ISP hosting the cluster: they must provide a read-only "IBGP" feed to the cluster manager inside their AS.

BGP4 AS hops may not be directly related to latency. However, it tells you how many backbones are in between you and how many are not, which does have a tight relationship to the latency. And you can use BGP4 route-maps etc. to influence the load balancing in an easy way - if you have one cluster from which a certain part of the Internet is reached via a slow satellite link, you can automatically lower the preference for all routes coming in via that satellite link and not use them.

Ted Pavlic tpavlic (at) netwalk (dot) com 9 Sep 1999:

For now AS hops probably are useful - we have two mirrors on different continents.

Lars

You do NOT and cannot run OSPF here. OSPF is an "IGP" (interior gateway protocol) which can't be usefully applied here.

I suppose I figured large networks might all share OSPF information, but I guess that they wouldn't share too much OSPF information between different geographical locations. (And I'm guessing that the latency between the load balancer and the user will PROBABLY be almost exactly the same as the latency between the end-servers and the user... so...) I never claimed that I knew much of anything about BGP or OSPF, but thought that if BGP wasn't very helpful... OSPF might be. :) (It was a shot in the dark - a request for comments, if anything)

Of course, you not only want to figure in BGP4, but also load and availability. We should investigate what other geographical load balancers do. A lot of them set up large proxying networks.

AFAICT, a lot of geographical load balancing systems seem to use their own means of finding which server is best per end-user. I think, for decent load balancing on that sort of scale, most balancers have to invent their own wheel.

This by no means is an easy task -- doing decent geographical load balancing. Companies put in a good deal of R and D to do just this and customers pay a good deal of money for the results.

If worse comes to worst, have a perl script look up the name of the machine that made the request... grab the first-level domain... figure out which continent it should go on.

DNS has no guaranteed reply times at all.

It wasn't a serious suggestion for production - just a way to divvy out which mirror got which request.

47.5. Geographically remote nodes connected by Bridging

Andy Wettstein awettstein (at) cait (dot) org 15 Jul 2003

I am trying to extend an LVS-DR to a different physical location with the help of an EtherIP bridge. I am using 2 OpenBSD boxes to do this; if you want to see the details, look almost all the way at the bottom of http://www.openbsd.org/cgi-bin/man.cgi?query=brconfig&sektion=8. I am not using IPSec so that is not causing me any problems.

Anyway, I have all normal LAN traffic working correctly, so I'm sure the EtherIP bridge is working correctly, but if I have a server that is in an LVS cluster, the server never sees the traffic that is sent to it as part of the cluster.

Ratz

Do you rewrite MAC addresses on the bridge? What does a tcpdump look like on each of the director, the bridge and the node on the other side? How are the neighbour tables set up?

I don't do any MAC address rewriting on the bridge. This is my test service:

TCP  192.168.0.45:8000 wlc
  -> 192.168.0.48:8000    Route   1      0          0

The openbsd box with the director on its physical lan is set up like this (all real ips changed):

vlan0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        address: 00:02:b3:d0:36:0d
        vlan: 57 parent interface: em0
        inet6 fe80::202:b3ff:fed0:360d%vlan0 prefixlen 64 scopeid 0x1a
        inet 192.168.0.1 netmask 0xffffff80 broadcast 192.168.0.127

gif1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
        physical address inet 172.20.1.2 --> 172.20.1.3
        inet6 fe80::206:5bff:fefd:ef23%gif1 ->  prefixlen 64 scopeid 0x30

bridge0: flags=41<UP,RUNNING>
        Configuration:
                priority 32768 hellotime 2 fwddelay 15 maxage 20
        Interfaces:
                gif1 flags=3<LEARNING,DISCOVER>
                        port 48 ifpriority 128 ifcost 55
                vlan0 flags=3<LEARNING,DISCOVER>
                        port 26 ifpriority 128 ifcost 55

The openbsd box with the member of the cluster (traffic never gets to it):

vlan1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        address: 00:02:b3:d0:32:78
        vlan: 57 parent interface: em0
        inet6 fe80::202:b3ff:fed0:3278%vlan1 prefixlen 64 scopeid 0x1c

gif1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
        physical address inet 172.20.1.3 --> 172.20.1.2
        inet6 fe80::206:5bff:fe3e:6d58%gif1 ->  prefixlen 64 scopeid 0x31

bridge0: flags=41<UP,RUNNING>
        Configuration:
                priority 32768 hellotime 2 fwddelay 15 maxage 20
        Interfaces:
                gif1 flags=3<LEARNING,DISCOVER>
                        port 49 ifpriority 128 ifcost 55
                vlan1 flags=3<LEARNING,DISCOVER>
                        port 28 ifpriority 128 ifcost 55

The tcpdumps show only packets through bridge0 on the side of the bridge with the director on it. I can't see any traffic on gif1.

192.168.0.0 is subnetted so 192.168.0.143 goes through the openbsd box, which is also our router. That just gave me an idea. Testing from an IP that doesn't need to be routed...Works!!

So going through

192.168.0.143/26 -> 192.168.0.129/26 -> 192.168.0.48/25
                              ^^^
                              router interface on openbsd box (vlan2)

doesn't work, but going

   192.168.0.61/25 -> 192.168.0.48/25

without a route does work.

A little later...

I looked into this a little bit further. The problems I was having were mostly due to the OpenBSD firewall not keeping state on those connections that needed to be routed by that router/EtherIP bridge machine. After I got that fixed, traffic would show up on the cluster node and the node would try to reply, but I would never see the return traffic. After a little further investigation, the tcpdumps showed me that the traffic needed to be fragmented, because on that bridge the mtu is 1280. So I set the mtu to 1280 on the cluster node and everything works.
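For example, clamping the MTU on a Linux cluster node can be done with something like this (the interface name is an example):

ifconfig eth0 mtu 1280
# or with iproute2
ip link set dev eth0 mtu 1280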

So you can add that as another way to geographically extend the LVS. It is a little inefficient, since all broadcast LAN traffic gets transmitted over the bridge, but that isn't a problem for me.

47.6. Load Balancing by DNS (round robin DNS)

Round robin DNS was one of the first attempts at loadbalancing servers. Horms tried it in the late '90s but was defeated by the caching of DNS information in local servers (you can set TTL=0 but, quite sensibly, not all servers honour TTL=0).

Malcolm Turnbull malcolm (at) loadbalancer (dot) org 24 Jun 2004

See the fud section on my site, under the GSLB bit (http://www.loadbalancer.org/fud.html, link dead Feb 2005) which goes to Why DNS Based Global Server Load Balancing (GSLB) Doesn't Work (http://www.tenereillo.com/GSLBPageOfShame.htm).

The summary of the page is that most browsers (netscape, IE) have their own DNS cache (15-30mins expiration). This, together with caching in local DNS servers, will defeat any attempt to loadbalance or failover by DNS.

Torsten Schlabach tschlabach (at) gmx (dot) net 11 May 2007

Is it possible to have an LVS (or components of an LVS) at each of two data centers, for redundancy?

If we want to build a reliable service on the Internet (say a website, for example), we can have one A record for www.oursite.net point to one IP address only. So we would want that IP address to transparently fail over between two data centers. Of course we can put two (or more) servers into different data centers, but one IP will be routed to one physical destination, won't it? So wouldn't it make sense for someone who owns a network infrastructure to offer IPVS as a service and have the IP address point to whatever realservers I configure?

Volker Jaenisch volker (dot) jaenisch (at) inqbus (dot) de 11 May 2007

One way to achieve this is to use DNS. If you give your domain (e.g. yourdomain.com) more than one IP, e.g.

Datacenter1 : IP = 123.123.123.1
Datacenter2 : IP = 146.234.12.2

then DNS performs round robin loadbalancing on the name -> IP resolution. The first time a webbrowser accesses yourdomain.com it will get the first IP and your customer lands in DC1. The next webbrowser to access yourdomain.com will land in DC2, the next in DC1, and so on. The only problem with this approach is to ensure that the DNS TTL settings are long enough that a typical customer will not be switched from DC to DC within its current session.
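In BIND zone file terms (a sketch using the example addresses above and an illustrative TTL), that is just two A records for the same name:

; round robin between the two data centers
www.yourdomain.com.   300   IN   A   123.123.123.1   ; Datacenter1
www.yourdomain.com.   300   IN   A   146.234.12.2    ; Datacenter2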

A second, and from my point of view the most important, issue is the need to synchronise your data (e.g. your databases and the filesystem) over the WAN between the DCs. The databases you can share across the two locations using a Sequoia DB cluster for e.g. MySQL or PostgreSQL databases. The filesystem mirroring can be done using DRBD 0.8 and a cluster filesystem.

We have done some testing in that direction. Because of the synchronisation, this approach is not ideal for high traffic sites with many changes in the shared filesystem or the shared DB cluster. If you have a low/medium traffic site with an extreme need for high availability, this scheme may help you.

But please be warned - this approach is definitely not the cheap one. Such a WAN distributed cluster is in the order of $100,000 and more.

Torsten Schlabach wrote:

We have done some testing on how long webbrowsers will cache the DNS information. It seems that the DNS information is held in the webbrowser longer than the given TTL. So this gives hope of reducing the TTL to, say, 10 seconds. You will then need a third instance that monitors your two DCs. If one of the DCs goes down, the monitoring instance has to modify the DNS entry.
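A sketch of how such a monitoring instance might switch the record, assuming the zone accepts dynamic updates with a TSIG key (the names, addresses and key file are placeholders; check_http is the nagios plugin used in the script further below):

#!/bin/sh
# if DC1's web server stops answering, repoint www at DC2 via a dynamic DNS update
/sbin/check_http -H 123.123.123.1 -u / -p 80 -t 20 >/dev/null || \
nsupdate -k /etc/bind/Kmonitor.key <<EOF
server ns1.yourdomain.com
zone yourdomain.com
update delete www.yourdomain.com A
update add www.yourdomain.com 10 A 146.234.12.2
send
EOF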

Joe: I haven't looked for 10yrs or so, but back then DNS servers would not honour a TTL less than some reasonably long time (a day?) so their cache would be useful.

Volker

Our two big ISPs in Germany respect the TTL set by the domain provider.

mira2:~# dig @195.50.140.250 inqbus.de


;; ANSWER SECTION:
inqbus.de.              300     IN      A       193.239.28.142

mira2:~# dig @195.50.140.250 inqbus.de

; <<>> DiG 9.2.4 <<>> @195.50.140.250 inqbus.de

;; ANSWER SECTION:
inqbus.de.              297     IN      A       193.239.28.142

As you can see, the TTL decreases on consecutive queries, as expected. The DNS server queried is a DNS server of the second largest ISP in Germany.

Joe: I take it that the situation is different nowadays. What's the point of having DNS servers if every query requires a hit to the root servers?

47.7. BIND, BGP with load balancing (more ideas from Horms)

In a thread where someone suggested load balancing by round robin DNS...

jkreger (at) lwolenczak (dot) net Jun 24, 2004 suggested that you get your routing information from BGP, which is fed a fake table.

Horms

I have some code to do this. Basically it creates a small routing daemon that gets all its data from a list of routes you provide at run time. It's at ssrs (http://cvs.sourceforge.net/viewcvs.py/supersparrow/supersparrow/ssrs/). I also have some code to help generate the list of routes from BGP dumps from (Cisco) routers (http://cvs.sourceforge.net/viewcvs.py/vanessa/scratch/inet_map/) and a patch to Bind 9 to add supersparrow support (http://www.supersparrow.org/download/wip/bind9/).

All this needs a bit of polish as, apart from hacking on it for my own personal use, I haven't done any work on it for a while. It does work - it is actually used for www.linuxvirtualserver.org.

47.8. Commercial Geographically Distributed Servers

How does a client in Ireland get sent to a server in England, while someone on the east coast of the USA gets a server in New York? The machine name is the same in both cases.

Malcolm lists (at) loadbalancer (dot) org 21 Nov 2006

You either use:

  • Server side re-direct
  • DNS based geographic load balancing
  • BGP
  • Combination of BGP and Geographic DNS

UltraDNS.com do a managed service for this at about $400 per month per DNS entry, which is one of the best ways of doing it.

Josh Marshall josh (at) worldhosting (dot) org 22 Nov 2006

We use the supersparrow software (written by Horms) on our DNS servers and it works really well for sites between our Australian and Holland datacenters. I don't have the co-operation of our uplinks so I fake the BGP and with a few scripts it also handles failover to one site. My employer's site www.worldhosting.org is handled this way.

First you have to run a patched version of bind9 (I have debian packages for anyone who needs them) - get the source from http://www.supersparrow.org/. Or, for my supersparrow and patched bind9 packages, add the following to your /etc/apt/sources.list (woody packages are also available; replace sarge with woody):

deb http://debian.worldhosting.org/supersparrow sarge main

Create in your bind config something like:

zone "www.worldhosting.org" {
       type master;
       database "ss --host 127.0.0.1 --route_server ssrs --password XXXX \
          --debug --peer 64600=210.18.215.100,64601=193.173.27.8 \
          --self 193.173.27.8 --port 7777 --result_count 1\
          --soa_host ns.worldhosting.org. --soa_email hostmaster.worldhosting.org.\
          --ns ns.worldhosting.org. --ns ns.au.worldhosting.org. --ttl 7 --ns_ttl 60"; \
};

This snippet sets www to use 210.18.215.100 if the peer is 64600 and 193.173.27.8 if the peer is 64601. The --ttl option sets the TTL for the A record (7 seconds here) and --ns_ttl the TTL for the NS records (60 seconds); --self is the default response for this nameserver (on the secondary nameserver make this the other address). Set the password to the same as in /etc/supersparrow.conf.

Create three files to describe the routes in normal and failed modes. In our setup:

$ cat ssrs.routes.AUonly
0.0.0.0/0       64600

$ cat ssrs.routes.NLonly
0.0.0.0/0       64601

$ head ssrs.routes.normal
128.184.0.0/16  64600
128.250.0.0/16  64600
129.78.0.0/16   64600
129.94.0.0/16   64600
129.96.0.0/16   64600
129.127.0.0/16  64600
129.180.0.0/16  64600
130.56.0.0/16   64600
130.95.0.0/16   64600
130.102.0.0/16  64600

The ssrs.routes.normal file contains all the subnets you wish to force to use the respective peer.

Create a script that does an http test periodically (we do it every 5 minutes as the web servers don't go down frequently). If both sites work, symlink the normal routes file to /etc/ssrs.routes. If only one works, symlink the file for the site that works (i.e. AUonly or NLonly) to /etc/ssrs.routes. Then check to see if the config has changed and, if so, restart supersparrow. I use the check_http script from the nagios package to do the test. See below for my script:

----------------

#!/bin/sh

PATH=/sbin:$PATH
# Supersparrow results
SSNORMAL=0
SSAUONLY=1
SSNLONLY=2

AUIP=210.18.215.100
NLIP=193.173.27.8

AUW=0
NLW=0

#ping -c 2 $AUIP >/dev/null && AUP=1
#ping -c 2 $NLIP >/dev/null && NLP=1

/sbin/check_http -H $NLIP -u /index.html -p 80 -t 20 >/dev/null && NLW=1
/sbin/check_http -H $AUIP -u /index.html -p 80 -t 20 >/dev/null && AUW=1

# Do the tests again in case there was a hiccup

/sbin/check_http -H $NLIP -u /index.html -p 80 -t 20 >/dev/null && NLW=1
/sbin/check_http -H $AUIP -u /index.html -p 80 -t 20 >/dev/null && AUW=1


if [ $NLW -eq 1 ]
then
       if [ $AUW -eq 1 ]
       then
               OPMODE="Normal Operation"
               SPARROW=$SSNORMAL
       else
               OPMODE="NL running but AU down"
               SPARROW=$SSNLONLY
       fi
else
       if [ $AUW -eq 1 ]
       then
               OPMODE="AU running but NL down"
               SPARROW=$SSAUONLY
       else
               OPMODE="AU and NL down"
               SPARROW=$SSNORMAL
       fi
fi

if [ $SPARROW -eq $SSNORMAL ]
then
       ln -sf /var/named/supersparrow/ssrs.routes.normal /etc/ssrs.routes
fi

if [ $SPARROW -eq $SSAUONLY ]
then
       ln -sf /var/named/supersparrow/ssrs.routes.AUonly /etc/ssrs.routes
fi
if [ $SPARROW -eq $SSNLONLY ]
then
       ln -sf /var/named/supersparrow/ssrs.routes.NLonly /etc/ssrs.routes
fi

md5sum -c /etc/ssrs.routes.md5sum &>/dev/null && exit
/etc/init.d/supersparrow reload
md5sum /etc/ssrs.routes > /etc/ssrs.routes.md5sum
echo Supersparrow: $OPMODE

-------------

With a DNS server at each location, if there is an international routing problem that prevents them communicating with each other, then each server will set all responses to point www at its local hosting location. Then any sites on the net that can reach that DNS server will use the www that is there (and therefore have a high chance of it working).

Ratz 22 Nov 2006

If we are talking about web services, a nice but not very well known (and sometimes not feasible) approach is proxy.pac URL-hash load balancing, best explained at http://naragw.sharp.co.jp/sps/ and http://naragw.sharp.co.jp/sps/sps-e.html.
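A proxy.pac is a small piece of JavaScript that the browser evaluates for every URL; a minimal sketch of the URL-hash idea (the proxy names are placeholders, and real implementations such as Sharp's SPS are more elaborate):

function FindProxyForURL(url, host) {
    // hypothetical proxy pool; hashing the URL means a given object is
    // always fetched through the same proxy, so each proxy's cache stays distinct
    var proxies = ["PROXY proxy1.example.com:3128",
                   "PROXY proxy2.example.com:3128"];
    var hash = 0;
    for (var i = 0; i < url.length; i++) {
        hash = (hash + url.charCodeAt(i)) % 1024;
    }
    return proxies[hash % proxies.length];
}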

47.9. from the mailing list

David Carlson dcarlson (at) culminex (dot) com 11 Dec 2002

We are putting a bid in on a fairly major web site. The client has asked for 24/7/365 reliability. We were initially going to bid a Linux virtual server direct routing solution with main and backup Linux directors in a multihomed data centre. We were proposing the following hardware:

  • Linux Director and Backup director to route the requests to Real servers on the LAN
  • Real servers 1 and 2 to do the work and route data back to the user
  • DB server 1 to provide the data to the realservers.

However, our partner has come up with an interesting wrinkle. They have a second data centre where they can host a mirror of our site. It uses a different company for main internet service, so it is not only geographically removed, but has different power and internet service too.

We are now going back and revisiting our hardware configuration. It would seem that with two physical locations, we should use IP tunneling (http://www.linuxvirtualserver.org/VS-IPTunneling.html). In this case, our hardware configuration would be

  • At Main location: Linux director, Real page server 1, DB Server 1
  • At alternate location: backup linux director, Real page server 2, DB server 2

We've never done this before. But if it works, it would sure increase our claimed reliability as we can talk about multihomed, geographically separate, entirely redundant systems.

My questions are: what do we do with the Linux Director at the main site to have a failover solution? If the internet service to the main site fails, how does the alternate site know to take over receiving requests? Given that it is elsewhere on the WAN, how does the backup site update local routers with the virtual IP? Do we need a backup Linux director at the alternate site? What if the main site Internet is OK but the main Linux director fails? Will a backup director at the alternate site take over and still send requests to realserver 1 at the main site?

Horms:

It probably needs a bit of TLC, but it is pretty simple so should work without too much bother. I'd be quite happy to work with someone to make this so. I'm using it myself on a very small (experimental) site without too much bother. I also have a patch to make it work with bind9 that I'm happy to send anyone who is interested.

Peter Mueller pmueller (at) sidestep (dot) com: The bind9 patch uses recursive-DNS to geo-locate users and send them to specific locations? (Is this what 3DNS does?)

The bind9 patch works by returning DNS results based on the source IP address of the DNS request - i.e. the IP address of the client's DNS server. So yes, it tries to return results based on someone's network/geographic location. I can't really comment on 3DNS as I have not used it, but I believe that it does something similar.

Matthew S. Crocker matthew (at) crocker (dot) com

This is what Akamai does. They use BGP table information to build a network map and then answer DNS queries based on the closest server. For example, I have 3 Akamai servers in my network. When I use DNS to look up a.1234.akamai.net, I get the IP address of my servers. If I go onto some equipment on a different network, the same name gives me different IPs. It is always the IP closest to me network-wise.

Go to www.msnbc.com, view source and search for akamai.net. You'll see the reference a799.g.akamai.net. Traceroute to that name; from home it is 63.240.15.152, 12 hops away. From work, it is one of my IPs and 2 hops away :). Pretty cool actually.

Use DNS to load balance between clusters based on BGP network data, then have each cluster set up as an LVS HA cluster to load balance locally.