38. LVS: Server State Sync Demon, syncd (saving the director's connection state on failover)

38.1. Intro

For seamless director failover, all connection state information on the failed director should be transferred to (or already be available on) the new director. This is a similar problem to backing up a hot database. The problem had been discussed many times on the mailing list without any code being produced. Grabbing the bull by the horns, Ratz and Julian convened the Bulgarian Summit meeting in March 2001, where a design was set for a server state sync demon (look for links to photos of them working on the design).

38.2. Release Notice

In ipvs-0.9.2 Wensong released a sync demon.

Wensong Zhang wensong (at) gnuchina (dot) org 20 Jun 2001

The ipvs-0.9.2 tar ball is available on the LVS website. The major change is the new connection synchronization feature.

Added the feature of connection synchronization from the primary load balancer to the backup load balancers through multicast.

The ipvs syncmaster daemon is started inside the kernel on the primary load balancer, and it multicasts the queue of connection states that need synchronization. The ipvs syncbackup daemon is likewise started inside the kernel on the backup load balancers; it accepts multicast messages and creates the corresponding connections.

Here are simple instructions for using connection synchronization.

On the primary load balancer, run

primary_director:# ipvsadm --start-daemon=master --mcast-interface=eth0

On the backup load balancers, run

backup_director:# ipvsadm --start-daemon=backup --mcast-interface=eth0

To stop the daemon, run

director:# ipvsadm --stop-daemon

Note that the connection synchronization feature is experimental at present, and there is some performance penalty when it is enabled, because a highly loaded load balancer may need to multicast a lot of connection information. If the daemon is not started, performance is not affected.

Note

There aren't a lot of people using the server state sync demon, so we don't have much experience with it yet.

Alexandre Cassen alexandre (dot) cassen (at) wanadoo (dot) fr 9 Jul 2001

Using ipvsadm you start the sync daemon on the master director. It then sends adverts to the backup servers using the multicast address 224.0.0.81. You need to start the ipvsadm sync daemon on the backup servers too...

The master multicasts messages to the backup load balancers in the following format.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Count Conns  |   Reserved    |            Size               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                    IPVS Sync Connection (1)                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                            .                                  |
      |                            .                                  |
      |                            .                                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                    IPVS Sync Connection (n)                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

I have planned to add an ICV like in IPSEC-AH (with anti-replay and a strong data exchange format) but I'm still very busy.

There is now a sync demon write up.

Note

From Lars Marowski-Bree lmb (at) suse (dot) de:

If you're using the -sh and -dh schedulers then there is no state information to transfer ;-)

If you're just setting up and have no connections and are checking your setup, then the sync demon has no data to transfer and is silent (i.e. it appears not to be working).

Sean Knox sean (dot) knox (at) sbcglobal (dot) net 2003-02-21

I've just installed ipvsadm and LVS on a new Debian 3.0 server. Load balancing works fine, however, connection synchronization doesn't; I confirmed this via tcpdump (no sync information being multicast).

The problem is that the sync daemon won't transmit unless it actually has data (connection states) to send out. If your IPVS state table is empty (i.e. no connections), then you won't see any sync data being sent out. I guessed this was the case after seeing this entry in the kern.log:

kernel: IPVS: Each connection entry needs 120 bytes at least
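
To confirm what the master is (or isn't) sending, you can watch the sync multicast group directly; the group (224.0.0.81) and port (8848) are the ones that show up in the tcpdump traces later in this section. You'll only see traffic while there are connections in the IPVS table:

master_director:# tcpdump -ni eth0 host 224.0.0.81 and udp port 8848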

combined effort of Sean Knox and Bernt Jernberg, 25 Feb 2003

On the backup director, the connection table is listed by running ipvsadm -Lcn. The backup director has no connections of its own, so ipvsadm -L will be empty (ipvsadm -L is only relevant on the master).

Carles Xavier Munyoz Baldo Oct 02, 2003

I'm setting up a high availability LVS director with RedHat 8.0 (kernel-2.4.20-20.8), ipvs-1.0.9 and keepalived. I'm running LVS with connection synchronisation enabled. When the master director faults, the backup director takes over its role and all established connections work without interruption. GREAT !!!!! :-)

The problem is, what to do when the failed master is recovered? How may I copy the connection table of the backup director to the master director?

Horms horms (at) verge (dot) net (dot) au 03 Oct 2003

There are various ways to do this, here is the one I would suggest.

  1. Read this post and patch by Alexandre Cassen

    http://marc.theaimsgroup.com/?l=linux-virtual-server&m=105459391703228&w=2
    
  2. The patch was put into LVS 1.1.X but not 1.0.X so if you want that behaviour and you are using LVS 1.0.X (i.e. 2.4.X kernel) you will need to patch the code yourself.
  3. You may want this patch too which fixes a small bug in Alexandre's code.

    --- ipvs-1.0.10.syncd.orig/ipvs/ip_vs_core.c   2003-07-29 23:37:12.000000000 +0900
    +++ ipvs-1.0.10.syncd/ipvs/ip_vs_core.c        2003-09-29 16:02:33.000000000 +0900
    @@ -1132,7 +1132,7 @@
            /* increase its packet counter and check if it is needed
               to be synchronized */
            atomic_inc(&cp->in_pkts);
    -       if (ip_vs_sync_state == IP_VS_STATE_MASTER &&
    +       if (ip_vs_sync_state & IP_VS_STATE_MASTER &&
                (cp->protocol != IPPROTO_TCP ||
                 cp->state == IP_VS_S_ESTABLISHED) &&
                (atomic_read(&cp->in_pkts) % 50 == sysctl_ip_vs_sync_threshold))
    
With this patch, when I bring back the master director, the backup director will notify it of new connections. But what happens with the connections currently established on the backup director? Are they notified to the recovered master director? When the master director takes the VIP, these established connections will stop.

No. If you wait a short time before the master takes over the VIP, then the connections will have been synchronised. Alternatively, when the old master comes back up, make it the stand-by. Presumably some time will pass before another failover occurs, so synchronisation should have plenty of time to happen. If you are using heartbeat then this is called nice_failback.
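
With heartbeat, that is a one-line setting in ha.cf. A minimal fragment, assuming heartbeat 1.x syntax (newer versions of heartbeat spell the same behaviour auto_failback off):

# /etc/ha.d/ha.cf (fragment)
# don't pull resources back to a recovered node; let it become the stand-by
nice_failback on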

Is there any way to copy all the connections table from the backup director to the master director when it gets recovered from a previous fault?

No. It would be possible to add some sort of dump request, but I don't think this would be wise: if you have a lot of connections this could take a while and thus impact load balancing - and if you have a lot of connections, the linux director is probably already very busy.

38.3. Expiration of Connection in Backup Director

ong cheechye Mar 14, 2003

I'm running Piranha over IPVS (ipvs-1.0.4.patch).

I notice that the connection expire time on the primary director is much longer than the backup director's (seen from ipvsadm -lc below). So when the director fails over to the backup, the connection on the backup director might have already expired and been removed. Thus the connection would not fail over.

Is this the right thing to happen? How does ipvs determine the expiry time of a connection?

  • In Primary Director
    [root@RADIXS root]# ipvsadm -lc
    IPVS connection entries
    pro expire state       source                virtual            destination
    TCP 14:42  ESTABLISHED 192.168.123.133:32861 vipserver:telnet   application:telnet
    
  • In Backup Director
    [root@main ong]# ipvsadm -lc
    IPVS connection entries
    pro expire state       source                virtual            destination
    TCP 02:17  ESTABLISHED 192.168.123.133:32861 vipserver:telnet   application:telnet
    

Horms

Ok, this is a little confusing. The expiry values are used in different ways on the primary and the backup, so it doesn't really matter that they aren't the same. Essentially what is going to happen is that even if the connection expires in the backup, as soon as some more packets arrive on the primary, the connection will be updated on the backup and it will re-appear in the backup's connection table.

Bonnet, Mar 14, 2003

You missed Ong's point. Imagine the following :

  1. a telnet connection is established thru primary director
  2. connection sync'ed to backup
  3. no more telnet packets for a while
  4. connection removed from backup (but still on primary)
  5. primary failure
  6. backup taking over
  7. incoming packet for telnet session

Ooops ! The backup director doesn't know this connection, whereas a few seconds before the primary director did know it !

Horms

I guess that I did miss the point. Of course, once the connection times out on the backup director, failover of that connection is not going to work. This timeout can be modified by changing IP_VS_SYNC_CONN_TIMEOUT in ip_vs_sync.c. The default is 3 minutes.

It could be a /proc entry, but it isn't at the moment.
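
For reference, a quick way to locate the constant in the ipvs source; the path is for a 2.6 tree and the output line is illustrative - check your own source before editing:

director:# grep "define IP_VS_SYNC_CONN_TIMEOUT" net/ipv4/ipvs/ip_vs_sync.c
#define IP_VS_SYNC_CONN_TIMEOUT (3*60*HZ)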

ong cheechye

OK. So the connection will not fail over if it has been idling?

Horms

Yes, if the connection has been idling for longer than the timeout on the backup (3 minutes), it will not fail over.

Sebastian Vieira sebvieira (at) gmail (dot) com 15 Sep 2006

When I do a failover to the other node, ipvsadm -l shows zero connections. But client connections _are_ being sync'd: when the original active director was down, I could connect through the backup director without any problem. If I do a failback it shows all connections again. It's not a big issue since LVS works as it should, but it would be handy though. Is there something that can be done about this?

Joe

I'm not sure what the expected behaviour is here, but the syncd only transfers enough information for the backup director to take over as the active director - i.e. only the connection table and the active connections (connections in FIN_WAIT etc. aren't transferred). When you go back to the original director, you're probably seeing its original connection count from before failover.

Monty Ree Jun 04, 2007

I have a two director LVS-DR system like below.

LVS1
LVS-DR : linux kernel 2.6.18
web server : linux kernel 2.6.18(apache+php)
loadbalancing : sh(source hashing)
at realserver : just a little FIN_WAIT, TIME_WAIT
apache KeepAlive : on 

LVS2 
LVS-DR : linux kernel 2.6.21
web server : linux kernel 2.6.21(apache+java+php)
loadbalancing : sh(source hashing)
at realserver : lots of FIN_WAIT, TIME_WAIT
apache KeepAlive : on 

When I execute the command below:

# ipvsadm -L -n -c

TCP 14:22  ESTABLISHED 221.155.xx.xx:1995   xxx.x.xx.xx:80  xxx.x.xx.xx:80

Here, 14:22 means the expire time, right?

At LVS1, after some seconds, the above connection entry disappears, but at LVS2 it does not.

Horms 4 Jun 2007

Yes, 14:22 is the expiration time.

If you are using connection synchronisation and LVS1 is the master and LVS2 is the backup, here is a rough sketch of what is likely to be going on.

On LVS1 a connection is established and there is a series of packets going between the real-server and the end-user via the master linux-director. During this time the packets are tracked using a connection entry in the ESTABLISHED state. This has a timeout of 15:00 minutes, which is refreshed each time a packet is received. For the connection above, that would seem to indicate that it has been idle for 38 seconds.

When the connection is shut down by either the real-server or the end-user, the connection entry moves into the CLOSE state, with a much shorter timeout. If no more packets are received (which is usually the case; typically there will only be more packets if some arrive out of order), then the connection entry disappears pretty quickly.

So far so good.

On LVS2 things are a bit different. It doesn't see the packets sent by the real-server and the end-user. Rather, it receives connection information via the LVS synchronisation protocol, sent out by LVS1. Sync messages are sent out once a connection has seen 3 packets (the 3-way handshake is complete) and then every 50th packet. There is also a 2s delay loop in there, but that's not that important. It is this synchronisation information that produces the entries you see on LVS2, and they may be a little out of date. But that's not really important, because they are just there in case fail-over occurs, so that LVS2 will be able to forward packets for the connections that were synchronised.

The precise details of the timeouts and the state of these synchronised connections are not that important, because while LVS2 is acting as a backup, they aren't used. So other than consuming a very small amount of memory, they do no harm. And if failover occurs, the entries that are used at that time are updated and follow the rules that LVS1 was previously following. Just think of them as templates with a timeout, rather than full-fledged connection entries.

38.4. Syncd boxes must have the same time

If the directors are having their time updated by ntpdate from cron, rather than by ntpd, then after a power down they may not be in time synchronisation and won't accept messages from each other.

Nicklas

After a total power outage today I'm having trouble getting the backup LVS node to function properly. The backup node keeps transitioning to MASTER state. I'm using keepalived. Things had worked flawlessly before the power outage. The two LVS machines are also firewalls, with appropriate rules applied to pass through VRRP messages and such.

Graeme Fowler graeme (at) graemef (dot) net 06 Nov 2008

Are you 100% sure the firewall rules or a network misconfiguration aren't getting in the way? The most common flaw that causes this is a rule or route on the nominal master preventing it sending announcements, so the slave keeps transitioning.

It's either that, or your system clocks are out of sync with each other. If you're using ntpdate from cron, then time is probably your problem.

Can you run a local NTP daemon on the directors, configured against two upstream time servers and only permitting local clock slew? That's what I do. The daemon approach means the time is slewed slowly, rather than stepped several tens of seconds at a time.

Also, make sure the hardware clocks are sufficiently close together on both systems (hwclock). If not, get the system times close and then do "hwclock --systohc".
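
A minimal sequence for doing that by hand, assuming ntpdate is installed (substitute your own time server for the pool address):

director:# ntpdate pool.ntp.org     # step the system clock once
director:# hwclock --systohc        # copy the system time to the hardware clock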

Joe: we didn't hear back, so we don't know the resolution of this problem.

38.5. LVS and syncd do not use conntrack

Note
May 2004: ipvs now uses conntrack. see Running a firewall on the director.

Horms 22 Oct 2003

IPVS does not use conntrack! You can examine the state of connections, including those synchronised to a backup, using ipvsadm -L -c -n

Carles Xavier Munyoz Baldo Oct 22, 2003

But which kind of connections will it synchronise? All the connections passing through the FORWARD chain, or only the connections directed to the realserver farm?

All of the connections that have been forwarded to realservers by LVS.

I'm building a high availability firewall using the ipvs sync daemons to synchronise the MASTER FW network connections with the BACKUP FW. Is this possible with ipvs or must I use another high availability software solution for linux?

Not really, unless you run all the connections through LVS.

38.6. Connection Synchronisation (TCP Fail-Over)

Wensong's original implementation of the sync demon would not fail back after failover. This code was modified by Julian so that failover could be followed by failback. It was still a master/slave arrangement, where the sync demon on the active director broadcasts connection table information to the sync demon on the backup director.

However, if all directors (including the backups) have the same ipvsadm hash table (i.e. have the virtual services set up with ipvsadm), then each director can broadcast connection information for the virtual services it is handling while receiving connection data from the other directors for the virtual services they are handling. You then have a peer-to-peer (p2p) sync demon.

Note
For the sync demon, you only need to keep track of connections in the ESTABLISHED state. Connections in TIME_WAIT etc. will disappear on their own, and you don't need the other directors to take over a TIME_WAIT state on failover.

If you can arrange for all directors to have the same ipvsadm hash table, and for some other mechanism to ensure that all directors can claim (arp for) the same VIP, then you have a fully failover-proof set of p2p directors.

For detailed discussion of the design of such a sync demon, see Horms' paper on Connection Synchronisation (TCP Fail-Over). Horms combines this with Saru, where all directors are active.

Horms has virtualised the sync demon function of LVS by writing hooks into LVS, which allow a sync demon to be loaded as a module. Horms has rewritten Wensong's implementation as one module and his p2p code as another, and moved the user-space controls into ipvsadm. In principle, then, anyone can write a sync demon module and register it with ipvs.

Unfortunately this code has not been accepted into LVS and is being maintained separately by Horms. This is a bit of a nightmare to maintain and track with each version of LVS, so if you're going to use Horms' code, be prepared for patching and spelunking kernel code.

Horms horms (at) verge (dot) net (dot) au 28 Apr 2004

In 2.6 the code has been enhanced by Julian to allow a peer-to-peer setup. The current 2.6 code allows you to run a master or a backup sync daemon. If you want p2p then you just run both - previously you could only run one or the other. This is configured through ipvsadm.
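
A minimal sketch of the p2p setup on a kernel that supports it: run both daemons on every director. The SyncID of 1 is an arbitrary choice here, and --syncid requires an ipvsadm recent enough to support it:

director:# ipvsadm --start-daemon=master --mcast-interface=eth0 --syncid 1
director:# ipvsadm --start-daemon=backup --mcast-interface=eth0 --syncid 1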

For 2.4, I think Julian has the main patch on his web site. However there was a minor patch contributed by me, that is required for his patch to work. I am not sure if he incorporated that. It is in the mailing list archive.

I abstracted the sync deamon, which involved moving a bunch of code around and adding an extra layer. This did not change the functionality of the sync daemon at all. I modified the LVS core code, so that the sync code is implemented by a series of hooks. The idea is that people could implement these hooks however they liked. This could be done in a kernel module that registered itself with LVS when it is inserted into the kernel, or instructed to do so from userspace.

Then I implemented a version of these hooks that implements Wensong's sync demon. These are registered when LVS is initialised, so you get the existing behaviour.

Then I implemented another module to handle synchronisation differently. It communicates with a user-space daemon. When you insert this module it registers itself as the sync hooks, unregistering the default hooks. The reverse happens when you remove the module. This is explained at some length in Implementation of Connection Synchronisation.

Using it is explained in the man page. To start the sync daemon you run

ipvsadm --start-daemon master
or 
ipvsadm --start-daemon backup

depending on whether you want a master or a backup sync daemon. With Julian's patches you run both on each node to get the p2p behaviour.

To stop the sync daemon run

ipvsadm --stop-daemon

Horms 24 Apr 2007

LVSSyncDaemonSwap is a script, and its function is described in the comment at the top of it.

Briefly: prior to 2.4.27 only the master or the backup sync daemon could run, but not both. So when a failover occurs, the new active node needs to stop the backup sync daemon and start the master one. The reverse action needs to occur when a node starts running as standby. The purpose of LVSSyncDaemonSwap is to make this switch. If you have a newer kernel, just run both daemons on boot. Incidentally, the sync daemon doesn't have a way to flush connections, so I recommend setting autofailback to off.

As of 2.4.27 the master and backup sync daemons can run simultaneously, which is recommended; thus if you have a newer kernel you don't need LVSSyncDaemonSwap at all.

Graeme Fowler graeme (at) graemef (dot) net 09 Jun 2007

Look at your process list and check that you have two processes running:

ipvs_syncmaster
ipvs_syncbackup

If you do, you don't need LVSSyncDaemonSwap.
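
For example, a check along these lines:

director:# ps ax | grep -E 'ipvs_sync(master|backup)' | grep -v grep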

(back to Horms)

On my extremely long and ever-expanding todo list, I plan to put up my patches. I have made a start by updating them and getting them all together in my workspace. One of the problems is that the patches tend to conflict with each other, or rely on each other (thankfully not both).

I quite like Rusty's kernel patches page and I might go for something like that. But to be honest, I fear the support burden. The patches that I have made available in the past have resulted in much work for me. Usually I just send my patches to the mailing list and forget about them. But in the case of some of the more substantial work I have done, they can be found at:

http://www.ultramonkey.org/papers/ and
http://www.ultramonkey.org/download/

In particular:

http://www.ultramonkey.org/papers/conn_sync/
http://www.ultramonkey.org/download/conn_sync/
http://www.ultramonkey.org/papers/active_active/
http://www.ultramonkey.org/download/active_active/

Julian's code resolves most of the major problems that I saw in the original (current default 2.4) synchronisation code. So, from a user point of view, since his code is in the ipvs package, it should be easier to set up.

My user-space daemon is called ip_vs_user_sync_simple. As the name suggests, the code is quite simple; hopefully people can extend and customise it. You can find it in http://www.ultramonkey.org/download/conn_sync/ Saru and the sync demon are independent code, but you should have a sync demon if you're using multiple directors.

There are several different development branches of the syncd. Here Horms tries to set me straight on who wrote what, and which functionality is in which branch.

Horms 27 May 2004

Wensong's implementation is in the kernel. So everyone who has a recent (since 2001) version of LVS has it.

Alexandre wrote some patches (independently of me) that address most of my concerns. This code has, to the best of my knowledge, been put into IPVS in the 2.6 kernel, though as of which version I am not sure - possibly 2.6.0. The patches, however, have not been merged into 2.4.

There are two patches from Alexandre; I will address each of them in turn, though I have used neither of them extensively. I believe both of these patches apply against the kernel - which means Wensong's implementation of the synchronisation code.

  • linux-2.4.26-ipvs_syncd.patch.gz (This has been merged into 2.6.something)

    This adds two features

    • You can run a master and backup sync daemon on the same host. This means that if you have two (or more) linux directors they can act as both master and backup. That is, they can both send and receive synchronisation traffic - though typically one machine will only do one of these at a time. This means that regardless of which machine is the active linux director, connections will be synchronised, and if a failover occurs those connections can continue. This addresses the problem that I discuss here (amongst other places): http://www.ultramonkey.org/papers/conn_sync/conn_sync.shtml#master_slave_problem
    • It adds a SyncID field to the packet. This allows multiple LVS clusters to use synchronisation on the same multicast UDP port. Just set each cluster with a different SyncID and it will ignore packets whose SyncID doesn't match.
  • linux-2.6.4-ipvs_syncd_icv.patch.gz (This is only available for 2.6, presumably it relies on the patch above, which is in 2.6)

    This patch allows the synchronisation packets to be signed. At a quick glance, this is done using an HMAC digest and a shared secret. This should be both secure and fast. (Actually I have done the same thing but am not able to release the code).

    This protects against unknown parties injecting packets and possibly causing havoc in the connection tables. This is pretty important if your linux directors will accept sync packets from unknown parties. Keeping in mind that UDP can be pretty easy to spoof, using packet filters to guard against this can be problematic (though not impossible).

    N.B: I didn't actually read Alexandre's paper, but as I mentioned I have implemented almost the same thing. So I am quite familiar with the concepts.

The stem code which you should patch is in the LVS tree, which means in the kernel. The 2.6 code is a bit more advanced than the 2.4 code, because it has the first of Alexandre's patches, and possibly the second - I have not checked.

The two main lots of patches are from Alexandre (discussed immediately above) and me (further up the page). My patches move a lot of the code to user-space. However, they are quite invasive, and probably not a whole lot better than using the core code with Alexandre's patches if you just want a functional syncd. If you want to hack, then my approach should be better, as you can do a lot of the work in userspace.

"dingji" dingji (at) broadeasy (dot) com 2005-02-03

According to Manual-8, there should be three files to configure the sync-daemon:

/proc/sys/net/ipv4/vs/sync_msg_max_size 
/proc/sys/net/ipv4/vs/sync_frequency 
/proc/sys/net/ipv4/vs/sync_threshold 

but I can only find the last one on my system. According to Connection Synchronisation by Horms, there seems to be a patch for this. But what's the difference between ip_vs_user_sync_simple and the sync daemon within ipvsadm? And why did it seem to work without the two files - were defaults used?

Peter Mueller

See conn_sync (http://www.ultramonkey.org/papers/conn_sync/conn_sync.shtml), and for a more in-depth HOWTO-style example, lvs_jan_2004.pdf (http://www.ultramonkey.org/papers/lvs_jan_2004/stuff/lvs_jan_2004.pdf).

Horms 04 Feb 2005

ip_vs_user_sync_simple is userspace; the ipvsadm-controlled daemon is in the kernel. I had worked on the synchronisation code for a customer, and we decided to make a userspace daemon to match some requirements for that customer. In a nutshell, it's easier to write user-space code than kernel code. It also addressed a number of concerns I had with the in-kernel code at the time. However, the kernel code problems have all been fixed now. So unless you desperately want to do some hacking, the in-kernel, ipvsadm-controlled daemon is my recommendation.

Sebastian Vieira sebvieira (at) gmail (dot) com

I've noticed that upon a failover, not all connections are sync'd (via the sync daemons) to the backup director. I read in the docs for ultramonkey (not using it ... I think ... but that was the only source I could find) something about /proc/sys/net/ipv4/vs/sync_threshold and how to manipulate it. If I understand things correctly, sync_threshold has 2 values. By default they are 3 and 50, meaning that after 3 packets the connection will be initially synchronised. After that, every 50th packet sent will cause the connection to be synchronised again. That is, if I understand it correctly :)

Ratz 10 Nov 2006

I think this is a sound understanding of the mechanism.

Now I recall reading somewhere that there is a certain timeout involved. I mean that if no packets are sent for a certain time, the connection will not be synchronised. I don't know if this is true, but this could be the reason.

Yes, the "templates" are sent once but within the interval specified.

38.7. The synchd produces broadcast traffic

If the synchd sends its traffic over the RIP network and it's been a while since you set the LVS up, you might forget that it sends broadcasts.

Dan Brown danb (at) zu (dot) com 19 Apr 2006

I've been watching errant traffic via tcpdump, trying to track down some unrelated problems, and have noticed there is a lot of broadcast traffic coming from the active director. The traffic all looks like this:

09:13:04.016297 IP 216.94.150.8.32848 > 224.0.0.81.8848: UDP, length 28

According to some archive posts, this is how Apache session information is shared. I haven't dug deeper into the traffic to figure out if this is true. These broadcasts occur every 2-6 seconds and aren't on a consistent schedule. I have a dedicated set of interfaces for heartbeat information (which I thought also shared the session information), and that traffic looks like this:

09:38:14.093733 IP 10.0.0.1.32847 > 10.0.0.2.ha-cluster: UDP, length 159
09:38:14.319831 IP 10.0.0.2.32807 > 10.0.0.1.ha-cluster: UDP, length 158
09:38:15.095778 IP 10.0.0.1.32847 > 10.0.0.2.ha-cluster: UDP, length 159
09:38:15.317917 IP 10.0.0.2.32807 > 10.0.0.1.ha-cluster: UDP, length 158

I get a pair of broadcasts once per second. I expect this as it is configured that way. The broadcast info to 224.0.0.81.8848 is not configured in ha.cf, and neither director has mcast settings on any device. So what is the information being broadcast on the external internet device? I shouldn't be seeing _ANY_ broadcast packets over the external interface as far as I'm concerned.

Graeme Fowler graeme (at) graemef (dot) net

This is the LVS synchronisation daemon pushing state information from the master to the backup director (and it is in fact multicast, not broadcast, see http://www.iana.org/assignments/multicast-addresses).

You should have an ipvs_syncmaster process on your master, and an ipvs_syncbackup process on the backup. This gives you the stateful failover which is so desirable upon director failure.

It is possible to put this traffic onto a separate interface (like your heartbeat network) to save it being sent out to all the machines on the frontend network, but how that's configured depends on which application you use to manage your LVS.

  • ipvsadm: --mcast-interface interface
  • keepalived: lvs_sync_daemon_interface option in the VRRP instance section (see the fragment after this list)
  • ldirectord: seems not to have the option in the CVS version I'm looking at (Id: ldirectord,v 1.136 2006/04/05 02:12:24 horms) but can be driven alongside ipvsadm anyway quite happily, providing you don't stomp on the functionality provided by ldirectord.
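
A hypothetical keepalived.conf fragment for the keepalived case, assuming a private sync network on eth1; the instance name, priority and address below are placeholders, not values from this thread:

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    # send the LVS sync daemon traffic over the private network
    lvs_sync_daemon_interface eth1
    virtual_ipaddress {
        192.168.10.100
    }
}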

Horms 25 Aug 2006

The data is completely unsecured. Anyone who joins the multicast group (opens a socket) can get the packets, though they probably aren't that interesting. What is interesting is that they can also inject packets, to say flood the connection table with entries. I've never crafted an attack, but I'm pretty sure the scope is ample.

I worked on some code a few years ago to move part of the synchronisation into user-space, and secure it using a signature and a shared secret. Now that crypto-api is in the kernel (and has been for years) it should be easy enough to move this code into the kernel, which is a less invasive change.

I wonder if there is any interest in this, as there certainly wasn't when I worked on it before.
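
Until something like that is merged, packet filtering can at least narrow the exposure, with Horms' caveat above that UDP is easy to spoof. A sketch, assuming the directors exchange sync traffic only on a private interface eth1:

director:# iptables -A INPUT -d 224.0.0.81 -p udp --dport 8848 ! -i eth1 -j DROP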

38.8. from the mailing list

Dave Augustus davea (at) support (dot) kcm (dot) org 14 Nov 2002

We are currently building an LVS and are close to deployment. All machines have 2 NICs - 1 public and 1 private. I want to use the private one for the connection sync daemon.

  • 2 directors: master and backup
  • 4 realservers

All directors and realservers are using the same kernel: 2.4.19, with ipvsadm v1.21 2002/07/09 (compiled with popt and IPVS v1.0.6).

When I specify on Master Director:

--start-daemon=master --mcast-interface=eth1

and on the backup Director:

--start-daemon=backup --mcast-interface=eth1

No connection information is available on the backup using these settings. Also, the message log on the master director shows: "kernel: IPVS: ip_vs_send_async error." However, when I change the mcast-interface to eth0, the connection sync works fine and no errors are reported.

...

Now for my belated reply (I just deployed an 8 server LVS): the workaround I came up with was simply dropping the --mcast-interface statement altogether. The traffic is now handled on the eth0 interface by default. I didn't want to do it this way. The bug seemed to crop up whenever I specified ANY interface.
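
Joe: one thing worth checking in this situation (our suggestion; it wasn't confirmed in the thread) is whether the kernel has a multicast route out of the intended interface, since ip_vs_send_async errors can occur when the sync socket has no route to 224.0.0.81:

director:# ip route add 224.0.0.0/4 dev eth1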

tuliol (at) sybatech (dot) com Jun 20, 2004

Currently I have connection synchronization working between directors. The setup is configured by manually running the commands:

ipvsadm --start-daemon master   #Master Linux director
ipvsadm --start-daemon backup   #Slave Linux director

My question is: Is there a way to automatically start those 2 processes by putting a setting in the ldirectord config file or somewhere else?

Horms

Ldirectord is one option. However it only really makes sense if you are running it as a stand-alone daemon, as you want the synchronisation daemons running all the time. If you are using something like heartbeat to start and stop ldirectord, then you don't want the synchronisation daemons handled there, else a synchronisation daemon won't run when the ldirectord resource is relinquished on a node, and that really isn't what you want. ipvsadm has an init script; you should be able to use that to start and stop the daemons.

Well here's what I am going to be doing in my next cluster going live very soon now (sm). If anyone sees anything silly please let me know. Note that if you copy my config you will have to change /etc/ha.d/update's SSH line to the proper host, and reverse it for the backup host. You will also probably want an ssh-public key login setup, although I'm not certain I'm going to do this.

</etc/ha.d/resource.d/lvsstate.sh>
#!/bin/sh
# script to set the sync state properly on both LVS servers.

case "$1" in
  start)
    /sbin/ipvsadm --stop-daemon
    /sbin/ipvsadm --start-daemon master
  ;;
  stop)
    /sbin/ipvsadm --stop-daemon
    /sbin/ipvsadm --start-daemon backup
  ;;
esac

exit 0

</etc/ha.d/haresources>
IPaddr::ip.goes.here.here \
...
ldirectord::ldirectord.cf \
lvsstate.sh

</etc/ha.d/update>
#!/bin/sh
# script for updating ldirectord nicely.  created 06/05/2001 PM

# first, backup ldirectord.cf in case someone messes up later.
cp -f /etc/ha.d/conf/ldirectord.cf \
      /etc/ha.d/conf/backup.of.ldirectord.cf

# next, scp the ldirectord.cf file over to the other director
# the two LVS servers will have to have public-key acceptance for this to work.
scp /etc/ha.d/conf/ldirectord.cf lvs2-priv:/etc/ha.d/conf/.

# make sure the state is set properly for the active server
ssh rwclvs2-priv /etc/ha.d/lvsstate.sh stop
/etc/ha.d/lvsstate.sh start

# reload ldirectord
killall -HUP ldirectord

# give it a few seconds to allow the config to set
sleep 10

# now display configuration
ipvsadm -L
ipvsadm -L --daemon

James Bromberger jbromberger (at) fotango (dot) com

We run ldirectord on both master and standby hosts ALL the time. We also run ipvsadm --start-daemon master and ipvsadm --start-daemon backup on BOTH hosts all the time (2.6 kernel). The ONLY thing that heartbeat is doing is bringing the service IP addresses up and down. That way ldirectord doesn't need 10 seconds or so to check services before they come into service: everything is instant. Maybe this is wrong, but it works really well. Our failover time is less than half a second, and all state is retained. Failback can happen immediately as well.

unknown: who asked about the changes needed to the heartbeat scripts for syncd

Peter Mueller pmueller (at) sidestep (dot) com 29 Nov 2004

I am using a very simple script for this purpose, called through heartbeat/haresources. The quick summary seems to be: go with a solution similar to mine for 2.4, and use a "slave and master on both servers" solution for 2.6 kernels. See the full thread (http://marc.theaimsgroup.com/?l=linux-virtual-server&m=108924839319403&w=2) for all the details.

Joe - which I think refers to the lvsstate.sh setup shown above.

Sebastiaan Veldhuisen Jun 03, 2005

We just implemented a 2nd director for our HA LVS environment, and we want to do connection synchronization between the master and the backup director through ipvsadm.

1) Should the backup server list the connections it has received?

Horms

No

2) If not, how do I verify that it's updated its internal tables?

ipvsadm -L -c -n. The connections will show up in the connection table in the ESTABLISHED state.

3) Does it work if I always run both a master and a slave sync daemon at the same time on both directors, even if ipvsadm is only running on the master?

In more recent versions of the kernel, yes.

38.9. Bug (fixed) in syncd: mixed endianness on directors

Hopefully this fix is in the standard ipvs release now. The bug bites when directors of different byte order try to sync (e.g. big-endian sparc and little-endian i386) - an endianness problem rather than a 32/64 bit one.

Justin Ossevoort justin (at) snt (dot) utwente (dot) nl 30 Sep 2004

There was a small bug in the ip_vs_sync.c code that made it impossible for 2 servers of different endianness to sync with each other (e.g. a sparc (big endian) and an i386 (little endian) based system). The problem was in the message size field. All other data seems to be correctly rearranged to network byte order except for this one (probably because the size is used from the moment the data is being gathered to the moment it is sent). This caused "IPVS: bogus message" messages in my dmesg.

This patch fixes the problem by converting m->size to network byte order at the last moment before sending, and changing it back to host order right before the message is processed. The patch is made against Linux kernel version 2.6.8.1.

--- linux-2.6.8.1/net/ipv4/ipvs/ip_vs_sync.c    2004-08-14 12:54:46.000000000 +0200
+++ linux-2.6.8.1-fix/net/ipv4/ipvs/ip_vs_sync.c    2004-09-30 11:54:53.000000000 +0200
@@ -16,6 +16,7 @@
   *    Alexandre Cassen    :    Added master & backup support at a time.
   *    Alexandre Cassen    :    Added SyncID support for incoming sync
   *                    messages filtering.
+ *    Justin Ossevoort    :    Fix endian problem on sync message size.
   */

  #include <linux/module.h>
@@ -279,6 +280,9 @@
      char *p;
      int i;

+    /* Convert size back to host byte order */
+    m->size = ntohs(m->size);
+
      if (buflen != m->size) {
          IP_VS_ERR("bogus message\n");
          return;
@@ -569,6 +573,23 @@
      return len;
  }

+static void
+ip_vs_send_sync_msg(struct socket *sock, struct ip_vs_sync_buff *sb)
+{
+    int msize;
+    struct ip_vs_sync_mesg *m;
+
+    m = sb->mesg;
+    msize = m->size;
+
+    /* Put size in network byte order */
+    m->size = htons(m->size);
+
+    if (ip_vs_send_async(sock, (char *)m, msize) != msize)
+        IP_VS_ERR("ip_vs_send_async error\n");
+
+    ip_vs_sync_buff_release(sb);
+}

  static int
  ip_vs_receive(struct socket *sock, char *buffer, const size_t buflen)
@@ -605,7 +626,6 @@
  {
      struct socket *sock;
      struct ip_vs_sync_buff *sb;
-    struct ip_vs_sync_mesg *m;

      /* create the sending multicast socket */
      sock = make_send_sock();
@@ -618,20 +638,12 @@

      for (;;) {
          while ((sb=sb_dequeue())) {
-            m = sb->mesg;
-            if (ip_vs_send_async(sock, (char *)m,
-                         m->size) != m->size)
-                IP_VS_ERR("ip_vs_send_async error\n");
-            ip_vs_sync_buff_release(sb);
+            ip_vs_send_sync_msg(sock, sb);
          }

          /* check if entries stay in curr_sb for 2 seconds */
          if ((sb = get_curr_sync_buff(2*HZ))) {
-            m = sb->mesg;
-            if (ip_vs_send_async(sock, (char *)m,
-                         m->size) != m->size)
-                IP_VS_ERR("ip_vs_send_async error\n");
-            ip_vs_sync_buff_release(sb);
+            ip_vs_send_sync_msg(sock, sb);
          }

          if (stop_master_sync)

Special credit goes to: Byte Internetdiensten (my current employer) for supplying the testbed that triggered this bug and the time sponsored to fix it.
