40. LVS: Setting up Linux-HA for directors (mostly by using rpms)

Note

Mar 2003: I had difficulty with the scripts provided with Heartbeat and wrote my own. I haven't written them up, as I expect I'll put any spare time into Alexandre's Section 36.6, which automatically handles the arp-caching problem of failover. Ard van Breeman had trouble too, and his IPaddr script is included below.

This was posted to the mailing list by Peter Mueller pmueller (at) sidestep (dot) com on 17 Sep 2001. (The original was in html with DOS carriage control. I've converted it by hand. There may be some parsing errors. Joe)

The original mon files are from Juri; the rest of the data comes from personal experience, the mailing lists (linux-ha or LVS) or the respective websites.

urls

Note

These scripts assume mon 0.99.2. For simplicity of installation I downloaded the mon-0.38 rpm (I couldn't find a newer 0.99.2 rpm) and upgraded to 0.99.2 from source. I then changed the appropriate lines in /etc/rc.d/init.d/mon.

40.1. linux-ha howto

This document is a mini how-to for getting heartbeat working between two individually working LVS boxes. It is certainly not intended to be an all-encompassing document detailing everything imaginable. What it is intended to deliver is the 'essential steps' for getting LVS-HA functional. And you definitely should have two individually functioning boxes before even attempting this. (Yes, go back and test your setup with each box to ensure it works!)

Another important note to add is that I have only tested this setup with Ultramonkey RPMs. I don't know if your setup will work. I wouldn't trust this document unless you do the same. (I would be interested in knowing if the HA features are the same for all 'heartbeat' setups..)

PS. - apologies if this document is RedHat biased, I'm running from VALinux boxes that are RedHat configured.

40.2. Fix the (possible) ethernet alias issue.

By now you've set up a dummy alias device on each LVS box (most likely eth0:0). This alias device is unnecessary and potentially problematic in the HA setup. The reason is that the heartbeat software (the scripts in /etc/ha.d/resource.d/) actually creates a new eth0:0 device on the active box. If you have an eth0:0 (or whatever) alias configured for your VIP on the standby director box, you might get a "VSbox2 kernel: Uh Oh, MAC address 00:02:B3:03:9A:13 claims to have our IP address (vip.ip.goes.here) (duplicate IP conflict likely)" error! Not good...

If I were you I'd move your alias script out of your /etc/sysconfig/network-scripts/ directory and restart networking to clear out that alias. Alternatively, if you are using shell scripts then you should modify those to not control alias ips.
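Moving the alias script out of the way can be sketched as follows. This is only an illustration, assuming the Red Hat convention that alias interfaces live in files named ifcfg-<device>:<n>; the function name park_alias_scripts and the backup directory are made up for the example.

```shell
#!/bin/sh
# Sketch: move ifcfg alias files (names containing ':') out of the
# network-scripts directory so the VIP alias is no longer brought up at boot.
# park_alias_scripts and its two directory arguments are illustrative names.
park_alias_scripts() {
    src_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for f in "$src_dir"/ifcfg-*:*; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        mv "$f" "$backup_dir"/ && echo "moved $(basename "$f")"
    done
}
```

It would be called as something like "park_alias_scripts /etc/sysconfig/network-scripts /root/alias-backup", followed by a restart of networking to clear the alias from the running system.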

40.3. Configure the /etc/ha.d/ files

  • authkeys

    authkeys MUST be permission-set to 600 or 400 from what I have read. Be sure this is the case. authkeys should contain something like

    auth 2
    #1 crc
    2 sha1 passwordhere
    #3 md5 Hello!
    

    Since this file must be the same on both machines, set it up on one box and scp or ftp the file over to the other.

  • haresources

    haresources is hard to understand until you have a working setup. The example config shows things like:

    #just.linux-ha.org	135.9.216.110 httpa
    

    when something like "primary.director.box.goes.here shared.resources.address.here http" would be easier to follow. For example:

    #vs1.foo.com vip.foo.com http # <-- put actual IP down instead of vip.foo.com
    vs1.foo.com IPaddr::10.10.10.10 ldirectord::ldirectord.cf # <-- if you use ldirector like this
    # multiple VIP example follows
    # vs1.so.com IPaddr::10.10.10.10 IPaddr::10.10.10.254 ldirectord::ldirectord.cf
    

    It's important to note that the box listed in the first field is considered the 'primary' director box and usually takes control in the event of uncertainty. (Definitely look at nice_failback in ha.cf if you're interested in this topic.)

  • ha.cf

    The high-availability configuration file. Yep, this looks like the meat of the subject! I'll just post my config, which assumes you use ttyS0 and eth0 for your links to the other director.

    #       File to write debug messages to
    debugfile /var/log/ha-debug
    #       File to write other messages to
    logfile /var/log/ha-log
    #       Facility to use for syslog()/logger
    logfacility     local0
    #       keepalive: how many seconds between heartbeats
    keepalive 1
    #       deadtime: seconds-to-declare-host-dead
    deadtime 20
    #       hopfudge maximum hop count minus number of nodes in config
    #hopfudge 1
    #       serial  serialportname ...
    serial  /dev/ttyS0
    #       Only for serial ports.  It applies to both PPP/UDP and "raw" ports
    #       This means run PPP over ports ttyS1 and ttyS2
    #       Their respective IP addresses are as listed.
    #       Note that I enforce that these are local addresses.
    #	Other addresses are almost certainly a mistake.
    #ppp-udp        /dev/ttyS1 10.0.0.1 /dev/ttyS2 10.0.0.2
    #       Baud rate for both serial and ppp-udp ports...
    baud    19200
    #       What UDP port to use for udp or ppp-udp communication?
    udpport 694
    #       What interfaces to heartbeat over?
    udp     eth0
    #       Watchdog is the watchdog timer.
    #	If our own heart doesn't beat for
    #       a minute, then our machine will reboot.
    #watchdog /dev/watchdog
    #       Nice_failback sets the behavior when performing a failback:
    #
    #       - if it's on, when the primary node starts or comes back from any
    #         failure and the cluster is already active, i.e. the secondary
    #         server performed a failover, the primary stays quiet, acting as a
    #         secondary.  This way some operations like syncing disks can be
    #         easily done.
    #       - if it's off (default), the primary node will always be the primary,
    #         whenever it's powered on.
    nice_failback off		# <-- might want to turn this on after you get things working
    #       Tell what machines are in the cluster
    #       node    nodename ...    -- must match uname -n
    node    vs1.foo.com	# <-- must match uname -n !
    node    vs2.foo.com	# <-- must match uname -n !
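Two of the requirements above are easy to get wrong: the permissions on authkeys, and the node names matching uname -n. A rough sketch of checking both follows. The function names are invented for this example, and it assumes GNU stat (stat -c %a), as found on Linux.

```shell
#!/bin/sh
# authkeys_mode_ok FILE: succeed if FILE has mode 600 or 400.
# Assumes GNU stat; other systems spell this option differently.
authkeys_mode_ok() {
    mode=$(stat -c %a "$1") || return 1
    case $mode in
        600|400) return 0 ;;
        *) echo "$1: mode $mode, expected 600 or 400" >&2; return 1 ;;
    esac
}

# node_listed HA_CF [NAME]: succeed if NAME (default: output of uname -n)
# appears on an uncommented "node" line in HA_CF.
node_listed() {
    name=${2:-$(uname -n)}
    awk -v want="$name" '
        $1 == "node" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # rest of the line is a comment
                if ($i == want) found = 1
            }
        }
        END { exit !found }
    ' "$1"
}
```

On each director one might run "authkeys_mode_ok /etc/ha.d/authkeys && node_listed /etc/ha.d/ha.cf" before starting heartbeat; a non-zero exit status means one of the two checks failed.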
    

40.4. Stop ldirectord from starting, ensure heartbeat starts on reboot

/etc/rc.d/init.d/ldirectord stop
/usr/sbin/chkconfig --level 2345 ldirectord off
/usr/sbin/chkconfig --level 345 heartbeat on # <-- run on whatever init levels you want
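To confirm the result, the output of "chkconfig --list" can be filtered. The sketch below assumes the usual Red Hat output format of a service name followed by fields such as "3:on"; the function name runlevels_on is invented for this example.

```shell
#!/bin/sh
# runlevels_on SERVICE: read "chkconfig --list" style lines on stdin and
# print the runlevels in which SERVICE is "on" (an empty line if none).
runlevels_on() {
    awk -v svc="$1" '
        $1 == svc {
            out = ""
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")          # fields look like "3:on"
                if (kv[2] == "on") out = out kv[1]
            }
            print out
        }
    '
}
```

For example, "chkconfig --list | runlevels_on heartbeat" should print the levels you enabled, and the same pipeline for ldirectord should print nothing but an empty line.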

40.5. starting heartbeat and verifying functionality

At this point ldirectord (the Linux Director daemon) should NOT be running on either box. If you type ipvsadm -L on either box you should get:

[root@vs1 ha.d]# ipvsadm -L
IP Virtual Server version 0.9.11 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
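If you want to script this check, something like the following would do. It is a sketch that only looks at the shape of the ipvsadm -L listing shown above (service lines starting with TCP, UDP or FWM, real-server lines starting with "->"); the function name is made up.

```shell
#!/bin/sh
# ipvs_table_empty: read "ipvsadm -L" output on stdin; succeed only if it
# lists no virtual services and no real servers.
ipvs_table_empty() {
    awk '
        $1 == "TCP" || $1 == "UDP" || $1 == "FWM"   { busy = 1 }  # virtual service lines
        $1 == "->" && $2 != "RemoteAddress:Port"    { busy = 1 }  # real server lines
        END { exit busy + 0 }
    '
}
```

Usage would be along the lines of "ipvsadm -L -n | ipvs_table_empty && echo 'table is empty'".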

Now start up heartbeat. Tail /var/log/messages and /var/log/ha-log for important log information. My /var/log/messages looks like:

Apr 24 13:12:38 vs1 heartbeat[2070]: Configuration validated. Starting heartbeat.
Apr 24 13:12:39 vs1 heartbeat[2075]: Starting serial heartbeat on tty /dev/ttyS0
Apr 24 13:12:39 vs1 heartbeat[2075]: UDP heartbeat started on port 694 interface eth0
Apr 24 13:12:39 vs1 heartbeat[2077]: node vs1.internal.smartbasket.com -- link eth0: status up
Apr 24 13:12:39 vs1 heartbeat[2077]: node stage-monitor -- link /dev/ttyS0: status up
Apr 24 13:12:39 vs1 heartbeat[2077]: node stage-monitor -- link eth0: status up
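The "link ...: status ..." lines are the ones to watch. As a rough aid, the sketch below reduces a log to the most recent status seen for each node and link. It assumes the message format of the sample above, which may differ in other heartbeat versions; the function name link_status is invented.

```shell
#!/bin/sh
# link_status: read heartbeat log lines on stdin and print the last status
# seen for each node/link pair, one per line, as "node link status".
link_status() {
    awk '
        / -- link .*: status / {
            for (i = 1; i < NF; i++) {
                if ($i == "node")   node = $(i + 1)
                if ($i == "link")   { link = $(i + 1); sub(/:$/, "", link) }
                if ($i == "status") state = $(i + 1)
            }
            last[node " " link] = state
        }
        END { for (k in last) print k, last[k] }
    ' | LC_ALL=C sort
}
```

For example: "link_status < /var/log/messages". Every link you configured in ha.cf should show up for the other director.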

And a quick check of ifconfig on the primary director shows the alias interface (eth0:0) appears. Note that eth0:0 is *NOT* present when heartbeat isn't running.

[root@vs1 ha.d]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:02:B3:06:B6:45
          inet addr:10.0.1.5  Bcast:10.0.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:106550 errors:0 dropped:0 overruns:0 frame:0
          TX packets:75338 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0xd000

eth0:0    Link encap:Ethernet  HWaddr 00:02:B3:06:B6:45
          inet addr:10.0.1.10  Bcast:10.0.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:10 Base address:0xd000
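The same check can be scripted, which is handy when testing failover repeatedly. This sketch assumes the older ifconfig output format shown above ("inet addr:..." lines); newer net-tools versions print a different layout. The function name vip_alias is made up.

```shell
#!/bin/sh
# vip_alias VIP: read "ifconfig -a" output on stdin and print the name of
# each alias interface (name contains ':') whose inet addr equals VIP.
vip_alias() {
    awk -v vip="$1" '
        /^[^ \t]/ { ifname = $1 }       # an unindented line starts a new interface
        $1 == "inet" && $2 == ("addr:" vip) && ifname ~ /:/ { print ifname }
    '
}
```

On the active director "ifconfig -a | vip_alias 10.0.1.10" should name the alias; on the standby director it should print nothing.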

A ps aux on the active director shows :

root      1648  0.0  0.1  1444  868 ttyS0    SL   13:17   0:00 /usr/lib/heartbeat/heartbeat
root      1650  0.0  0.1  1332  748 ttyS0    SL   13:17   0:00 /usr/lib/heartbeat/heartbeat
root      1651  0.0  0.1  1332  736 ttyS0    SL   13:17   0:00 /usr/lib/heartbeat/heartbeat
root      1652  0.0  0.1  1328  736 ttyS0    S    13:17   0:00 /usr/lib/heartbeat/heartbeat
root      1653  0.0  0.1  1332  732 ttyS0    SL   13:17   0:00 /usr/lib/heartbeat/heartbeat
root      1654  0.0  0.1  1328  728 ttyS0    S    13:17   0:00 /usr/lib/heartbeat/heartbeat
root      1775  0.0  0.8  5352 4388 ttyS0    S    13:17   0:00 perl /etc/ha.d/resource.d/ldirectord ldir
root      1869  0.0  0.1  2344  724 pts/0    R    13:20   0:00 ps aux

40.6. Test your fail-over features, understand HA.

At this point you should experiment with your failover functionality and learn how your setup behaves. You also need to customize your ha.cf file to the requirements of your site.

As noted in the 'getting started' document mentioned in the url section above, be certain NOT to yank all the heartbeat media cables at once! This will cause a 'split brain' scenario and you won't be happy! Test failover possibilities one at a time, or catastrophically!
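Before pulling cables it helps to know exactly which heartbeat media are configured, so that you can disconnect them one at a time. A small sketch, assuming media are declared on "serial" and "udp" lines as in the ha.cf above (the function name is invented):

```shell
#!/bin/sh
# heartbeat_media HA_CF: print each heartbeat medium named on an uncommented
# "serial" or "udp" line of HA_CF, one per line, e.g. "serial /dev/ttyS0".
heartbeat_media() {
    awk '
        $1 == "serial" || $1 == "udp" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break    # rest of the line is a comment
                print $1, $i
            }
        }
    ' "$1"
}
```

With the ha.cf from this document, "heartbeat_media /etc/ha.d/ha.cf" lists two media. If it lists only one, losing that single link is already enough to make both directors believe the other is dead.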

40.7. Configuration of mon - recommended

  • add lines to /etc/services

    mon             2583/tcp                        # MON
    mon             2583/udp                        # MON traps
    
  • install Perl modules

    Get the modules and files (here from the ftp server at 10.0.0.34, directory /mon) and install them:

    Convert-BER, Period, Time-HiRes, Mon, fping

  • convert headers into perl headers

    cd /usr/include
    h2ph *.h sys/*.h asm/*.h
    
  • install mon-rpm

    This gets /etc/rc.d/init.d/mon and other nice features installed automatically. Afterwards, update to the latest source available to get bug fixes. I recommend untarring in /usr/src/mon-x and symlinking that directory to /usr/src/mon for simplicity and ease of upgrade.

  • install mon.cf file into /etc/mon/

    Modify it if appropriate (i.e. change the gateway that it monitors). The mon.cf file contains lots of configuration options which you should be familiar with, such as log locations. An example is included below.

  • copy any specific monitors from staging or production to your new box.

    In this example we are using a few extra monitors and alerts: fping.monitor, pid.monitor, heartbeat.alert, and restartheartbeat.alert. All '.monitor' files go in the 'mon.d' directory, and all '.alert' files go in the 'alert.d' directory.

  • change the /etc/rc.d/init.d/mon file to point to appropriate paths

    Change /usr/lib/mon to /usr/src/mon and the cf line to /etc/mon/mon.cf (either that or copy the file from a working server).

  • make sure a copy of fping is in the restricted path set by /etc/rc.d/init.d/mon.

    One way of fixing this is a simple 'cp /usr/local/sbin/fping /usr/sbin' (or /usr/local/bin, or anywhere else in that path).

  • create /etc/mon/monusers.cf.

    Instructions are in the man page (man mon).

    #!/bin/bash
    # example heartbeat.alert from Juri
    # Script to start/stop heartbeat daemon
    # Put a line like
    # alert heartbeat.alert
    # or
    # upalert heartbeat.alert
    # in your mon config file
    
    HEARTBEAT="/etc/rc.d/init.d/heartbeat"
    if [ "$9" = "-u" ]; then
    	$HEARTBEAT start
    else
    	$HEARTBEAT stop
    fi
    
    #!/bin/sh
    # example pid.monitor from Juri
    # Script for mon to check whether a process is running or not.
    # Invoke with
    # monitor pid.monitor process
    
    /sbin/pidof -s "$1" > /dev/null 2>&1
    
    if [ $? -eq "0" ]; then
    	echo "$1 running"
    	exit 0
    else
    	echo "$1 not running"
    	exit 1
    fi
    
    # Sample mon.cf configuration file for mon, originally from Juri
    #
    # You have to restart mon after editing this file in order for your
    # changes to take effect.
    
    authtype	=	userfile
    userfile	=	/etc/mon/monusers.cf
    cfbasedir	=	/etc/mon
    alertdir	=	/usr/src/mon/alert.d
    mondir		=	/usr/src/mon/mon.d
    logdir		=	/var/log/mon
    dtlogfile	=	/var/log/mon/downtime
    dtlogging	=	yes
    historicfile	=	/var/log/mon/history
    maxprocs	=	20
    histlength	=	100
    monerrfile	=	/var/log/mon/errfile
    
    # Hostgroup entries
    #hostgroup node1 stage-monitor
    #
    #hostgroup node2 vs1
    #
    hostgroup gateway 10.0.1.2
    
    #hostgroup heartbeat localhost
    
    
    ###########
    # Gateway #
    ###########
    
    watch gateway
    	service fping
    		interval 10s
    		monitor fping.monitor
    		#comp_alerts	<-- default starting in mon 0.99.1
    		period NORMAL: wd {Sun-Sat}
    			numalerts 1
    			alertafter 3
    			alert heartbeat.alert
    			upalert restartheartbeat.alert # read mon file
    
    #############
    # Heartbeat #
    #############
    
    #watch heartbeat
    #	service heartbeat
    #		interval 15s
    #		monitor pid.monitor /usr/lib/heartbeat/heartbeat
    #		depend gateway:fping
    #		dep_behavior m
    #		period NORMAL: wd {Sun-Sat}
    #			#alert restartheartbeat.alert
    #			upalert heartbeat.alert
    
    ##############
    # First node #
    ##############
    #
    #watch node1
    #	service http
    #		interval 10s
    #		monitor http.monitor
    #		period NORMAL: wd {Sun-Sat}
    #			alert restart.alert httpd ;;
    #		period ADVANCED: wd {Sun-Sat}
    #                       alert mail.alert root
    #			alert reboot.alert
    #			alertafter 3
    #			alertevery 1m
    # Example for testing disk operations
    #	service disk
    #		interval 10s
    #		monitor nfs.monitor /vol/shared/0/ ;;
    #		period wd {Sun-Sat}
    #                        alert mail.alert -s 'REBOOT: Disk not responding!' root
    #			alert hardreboot.alert
    #			alertafter 2
    #			alertevery 1m
    # Example for testing rpc based services such as nfs, nis etc.
    #	service rpc
    #		interval 10s
    #		monitor rpc.monitor -r mountd -r nfs
    #		period wd {Sun-Sat}
    #                        alert mail.alert root
    #			alertafter 2
    #                        alertevery 1m
    ###############
    # Second node #
    ###############
    #
    #watch node2
    # Example for testing disk operations
    #	service disk
    #		interval 10s
    #		monitor nfs.monitor /tmp ;;
    #		period wd {Sun-Sat}
    #                       alert mail.alert root
    #			alert hardreboot.alert
    #			alertafter 2
    #			alertevery 1m
    # Example for testing samba
    #	service samba
    #		interval 15s
    #		monitor tcp.monitor -p 139 localhost
    #		period wd {Sun-Sat}
    #                        alert restart.alert smb
    #		period ADVANCED: wd {Sun-Sat}
    #                        alert mail.alert root
    #			alert reboot.alert
    #			alertafter 3
    #			alertevery 1m
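A common way for this setup to fail silently is a monitor or alert script that mon.cf refers to but that was never copied into mon.d or alert.d. The sketch below lists such files. It assumes mondir and alertdir each name a single directory, as in the sample mon.cf above, and the function names are invented for this example.

```shell
#!/bin/sh
# cf_value KEY FILE: print the value of a "KEY = value" line in FILE.
cf_value() {
    awk -v key="$1" '
        { line = $0; gsub(/=/, " ", line); split(line, f, " ") }
        f[1] == key { print f[2] }
    ' "$2"
}

# missing_mon_files MON_CF: print every monitor or alert script that an
# uncommented line of MON_CF refers to but that is not an executable file in
# the configured mondir or alertdir. Succeeds only if nothing is missing.
missing_mon_files() {
    cf=$1
    mondir=$(cf_value mondir "$cf")
    alertdir=$(cf_value alertdir "$cf")
    awk -v m="$mondir" -v a="$alertdir" '
        $1 == "monitor"                  { print m "/" $2 }
        $1 == "alert" || $1 == "upalert" { print a "/" $2 }
    ' "$cf" | LC_ALL=C sort -u | (
        rc=0
        while read -r path; do
            [ -x "$path" ] || { echo "missing: $path"; rc=1; }
        done
        exit $rc
    )
}
```

Running "missing_mon_files /etc/mon/mon.cf" before restarting mon prints one "missing:" line for each script that still has to be copied over or made executable.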