Linux Virtual Server Project

Joseph Mack

jmack (at) wm7d (dot) net

Software Freedom Day, UNC, Chapel Hill, NC, 14 Sep 2007


Table of Contents

1. Introduction
1.1. LVS Overview
1.2. LVS Team Members
1.3. Why load-balanced servers rather than a single big server?
1.4. A little bit about the LVS code
1.5. Typical setup: Nomenclature
2. LVS Forwarding
2.1. Forwarding: by Direct Routing
3. Scheduling: Selecting the realserver to service a connection
3.1. General Purpose Schedulers
3.2. Special Purpose Schedulers
4. The arp problem
5. Failover
5.1. Realserver Failover
5.2. Director Failover
6. Performance
7. Security
8. Conclusion

1. Introduction

Figure 1. LVS Logo

1.1. LVS Overview

Figure 2. LVS Overview

  • kernel code (ip_vs)
  • a userland configuration utility (ipvsadm).

1.2. LVS Team Members

The LVS project was started in 1999 by Wensong Zhang, a student in China.

Current LVS team members

  • Julian Anastasov (Bulgaria)
  • Graeme Fowler (England)
  • Simon Horman (Australian working in Japan)
  • Joseph Mack (Australian working in USA)
  • Roberto Nibali (Sicilian working in Switzerland)

Richard Stallman

1.3. Why load-balanced servers rather than a single big server?

Better performance, throughput and reliability for a fixed cost.

  • What everyone wanted: performance
  • What everyone needed: 100% uptime (you hide failure/planned maintenance)

Figure 3. Single Big Server

Figure 4. Load Balanced Server

  • from a single box: cost=O(e^performance)
  • from an array of boxes: cost=O(number_boxes)=O(performance)

Because the cost of an array grows only linearly with performance, while the cost of a single box grows exponentially, everyone turns to clusters of computers once the required throughput is large enough.

1.4. A little bit about the LVS code

1.5. Typical setup: Nomenclature

Here are the names of the LVS components.

  • CIP: client's IP
  • VIP: Virtual IP (on director and realservers)
  • DIP: director's IP in the RIP network (the realservers' default route for LVS-NAT; it moves to the standby director on director failover)
  • RIP (RIP network): realserver's IP

Figure 5. LVS nomenclature (example shows virtual service on port 80)
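Using the names above, a virtual service is set up by telling ipvsadm about the VIP and about each RIP behind it. The following is a minimal sketch that only builds the command lines and prints them; it does not run ipvsadm. The addresses and the helper names add_service and add_realserver are hypothetical, chosen for illustration, and the -g flag assumes direct routing as the forwarding method.

```python
# Sketch: build (but do not run) ipvsadm command lines for one virtual service.
# All addresses are hypothetical documentation addresses.

def add_service(vip, port, scheduler="rr"):
    """Command that creates a TCP virtual service on VIP:port."""
    return ["ipvsadm", "-A", "-t", f"{vip}:{port}", "-s", scheduler]


def add_realserver(vip, port, rip, weight=1):
    """Command that attaches one realserver, forwarded by direct routing (-g)."""
    return ["ipvsadm", "-a", "-t", f"{vip}:{port}",
            "-r", f"{rip}:{port}", "-g", "-w", str(weight)]


VIP = "192.0.2.110"                 # hypothetical VIP
RIPS = ["10.0.0.11", "10.0.0.12"]   # hypothetical RIPs

commands = [add_service(VIP, 80)]
commands += [add_realserver(VIP, 80, rip) for rip in RIPS]

for cmd in commands:
    print(" ".join(cmd))
```

Keeping each command as a list of arguments makes it easy to hand to a process launcher later without going through shell quoting.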

On being "virtual"

  • virtual: from Merriam-Webster

    "being on, or simulated on a computer or computer network"
  • virtual server:
  • realserver:

2. LVS Forwarding

2.1. Forwarding: by Direct Routing

Table 1. Packet path in LVS-DR connecting to VIP:80

network segment       packet type                     packet addressing
client->director      IP                              CIP:1025->VIP:80
director->realserver  ethernet (content = IP packet)  (MAC DIP)->(MAC RIP1)[CIP:1025->VIP:80]
realserver->client    IP                              VIP:80->CIP:1025

Figure 6. LVS-DR 1st hop

Figure 7. LVS-DR 2nd hop

Figure 8. LVS-DR 3rd hop

Client/Server semantics are preserved

  • The client sends its packets to VIP:80 and receives a reply from VIP:80
  • the client thinks it is connecting directly to a server
  • the realserver thinks it is being contacted directly by the client

Neither the client nor the realserver can tell that the director was part of the packet exchange.
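The three hops in Table 1 can be modelled in a few lines. This is a toy sketch, not LVS code: the Frame class, the two functions and the MAC_ROUTER label are hypothetical names introduced here. It shows that the director changes only the ethernet addresses, while the IP addressing seen by the client and the realserver is untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src: str   # IP source as "address:port"
    dst: str   # IP destination as "address:port"


def director_forward(frame, director_mac, realserver_mac):
    """2nd hop: rewrite only the ethernet header, leave the IP packet alone."""
    return replace(frame, src_mac=director_mac, dst_mac=realserver_mac)


def realserver_reply(frame, realserver_mac, router_mac):
    """3rd hop: the realserver answers from VIP:80 directly to the client."""
    return Frame(realserver_mac, router_mac, src=frame.dst, dst=frame.src)


# 1st hop: the request as it arrives at the director.
request = Frame("MAC_ROUTER", "MAC_DIP", src="CIP:1025", dst="VIP:80")
forwarded = director_forward(request, "MAC_DIP", "MAC_RIP1")
reply = realserver_reply(forwarded, "MAC_RIP1", "MAC_ROUTER")

print(forwarded)
print(reply)
```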

3. Scheduling: Selecting the realserver to service a connection

3.1. General Purpose Schedulers

  • round robin: (default)

    Figure 9. LVS scheduling - Round Robin

  • least connected:
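A simplified sketch of the two general purpose schedulers follows. It ignores weights and is not the kernel implementation; the class names and the active connection table (a plain dictionary here) are assumptions made for illustration.

```python
from itertools import cycle


class RoundRobin:
    """Hand out realservers in turn, regardless of their load."""

    def __init__(self, realservers):
        self._next = cycle(realservers)

    def pick(self, active):
        return next(self._next)


class LeastConnected:
    """Pick the realserver with the fewest active connections."""

    def __init__(self, realservers):
        self._realservers = list(realservers)

    def pick(self, active):
        return min(self._realservers, key=lambda rs: active.get(rs, 0))


servers = ["RIP1", "RIP2", "RIP3"]

rr = RoundRobin(servers)
rr_picks = [rr.pick({}) for _ in range(4)]

lc = LeastConnected(servers)
lc_pick = lc.pick({"RIP1": 5, "RIP2": 2, "RIP3": 7})

print(rr_picks)   # ['RIP1', 'RIP2', 'RIP3', 'RIP1']
print(lc_pick)    # RIP2
```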

3.2. Special Purpose Schedulers

  • DH (destination hashing)

    Figure 10. LVS scheduling - Destination Hashing

  • SH (source hashing)

    Figure 11. LVS scheduling - Source Hashing
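Both special purpose schedulers pick the realserver from an address in the packet rather than from load: DH uses the destination address, SH uses the source address. The sketch below illustrates the idea with a plain modulo over the numeric address. That hash is an assumption made for brevity and is not the function the kernel uses; the function names are hypothetical.

```python
import ipaddress


def hash_pick(address, realservers):
    """Map an IP address to a realserver; equal addresses give equal results."""
    return realservers[int(ipaddress.ip_address(address)) % len(realservers)]


def schedule_sh(src, dst, realservers):
    """Source hashing: the client's address decides the realserver."""
    return hash_pick(src, realservers)


def schedule_dh(src, dst, realservers):
    """Destination hashing: the destination address decides the realserver."""
    return hash_pick(dst, realservers)


servers = ["RIP1", "RIP2", "RIP3"]

# The same client lands on the same realserver whatever it connects to.
a = schedule_sh("192.0.2.1", "198.51.100.7", servers)
b = schedule_sh("192.0.2.1", "203.0.113.9", servers)
print(a, b)
```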

4. The arp problem

All machines (director, realservers) have the VIP and all VIPs can be seen by the router. How does the router know to send the packets from the client to the director and not to the realservers?

Figure 12. LVS arp problem
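The problem can be seen in a toy model of the router's arp request for the VIP. The sketch below is an illustration only: the Host class and the answers_arp_for_vip flag are hypothetical, and the flag stands in for whatever mechanism keeps the realservers quiet (on current Linux kernels, for instance, the arp_ignore and arp_announce settings). One common answer to the question above is that only the director replies to arp requests for the VIP.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    mac: str
    addresses: set
    answers_arp_for_vip: bool = True   # hypothetical switch, see lead-in


def arp_replies(hosts, ip, vip):
    """MAC addresses of every host that would answer 'who has <ip>?'."""
    replies = []
    for host in hosts:
        if ip not in host.addresses:
            continue
        if ip == vip and not host.answers_arp_for_vip:
            continue
        replies.append(host.mac)
    return replies


hosts = [
    Host("director", "MAC_DIRECTOR", {"VIP", "DIP"}),
    Host("realserver1", "MAC_RIP1", {"VIP", "RIP1"}),
    Host("realserver2", "MAC_RIP2", {"VIP", "RIP2"}),
]

# Everyone answers: the router cannot know which MAC belongs to the director.
naive = arp_replies(hosts, "VIP", "VIP")

# Realservers keep quiet about the VIP: only the director answers.
for host in hosts[1:]:
    host.answers_arp_for_vip = False
fixed = arp_replies(hosts, "VIP", "VIP")

print(naive)
print(fixed)
```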

5. Failover

The failover and reconfiguration daemons send commands to ipvsadm.

  • mon
  • ldirectord (Linux-HA)
  • keepalived (vrrp)

5.1. Realserver Failover

Figure 13. LVS realserver failover

  • planned maintenance: set the weight=0 for that service/node. This will stop any new connections being assigned to that service/realserver. You then wait for the number of connections to drop to zero and then bring down the service/realserver.
  • failure (unplanned maintenance): use health checking to continuously

    • test service on each realserver:VIP:80 from the director,
    • test network connectivity
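A health check of this kind can be sketched as a TCP connect test plus the ipvsadm command that takes the realserver out of rotation. This is a rough sketch, not how mon, ldirectord or keepalived are implemented. The function names are hypothetical, the check connects to the RIP, and setting weight=0 on failure (rather than deleting the realserver) is an assumption. The command is only built, never executed.

```python
import socket


def service_is_up(host, port, timeout=2.0):
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def set_weight_command(vip, rip, port, weight):
    """ipvsadm command that edits the weight of one realserver."""
    return ["ipvsadm", "-e", "-t", f"{vip}:{port}",
            "-r", f"{rip}:{port}", "-g", "-w", str(weight)]


def check_realserver(vip, rip, port=80):
    """Return the command a monitor would issue after one check."""
    weight = 1 if service_is_up(rip, port) else 0
    return set_weight_command(vip, rip, port, weight)


# Planned maintenance uses the same command with weight 0 (hypothetical addresses).
print(" ".join(set_weight_command("192.0.2.110", "10.0.0.11", 80, 0)))
```

A monitoring loop would call check_realserver for each realserver at a fixed interval and pass the result to a process launcher.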

5.2. Director Failover

Figure 14. LVS director failover

Directors

  • monitor each other's health
  • standby director updates the LVS connection table by listening for broadcasts from the active director.

6. Performance

Here's the test setup (we used 2.0 and 2.2 kernels).

Figure 15. LVS Performance Test Setup

With this setup, the only measurement you get is

  • the round trip time for a packet (client-director-realserver-client).

The only performance parameters you can retrieve for a TCP/IP connection are

  • the latency of transfer (time at which the first bit arrives, ms)
  • the maximum throughput (rate of delivery of subsequent packets, Bps).

The only variable available (once the two end points have been decided) is

  • the packet payload size.
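One way to get both parameters out of round trip times measured at several payload sizes is to fit a straight line, time = latency + size/throughput. The sketch below does that with an ordinary least-squares fit. The linear model is an assumption made here, not necessarily the analysis used for the plots in this talk, and the sample numbers are synthetic, generated from a latency of 100 microseconds and a throughput of 10 MB/s (taking Bps as bytes per second).

```python
def fit_latency_throughput(samples):
    """Least-squares fit of time = latency + size / throughput.

    samples: list of (payload_bytes, seconds).
    Returns (latency in seconds, throughput in bytes per second).
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx                  # seconds per byte
    latency = mean_y - slope * mean_x  # intercept at zero payload
    return latency, 1.0 / slope


# Synthetic data, not measurements.
TRUE_LATENCY = 100e-6
TRUE_THROUGHPUT = 10e6
sizes = [0, 1_000, 10_000, 100_000]
samples = [(s, TRUE_LATENCY + s / TRUE_THROUGHPUT) for s in sizes]

latency, throughput = fit_latency_throughput(samples)
print(f"latency = {latency * 1e3:.3f} ms, throughput = {throughput:.0f} Bps")
```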

Figure 16. Sample data from a Netpipe run, client-director (log-linear)

Figure 17. Sample data, mtu as variable (log-log)

Figure 18. LVS test, parametric plot (log-log)

What we found

  • the LVS code added at most 15% latency to small (0 byte) packets.
  • there was no change in throughput.
  • the load average on the director barely moved from zero, while the test client and realserver were operating at high load average.

7. Security

The realservers are not exposed to clients

  • RIPs are private addresses.
  • There must be no routes to the realservers from outside: all packets to the realservers must go through the director.
  • the only route from the realservers is VIP:80->0/0 (no other packets are allowed out of the LVS).

Because the realservers will be compromised first, all logins occur on the director, from a separate administrative network. You must not be able to log in

  • outside->director
  • realserver->director
  • realserver->realserver

Figure 19. LVS Security

Standard security applies:

  • DENY all, ACCEPT only expected packets
  • don't route all, only route expected packets (eschew default routes, which aren't needed)

Packet path

  • client->director
  • director->realserver
  • realserver->client

packets - director:VIP:80->0/0

Quiz: what should you do with a SYN packet for VIP:81 (or VIP:22)? (assume no service listening on the director:VIP:81, VIP:22).

Packets on the RIP network

  • non-LVS packets: realserver-realserver (private src,dest addresses)
  • LVS packets 0/0->VIP:80

packets from the realserver to the client: VIP:80->0/0
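The "DENY all, ACCEPT only expected packets" rule for the LVS packets listed above can be written as two predicates. This is a toy sketch and not a firewall configuration: the Packet class, the function names and the VIP value are hypothetical. It only says which packets are expected; what to do with the others is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    src_port: int
    dst: str
    dst_port: int


VIP = "192.0.2.110"   # hypothetical VIP


def expected_inbound(p):
    """LVS packets towards the service: 0/0->VIP:80."""
    return p.dst == VIP and p.dst_port == 80


def expected_outbound(p):
    """Replies from the realservers: VIP:80->0/0."""
    return p.src == VIP and p.src_port == 80


request = Packet("198.51.100.7", 1025, VIP, 80)
reply = Packet(VIP, 80, "198.51.100.7", 1025)
stray = Packet("10.0.0.11", 4000, "198.51.100.7", 25)

print(expected_inbound(request), expected_outbound(reply), expected_outbound(stray))
```

Anything for which both predicates are false is, by default, not allowed through.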

8. Conclusion

Being part of an internet project:

  • I've learned a lot.
  • It's nice to have all help gratefully accepted.
  • I've met face to face with some smart and nice people whom I first knew through the mailing list.
  • Maybe I've made a little difference to the world.

Joining an internet project

  • find a project that you think is really neat and you could spend some time on (>year?)
  • check that the mailing list treats people courteously, even those who ask stupid questions.
  • start doing something you like or you think is useful (remember you're doing it for fun and because you want to)
  • don't do something just because someone else suggested it (you should expect $ for that).