# SRM with KVM and DRBD


Currently [we](http://www.atcomputing.nl) are building a fairly
rock-solid high-availability cluster
for a client. This has the "usual" ingredients: two locations,
two NetApps, two clusters of three VMware ESX servers and a bunch
of virtual machines running on top of the ESX servers. Also
included in the mix is a VDI (now called View) virtual desktop
infrastructure for running virtual Windows XP clients.

This is all managed by SRM (Site Recovery Manager) *and* it
is almost working. But that is another story.

What got me thinking is the following. 

Last week I did a
consultancy job where they had built a failover cluster using
[DRBD](http://www.drbd.org/). With DRBD you have a disk device `/dev/drbd/0`
which is transparently replicated. The device file can be
used like any other: `fdisk`, `mkfs` and `mount` all work as
expected.
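
To make that concrete, here is a minimal sketch of putting a filesystem on the device and mounting it. The filesystem type (ext3) and the mount point `/var/lib/kvm-images` are my own assumptions for illustration, not something from the cluster I saw. The `run` helper only prints the commands unless you set `DRY_RUN=0`, and the real thing would have to run on the node where the device is primary.

```shell
#!/bin/bash
# Sketch: treat the DRBD device like any other block device.
# Assumptions: ext3 as filesystem and /var/lib/kvm-images as mount
# point are illustrative choices only.
DEVICE=/dev/drbd/0
MOUNTPOINT=/var/lib/kvm-images
DRY_RUN="${DRY_RUN:-1}"   # default: only print what would happen

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Create a filesystem on the replicated device and mount it.
prepare_device() {
    run mkfs.ext3 "$DEVICE"
    run mkdir -p "$MOUNTPOINT"
    run mount "$DEVICE" "$MOUNTPOINT"
}

prepare_device
```

Nothing in there is DRBD-specific, which is exactly the point: the replication happens underneath the block device.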

Now throw [KVM](http://www.linux-kvm.org/page/Main_Page) into the mix...

The virtual machine images must be stored on the DRBD device. Suppose
we have two servers called *master* and *slave*. On *master* the
`kvm` processes run. In a failover situation the following needs
to happen:

* If *master* is still available, kill all kvm processes;
* If *master* is still available, set the DRBD device in secondary
mode or disable it altogether;
* On *slave* make the DRBD device primary, so that it becomes
available in `rw` mode. If you don't do this you get *Wrong medium type*
errors;
* On *slave* start the kvm processes again.
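
The third step is the one that bites (the *Wrong medium type* errors), so a small guard before starting the virtual machines seems sensible. The sketch below is an assumption-laden illustration: it expects a role string in the `local/peer` form, such as `Primary/Secondary`, which is what I believe `drbdadm role <resource>` prints on DRBD 8.x. The function names are made up for this example.

```shell
#!/bin/bash
# Sketch: only start the VMs when the local DRBD role is Primary.
# Assumption: the role string has the form "local/peer", e.g.
# "Primary/Secondary". On a real node it could come from something
# like: drbdadm role <resource>

# Succeeds when the local side (the part before the slash) is Primary.
local_is_primary() {
    local role="$1"
    [ "${role%%/*}" = "Primary" ]
}

# Print a decision for the given role string; fail when not Primary.
check_before_start() {
    if local_is_primary "$1"; then
        echo "ok: local node is Primary, safe to start kvm"
    else
        echo "refusing: local role is '${1%%/*}', promote the device first"
        return 1
    fi
}

# Demo with sample role strings instead of a live DRBD device.
check_before_start "Primary/Secondary"
check_before_start "Secondary/Primary" || true
```

On *slave* the failover script could call `check_before_start` right before starting kvm and bail out when it fails.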

It would be even cooler if the virtual machines could actually be
copied over *while* they are still running, but I don't know if that would
be possible.

Shared storage would be possible by letting one virtual machine
export (via NFS/Samba/iSCSI) another DRBD device.

So my site recovery manager script (SRM script) will be something
along the lines of this:

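    #!/bin/bash

    # On failover, run "srm stop" on the old site (if it is still
    # available) and "srm start" on the other site.

    # Assumption: "r0" stands in for the DRBD resource name from
    # drbd.conf; drbdadm takes a command followed by a resource name,
    # not a device path.
    RESOURCE=r0

    case "$1" in
        stop)
            /etc/init.d/kvm stop
            drbdadm secondary "$RESOURCE"
            ;;
        start)
            # A plain promotion should do once the peer is secondary or
            # gone; forcing it with --overwrite-data-of-peer is meant
            # for the initial sync.
            drbdadm primary "$RESOURCE"
            /etc/init.d/kvm start
            ;;
        *)
            echo "usage: $0 {start|stop}" >&2
            exit 1
            ;;
    esac
    exit 0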

Is it really *that* simple?


