The planned configuration was:
- 3 servers, each with 3 NICs, running XenServer v6.0
- 2 iSCSI shared storage servers
- 2 of the NICs bonded to provide the management interface and added to each guest (VM) system.
- 1 NIC for a dedicated storage network.
- A dedicated storage network (addresses 192.168.40.x)
We then reinstalled exactly the same version of XenServer. When it started, it could see the 3 network interfaces.
We tried various combinations: bonding the first 2 NICs before adding the server to the pool, and adding it without any prior configuration.
- backing up the VM metadata.
- though the VM data on the storage array should be unaffected, ensure it is all backed up.
- for each server: remove it from the pool and install XenServer,
- create a new pool, and
- restore the storage repositories and VM metadata.
It was set up much as per this post; there wasn't much difference between v6.0 and v6.1.
What I decided to do was to save the metadata for each VM individually, using the CLI vm-export command, in addition to the xsconsole option that backs up the metadata to disk on the storage repository.
The vm-export command:
xe vm-export filename=vm_testserv2 uuid=ac7fc085-aa66-826d-5a69-599c5af20118 metadata=true
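The counterpart for restoring one of these metadata-only exports should be vm-import; the line below is only a sketch reusing the filename from the export above, so check xe help vm-import on your version for the exact parameters:
xe vm-import filename=vm_testserv2 metadata=true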
As part of the testing we upgraded the test pool from v6.0 to v6.1 (skipping v6.0.2) - the same version of XenServer as the live pool. I wanted to be sure that there wouldn't be any problems accessing the storage repositories and the VM metadata backups, or restarting the virtual machines themselves.
- VM storage configuration.
- VM (in this case CentOS) kernel levels; CentOS versions older than 5.4 can have problems with XenServer versions 6.0.2 and 6.1.
- Ejected any mounted virtual machine ISO images.
- Storage repository configuration - IP addresses, names and connection details.
- Storage repository configuration - as screenshots from XenCenter; if you are under stress later, a pictorial representation can be helpful. We had 7 virtual machines, so taking the screenshots was manageable.
- Saved lists from the xe command line utilities, including: xe sr-list; xe vm-list (see the example commands below).
- Back up (dump) the pool (a metadata backup).
- Run a server status report and upload it to https://taas.citrix.com/ - Citrix Autosupport
Example Server disk screenshot.
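The saved lists and the pool dump were along these lines (the output file names here are just examples):
xe sr-list > sr-list.txt
xe vm-list > vm-list.txt
xe pool-dump-database file-name=pool-metadata-backup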
2. The Process.
- Check that no disks are on any of the local storage arrays. Sometimes templates, if created when the servers were first built, can remain on local storage.
- Recheck and eject any mounted VM CD drives (as mentioned above).
- Do another backup of the virtual machines to a storage repository, and also back up each VM individually:
xe vm-export filename=vm_custweb uuid=ac7fc085-aa66-826d-5a69-599c5af20118 metadata=true
- Remove one server from the pool (or, in our case, start with the newly installed server that we couldn't rejoin to the pool).
- Install XenServer 6.1, giving it the same networking details (IP address, gateway, DNS settings).
- On the old servers in the running pool, stop all virtual machines and detach (disconnect) from all of the shared storage repositories - see the command sketch after this list. For now we left the servers themselves running.
- On the new server, configure the networking (note, however, that I didn't create the bonded network interface at this point).
- On the new server, connect to the storage repositories.
- Restore the virtual machines (xsconsole - see screenshot below).
- Start one of the VMs.
- Then shut down the old XenServer hosts, install XenServer 6.1 on them, and add them to the new pool.
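As a rough sketch of the detach and pool-removal steps above (all UUIDs are placeholders; the same steps can be done through XenCenter):
xe pbd-list sr-uuid=<sr-uuid>
xe pbd-unplug uuid=<pbd-uuid>
xe pool-eject host-uuid=<host-uuid>
Here pbd-list shows the PBDs linking each host to the shared SR, pbd-unplug detaches the SR from that host, and pool-eject removes a member server from the pool.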
VM Metadata Restoration.
Unreadable Guest Server Attached Disk.
This returned the expected partition - a single ext2/ext3 partition for the entire disk. However, fsck still failed.
Once again there was no obvious reason why this partition should have problems. It hadn't been resized, and no other changes had been made to it. One possibility is that it may have been moved from another storage repository, as opposed to being created and formatted on its current repository.
I then attempted to recover the partition using gpart, which did work.
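The gpart run was essentially a read-only scan followed by a write of the guessed partition table; the device name below is only an example for the guest-attached disk:
gpart /dev/xvdb
gpart -W /dev/xvdb /dev/xvdb
The first command scans the disk and prints the guessed partition layout without changing anything; the second writes that guessed partition table back to the same disk.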
The full trace of the restore session is below; I first ran it in read-only mode before specifying write mode (-W <device>).
After this I was able to check and mount the partition:
While this process worked OK, there is no substitute for having a complete backup of the system.