ZFS snapshot restores of XenServer SRs

Status
Not open for further replies.

Flapjack

Dabbler
Joined
Mar 4, 2015
Messages
22
I have a FreeNAS 9.3 host with a volume mostly dedicated to running VMs as a XenServer iSCSI storage repository (SR). I've had little to no trouble with this particular FreeNAS host, and snapshots occur every 3 hours.

One of the XenServers experienced issues that affected the entire pool. I got them all fixed, but when the dust settled, our Exchange server lost two of its three VDIs (virtual disk images). I exhausted all options for recovering them via Xen, so I decided to roll back to the last good snapshot via ZFS.

Fortunately, that particular host only had an SR and VDIs for the single Exchange server... all other VMs are hosted elsewhere. Unfortunately, the VDI still doesn't show up, even after a rescan of the SR (both via XenCenter and via command line).
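For reference, the rescan and the VDI check were done roughly like this (a sketch; the SR name label and UUIDs are placeholders):

```shell
# Find the SR's UUID by name (the name label here is just for illustration)
xe sr-list name-label="FreeNAS iSCSI SR" --minimal

# Rescan the SR so XenServer re-reads the VDI list from the storage
xe sr-scan uuid=<sr-uuid>

# List the VDIs XenServer currently knows about on that SR
xe vdi-list sr-uuid=<sr-uuid> params=uuid,name-label,virtual-size
```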

Since the FreeNAS host was running without issue, and the Xen host is the one that barfed (ironically, when another SR on a different FreeNAS box crashed), I'm leaning towards suspecting the latter. Unfortunately, Xen guys seem to know Xen, and FreeNAS guys seem to know FreeNAS. I was hoping someone here might be able to explain why, even after a ZFS snapshot restore, the missing two of the three VDIs aren't showing up.

My only workable theory is that XenServer keeps metadata about the VDI itself in its local storage repository (the local disk Xen is installed on), and only the raw data of the VDI is on the SR. It sounds plausible, but I don't know much about the intricacies of Xen storage technology.

Any help would be really, really appreciated.
 

Flapjack

Dabbler
Joined
Mar 4, 2015
Messages
22
...oh, and if anyone is interested as to how the crash of the other FreeNAS host happened, here is the backstory:

When FreeNAS 9.3 came out, I plotted out an upgrade path. I have two FreeNAS hosts (nas1 and nas2), both attached to my Xen pool. The first to upgrade was nas2. I evacuated all the VDIs from nas2 to nas1 so that it would be empty when I upgraded it. The upgrade went well, and everything seemed to be fine. I reattached the SR in Xen and began to move the VDIs back to nas2 so I could upgrade nas1. After a few VDI moves (which you can only initiate with a booted VM and XenTools installed on the VM), I finally had an issue. The FreeNAS box became unresponsive: no GUI, no SSH, but I could ping it. When I went to the console, there were several CAM errors. In XenCenter, the alerts pane showed corresponding multipathing errors. The FreeNAS machines have an onboard NIC assigned to the management network, and an Intel quad-port NIC for an iSCSI multipath connection to the hosts.

After hard rebooting the FreeNAS box, the SR reconnected in Xen and the rest was cleanup. The interrupted disk move left the VDI in a hung state, with an instance on each SR. Through a bunch of vdi-list/vdi-forget commands via the xe command line, I was able to get the borked machine back to a bootable state and restore it to a (Xen) snapshot taken right before the move (which was probably the smartest thing I've done to date in this whole fiasco).
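The cleanup looked roughly like this (a sketch; UUIDs and the name label are placeholders, and note that vdi-forget only drops XenServer's record of a VDI, it does not destroy data):

```shell
# Find the duplicate/hung VDI records left behind by the interrupted move
xe vdi-list name-label="exchange-disk" params=uuid,sr-uuid,name-label

# Forget the stale record on the destination SR (metadata only)
xe vdi-forget uuid=<stale-vdi-uuid>

# Rescan so the SR's view matches what's actually on disk
xe sr-scan uuid=<sr-uuid>

# Revert the VM to the Xen snapshot taken before the move
xe snapshot-revert snapshot-uuid=<snapshot-uuid>
```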

This happened another few times before I stopped the migration process.

I posted a thread on the Citrix forums to see if anyone could assist with the multipathing errors... thinking it had something to do with the XenServers themselves, incorrect networking, etc. After some help troubleshooting, I was able to associate the multipath errors with the cam errors I had been seeing on the FreeNAS host.... specifically with a single SSD in the ZFS pool, used for l2arc. I removed the cache drive via command line and the errors immediately went away. I figured the drive was bad, so I ordered a replacement and set the old one aside for a firmware update and extensive testing, which it ended up passing. I replaced the SSD and added it back to the pool as an l2arc cache and started back to migrating.
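Removing and re-adding an L2ARC device is non-destructive, since cache devices hold no pool data. A sketch, assuming the pool is named tank and the SSD shows up as ada4:

```shell
# Cache (L2ARC) devices can be removed from a live pool at any time
zpool remove tank ada4

# Verify the cache vdev is gone
zpool status tank

# Later, add the (replaced) SSD back as L2ARC
zpool add tank cache ada4
```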

During the very next move, the same thing happened. Crash... and hard. I had to go through the same steps to recover the VM after a reboot. Again, I stopped and evaluated. I removed the cache drive again, and the rest of the moves went off without any issues (albeit slower).

I started researching the CAM errors, and others seemed to have the problem with their 9.2-to-9.3 upgrades as well. I looked through the updates and saw at least one related to the issue (I can't remember which), so I applied the updates, added the L2ARC cache drive back, and had no further issues with that machine, until last night.

I'm not sure if it's related, but the USB flash drive had a single error that flagged an alert in the GUI. It had to do with a single file, at /usr/local/lib/libaria2.a. I posted a thread about it here and was advised to do a fresh install and restore a backed-up config file. That went off without any issues, though I made the critical mistake of reusing the same flash drive (even though gpsguy told me to buy a new one; the drive was actually a SanDisk, which he suggested). The drive passed testing, and it seemed to me that the corruption was probably related to the hard shutdowns I had experienced, so I figured a good wipe, a fresh install of FreeNAS, and a config restore was adequate.

Well, the same host failed again last night. It had all my VMs on it (save the Exchange server VDIs running on the nas1 host), so everything was affected: DNS, DHCP, and all the other services running on the assorted VMs at the time. This time, I used an unused PNY 8GB flash drive that was still in the package. I installed the 20150327 stable release of FreeNAS 9.3, restored the backup config, and everything was back up and running with no issues. Unfortunately, somehow in the process, the Exchange server VDIs on the nas1 host got screwed up. The nas1 host never crashed or had any issues, but when the SHTF, XenServer must've lost track of what it was doing.

Which leads me to the first post in this thread. I couldn't recover the VDIs via the Xen GUI or CLI, so I rolled back to the last good snapshot in ZFS, hoping the VDIs would once again be available.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Can you post a debug for the FreeNAS server? It can be obtained from System -> Advanced -> Save Debug
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
A few comments I'd like to add (these aren't personal attacks, just personal observations from my experience on this forum, etc.):

1. If XenServer were so poorly made that it keeps info on the host's boot device that can't be backed up particularly easily (i.e., you don't know you need to back it up), that would be a *major* setback for Xen. So I tend to think they aren't stupid enough to have such a poor design in their software. Let's face it: if Xen had that kind of problem with their server software, it wouldn't take too many people losing their VMs for word to get out that you shouldn't trust their code. They'd either have to fix it, or they'd be unemployed.
2. Can you provide a full hardware list? The forum rules ask for it, but you haven't provided it. (This is probably why you have zero responses so far. We experienced users are tired of people not providing good up-front data, and will ignore people who can't provide some basic info to go on.) Just based on your information in the Citrix discussion, I get the impression you probably aren't familiar with the proper hardware demands for VMs on ZFS. I'll discuss those below; keep reading for now.
3. When you do a snapshot restore, assuming things are just like with VMware and every other iSCSI client out there, you'll have to unmount the iSCSI device before you do the rollback (this can also be accomplished by stopping the iSCSI service, doing the rollback, then starting the service). Otherwise you'll do a rollback beneath the iSCSI layer, and that can end very badly for all parties involved. The actual outcome can't even be predicted, because you are basically yanking the storage out from under everything (iSCSI, the Xen server, everything).
4. Snapshots are not really a good "backup" for VMs. The reason is that if your VM is in the middle of something like a database commit to disk when the snapshot occurs, the database may be in an inconsistent state. This can also happen to the VM's file system, and even to the iSCSI device's file system. One of the ways this is combated is with the FreeNAS 9.3 VMware snapshot integration: it tells the ESXi host to tell the VMs to quiesce themselves, then a snapshot is taken, then the VMs resume as before. There is no equivalent for Xen as far as I know, and if there is, it is definitely not supported on FreeNAS.
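The safe rollback sequence from point 3 would look roughly like this on the FreeNAS side (a sketch; FreeNAS 9.3's kernel iSCSI target runs as the ctld service, and the dataset/snapshot names are placeholders):

```shell
# First detach/unplug the SR on the XenServer side, then:

# Stop the iSCSI target so no initiator has the zvol open
service ctld stop

# Roll the zvol backing the SR back to the last good snapshot
zfs rollback tank/xen-sr@auto-20150409.1500

# Bring the iSCSI target back up, then rescan the SR from Xen
service ctld start
```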

Hardware needs for VMs on ZFS are steep. I generally don't recommend it without at least 64GB of RAM and a well-sized L2ARC (probably around 256GB max with 64GB of RAM). Additionally, you want as many vdevs as you can get to prevent I/O bottlenecks, and that means mirrored vdevs, and lots of them. L2ARCs should not cause multipath errors. This tends to tell me that either your L2ARC is horribly oversized for your RAM, or you don't have enough RAM to support an L2ARC at all. Considering other users on the forum think they can put a system together with 16GB of RAM and a 512GB SSD (or something else that is absurdly large), I'm betting you don't have enough RAM to even make an L2ARC a viable option.

Frankly, if you don't have at least 64GB of RAM and a well-sized L2ARC, you are probably creating other problems already (just like the multipath errors that were manifesting because of your L2ARC). So I think the first step is to get appropriate hardware for your FreeNAS machine so that things are working "as they should," if you aren't already there. Unfortunately, I don't call a single RAIDZ1 vdev appropriate for VMs, especially VMs like Exchange servers. Exchange servers (and database servers in general) are the worst workload you can possibly put on VMs. As such, the hardware required for them to work properly is very high-end (read: expensive).

For systems that have inadequate hardware, the problem often becomes obvious when doing high-I/O activities like moving VMs (sound familiar?). Unfortunately, there is no way around this except to increase the I/O capacity of your server with more hardware. Even then, if high I/O is straining the pool, you run the risk of writes being discarded because of excessive storage latency. So this is an issue you definitely want to address.

Also, if you are using this in a production environment, I'd consider ditching the USB stick and going with either SATA DOMs or SSDs. Failing boot devices create lots of problems and, aside from having to be replaced, can send you running in all sorts of other directions, spending lots of time diagnosing issues that are really the result of a bad boot device.

Lastly, you are using Xen. I have very very little experience with Xen itself. I will say that you are in a very small pool of people by using Xen with FreeNAS. While I would expect it to work properly, since you are having problems you are going to find very very few people here that are going to have knowledge and experience that will help you.
 
J

jpaetzel

Guest
Xen seems to get in this weird state from time to time where nothing you do from the GUI seems to help. I've had some luck repairing Xen SRs from the command line. If you send me your teamviewer 10 info I can take a look at your system.

(Yes, I know FreeNAS and Xen)
 

Flapjack

Dabbler
Joined
Mar 4, 2015
Messages
22
cyberjock:
I will get a debug log tonight. I will try to address your comments/questions below:

Xen has had issues with snapshots. Even after several patches, the issues weren't resolved until after version 6.0. I'm on 6.2 (the current release is 6.5), but I have reason to believe there may still be a lingering issue regarding snapshots. I place the blame for the original problem on Xen. What I am lost on is why rolling back the snapshots in FreeNAS did not restore the volumes. It wasn't that Xen didn't "see" the restored VDIs; they simply were not there. Period. I scoured the volume, found the LVM associated with the VDIs, and there appeared to be zero change after rolling back the snapshot in FreeNAS. My original question was why this might be. That's why I was hoping someone with experience in both solutions might have some insight.
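One low-risk way to check whether a snapshot actually contains the missing data, before (or instead of) rolling back, is to clone it and inspect the clone's LVM contents from a separate initiator (a sketch; dataset and snapshot names are placeholders):

```shell
# Clone the snapshot to a new zvol (read-write; no extra space used up front)
zfs clone tank/xen-sr@auto-20150409.1500 tank/xen-sr-inspect

# Export the clone as a separate iSCSI target (via the GUI), attach it to a
# Linux box or XenServer host, then look for the LVM metadata on it:
pvscan
lvs

# When done inspecting, throw the clone away
zfs destroy tank/xen-sr-inspect
```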

I haven't replied simply because I haven't had any free time the past few days. As for the hardware list, I will provide that for the XenServers tonight, but the FreeNAS machines are HP N40L MicroServers with 8GB of ECC (unregistered) memory. The drives are WD Red 2TB SATA models. I use 120GB-160GB SSDs for L2ARC. Each system has a quad-port Intel NIC with four dedicated VLANs for iSCSI traffic; those are configured for multipath instead of 802.3ad (per Citrix recommendations).

To address the questions about my particular Xen pool: actual backups work very, very well. It was actually a backup from the day before that saved this particular machine. The backup software I use not only creates an .xva file (the equivalent of an .ova), it also creates the VM, imports the image, configures the networking, and boots it up, so that it's automatically running when the restore is done. This is all via SSH to XenServer; the backup solution only handles the scripting behind the scenes to make it happen. The snapshot was taken prior to maintenance; I do not rely on snapshots for backup. Snapshots are a convenience. I use a licensed copy of Xackup, which has never failed me. Fortunately, I had a full backup from 3 days prior to the loss of the volume, with differential backups each day after (a full backup is completed every three days). The restore merged the master and differential files, created an .xva, and imported it into Xen without any issue. I decided to try the FreeNAS snapshot first, as it was more recent than the actual backup (by about 2 hours); I would've lost less data with a FreeNAS snapshot restore than with a restore of a VM backup. When the FreeNAS restore didn't work, I restored the backup.

Regarding XenServer snapshots, that is something Citrix has had huge issues with in the past. However, most of those problems involved snapshots not actually being deleted. That said, there are a number of people, usually from the XenServer 5.6 days (I am running 6.2 on all my hosts), who deleted a snapshot and had the associated volume deleted as well. The original issue that made the restore necessary is on Xen. I was under the impression that rolling back the snapshots in FreeNAS, to a time before the deleted-volume issue, would bring the VDI back in Xen. It did not, but due to a lack of understanding on my part, I was not sure whether Xen was to blame for still not showing the volumes (after rolling back the snapshots), or whether FreeNAS somehow didn't actually restore anything. I know snapshots in FreeNAS are very small and efficient, but the size of the snapshots I had was incredibly small: something that didn't seem possible at 3-hour intervals with 10+ VMs on that particular volume. We're talking less than a MB in some cases, which even one email on Exchange could account for.
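The snapshot sizes are easy to sanity-check from the FreeNAS shell (a sketch; the dataset name is a placeholder). Note that USED on a snapshot only counts blocks unique to that one snapshot, so small numbers don't necessarily mean nothing changed:

```shell
# Show per-snapshot space accounting for the zvol backing the SR
zfs list -t snapshot -r -o name,used,refer tank/xen-sr

# Show how much the live zvol has diverged since a given snapshot
zfs get written@auto-20150409.1500 tank/xen-sr
```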

Regarding the gap between your suggested hardware requirements for VMs on ZFS and the minimum requirements for FreeNAS itself: maybe I missed it in the documentation; I can't say I've read every word of it. One of the key points I've garnered from reading the documentation, and from reading about real-world deployments, is that if you're using ZFS and iSCSI, you need more than the minimum: 8GB minimum, 16GB preferable. Never once have I heard 64GB, and to be honest, that sounds ludicrous. I run no more than 20 VMs max. If FreeNAS really needs that much memory, it's time to move to another platform.

On your recommendation to move away from a USB stick, I wholeheartedly want to agree with you. However, just from reading these forums, others have posed the same question and been advised to stick with a flash device. Granted, that advice didn't come from you, with your broader base of expertise, but it was still advice nonetheless. I give your advice more weight, and will start researching that change this weekend. Obviously, a failing USB device is going to be a big motivator.

Lastly, Xen with FreeNAS has been great. Before my current configuration, I had no reliability issues with FreeNAS over the past four years. I honestly didn't have a problem until the CAM errors when using L2ARC (which didn't appear until 9.3), and of course the USB device failure. Honestly, I believe the device is actually fine from a hardware standpoint, but I replaced it regardless. My gut tells me it was corrupted during a storage volume move, which I feel was related to the CAM errors and the L2ARC. If I remove the L2ARC, those issues go away on the initial release version of 9.3. I also do not have the L2ARC/CAM issues on a fully updated version of 9.3.
 

Flapjack

Dabbler
Joined
Mar 4, 2015
Messages
22
Xen seems to get in this weird state from time to time where nothing you do from the GUI seems to help. I've had some luck repairing Xen SRs from the command line. If you send me your teamviewer 10 info I can take a look at your system.

(Yes, I know FreeNAS and Xen)
Thank you so much for the offer. My VMs have been completely moved to the updated, stable FreeNAS machine. I can clone/copy a few to the problematic ZFS instance if that helps. However, I first want to run an all-day memtest, just to rule out faulty memory.

I don't know when your availability is, but I can be free any time during nights/weekends. Unfortunately, I do not have the slack time during the workday. If that's the only time you're free, I'll gladly take a few hours off. I probably need it anyway. :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The 64GB of RAM and such that I discussed above is not just my opinion. Lots of people, as well as other admins on this forum, have learned the hard way that you need massive resources to keep iSCSI with VMs from making a mess of your data and your VMs. ZFS protects your data, at all costs. Those costs translate to more hardware being required for good performance. While you and I might say "eh, who cares if the VM is slow," ESXi cares, a lot. If your datastore goes offline for more than about 30 seconds, it kills the VMs, having decided that the datastore is gone for good. Believe it or not, it is pretty darn easy to rack up 30 seconds' worth of transactions from just a few VMs. If you read around about other users running iSCSI, you'll find they have major problems that result from insufficiently resourced systems.

The issue with VMs, iSCSI, and needing lots of resources is not a "FreeNAS" problem. It's a ZFS problem, because of how ZFS works. Even iXsystems didn't sell prior-gen systems to end users planning to run VMs with less than 48GB of RAM (most of the ones I've worked on run 96GB or more, some as much as 256GB of RAM). In the current generation the minimum is 64GB of RAM, with strong recommendations for more, and an L2ARC has been pretty much a given for the last 2-3 years.

The boot device changes are relatively new with 9.3. It's kind of a transition period right now from USB to SSD. I'm somewhat expecting that 10.x is going to make USB sticks unfeasible, because they are just too pathetically slow at random writes and random reads. They're great to boot from and use for day-to-day operations, but when you have heavy loads like updates, the USB sticks just don't cut the mustard. I use SATA DOMs in all of my machines but one, and the one that doesn't has problems with updating even now. I hate updating it because I never know if it's going to succeed or not; if it fails, it takes me about 20 minutes to recover so I can attempt the upgrade again. But I keep using USB on it just so I have first-hand experience with 9.3 on USB, since so many forum users run that way, and I like to know what forum users are experiencing. If some manufacturer made a USB stick with good random-write and random-read performance, even if it was just 16GB or so, they would probably own a very large part of the FreeNAS market. I'd gladly go from a SATA DOM to a USB stick and save my SATA port if I could.
 

Flapjack

Dabbler
Joined
Mar 4, 2015
Messages
22
The 64GB of RAM and such that I discussed above is not just my opinion. Lots of people, as well as other admins on this forum, have learned the hard way that you need massive resources to keep iSCSI with VMs from making a mess of your data and your VMs. ZFS protects your data, at all costs. Those costs translate to more hardware being required for good performance. While you and I might say "eh, who cares if the VM is slow," ESXi cares, a lot. If your datastore goes offline for more than about 30 seconds, it kills the VMs, having decided that the datastore is gone for good. Believe it or not, it is pretty darn easy to rack up 30 seconds' worth of transactions from just a few VMs. If you read around about other users running iSCSI, you'll find they have major problems that result from insufficiently resourced systems.
I work with VMware as well, though never with FreeNAS as the storage. I do know that Xen handles this kind of situation much better. I've actually had a full power loss that was not resolved until I had hands on the system; it was down for several hours. When I brought the storage back online, Xen automatically restored the iSCSI connections and the machines picked up exactly where they left off, not even so much as a logged-out session. Now, performance was an issue prior to moving to the four dedicated iSCSI connections, mostly seen when multiple VMs were booting at once. It has not been a problem since.

The issue with VMs, iSCSI, and needing lots of resources is not a "FreeNAS" problem. It's a ZFS problem, because of how ZFS works. Even iXsystems didn't sell prior-gen systems to end users planning to run VMs with less than 48GB of RAM (most of the ones I've worked on run 96GB or more, some as much as 256GB of RAM). In the current generation the minimum is 64GB of RAM, with strong recommendations for more, and an L2ARC has been pretty much a given for the last 2-3 years.
Your personal experience and the recommendations of others are important, but I simply had never heard that before. I've tried to follow the official FreeNAS documentation whenever I could. I believe you, but just know that the documentation is nowhere near what you are saying in regards to memory requirements: it says the minimum is 8GB, while 16GB is recommended, and the 9.2 documentation gives a general rule of thumb of 1GB of RAM per TB of space, with a minimum of 8GB. I've attached a debug from the currently running host. It is the one that had the hard crash and the corrupted USB flash drive. It's back up and running with a new flash drive, but as I mentioned earlier, I'm going to consider moving to SSD next week (some questions on that below). The main difference between this machine and the other is that it's running 8GB of non-ECC memory. The box that is currently down has 8GB (2x4GB) of unbuffered Kingston ECC memory. Normally, the VMs run on that machine, and the other is where the rsync'd snapshot replication goes.

Now that this is the primary machine, I will probably swap the memory around and use the downed box for some memory experimentation. On that front, I've read that there are a few 8GB ECC modules (with Samsung chips) that actually work with the latest firmware. The system has two slots, and the official limit is supposed to be 2x4GB, but others have had 16GB running in the system with the Samsung modules. Since memory seems to be the critical issue here, it's worth trying.

I also have two Dell 2950s that I've been refurbishing to replace the hand-built Xen servers. They each have two quad-core Xeon E5420s, with 32GB (4GBx8) of buffered ECC memory. To be honest, I had not considered using either of those machines for storage, as the Xen servers are doing most of the work. I have room in the rack, so I could procure two more, but I did like that the HP MicroServers had such a low power footprint.

I will try to go with your recommendation, though 64GB is out of the question, even for the 2950s. You can run 64GB in them, but it is cost-prohibitive; 32GB is doable, though. Maybe more information on the environment will help. I'm going off memory here, so be easy on me:

- (2) Xen servers in pool, each has a six-core AMD CPU, 32GB of non-ECC memory
- The current FreeNAS host is configured with three WD Red 2TB drives in RAID-Z2 (a 2.5TB LVM-over-iSCSI volume presented to Xen). There is also a 180GB SSD for L2ARC (not sure of the brand)
- No more than 10 VMs running at any given time, most with roughly 50GB VDIs for the boot volume (a mixture of Windows and Linux)
- Two VMs (Exchange and WSUS) have additional VDIs attached (300GB) as data volumes

To me, this environment would have pretty light storage requirements. As I mentioned earlier, I have not had performance problems.

The boot device changes are relatively new with 9.3. It's kind of a transition period right now from USB to SSD. I'm somewhat expecting that 10.x is going to make USB sticks unfeasible, because they are just too pathetically slow at random writes and random reads. They're great to boot from and use for day-to-day operations, but when you have heavy loads like updates, the USB sticks just don't cut the mustard. I use SATA DOMs in all of my machines but one, and the one that doesn't has problems with updating even now. I hate updating it because I never know if it's going to succeed or not; if it fails, it takes me about 20 minutes to recover so I can attempt the upgrade again. But I keep using USB on it just so I have first-hand experience with 9.3 on USB, since so many forum users run that way, and I like to know what forum users are experiencing. If some manufacturer made a USB stick with good random-write and random-read performance, even if it was just 16GB or so, they would probably own a very large part of the FreeNAS market. I'd gladly go from a SATA DOM to a USB stick and save my SATA port if I could.
This part of your post is the one I'm most interested in. I would rather not take up a SATA slot, but your point about updates hits home. I mentioned that one host had issues running L2ARC (CAM errors) and the other did not; even after I replaced the SSD with an Intel one I had lying around, the CAM errors were still present. They were not present on the updated 9.3 host. For whatever reason, I simply could not get the one not in use to update. If you say updates on a non-USB boot drive would go better, that is reason enough to explore a reconfig.

In another thread that I posted when I first started seeing USB corruption issues, someone mentioned redundant USB boot devices. How hard are those to set up, and are they really feasible if I cannot go to SATA?
 

Attachments

  • debug-nas2-20150410071621.tgz
    386.8 KB

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Mirrored devices are very easy to set up.

I haven't tried doing it after the installation, but it worked perfectly during the original installation. It's highly recommended for USB devices, but can be done with SATA (or SAS) devices as well - depends on your needs.

Along with regular scrubs, uptime should not be influenced by failing boot devices, unless you're exceedingly unlucky.
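At the ZFS level, adding a mirror after the fact amounts to attaching a second device to the boot pool. A sketch, assuming FreeNAS 9.3's default boot pool name (freenas-boot) and that da1 is the new stick with a matching partition layout; the GUI's Attach option under System -> Boot is the supported route:

```shell
# Check the current boot pool layout
zpool status freenas-boot

# Mirror the existing boot partition onto the new device
# (da0p2 = existing boot partition, da1p2 = matching partition on the new stick)
zpool attach freenas-boot da0p2 da1p2

# Watch the resilver complete
zpool status freenas-boot
```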
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I did some mirrored boot devices, and they work fine for redundancy's sake. But they don't resolve the poor write and read performance that comes with USB.

So go with mirrors for better uptimes.

Go with SATA drives of some kind if you want to avoid problems with the poor write/read performance.
 

Flapjack

Dabbler
Joined
Mar 4, 2015
Messages
22
Mirrored devices are very easy to set up.

I haven't tried doing it after the installation, but it worked perfectly during the original installation. It's highly recommended for USB devices, but can be done with SATA (or SAS) devices as well - depends on your needs.

Along with regular scrubs, uptime should not be influenced by failing boot devices, unless you're exceedingly unlucky.

I did some mirrored boot devices, and they work fine for redundancy's sake. But they don't resolve the poor write and read performance that comes with USB.

So go with mirrors for better uptimes.

Go with SATA drives of some kind if you want to avoid problems with the poor write/read performance.
It will probably be something I experiment with, just to be familiar with it... but it sounds like an SSD SATA DOM is the way to go.
 