Rolling back to snapshot from backup produces I/O errors in virtual machine

jaaassh

Dabbler
Joined
Apr 23, 2015
Messages
49
I've done these things:
1) Stopped my (Ubuntu) VM
2) Clicked Rollback on a snapshot from months ago
2a) I skipped the checks and confirmed, which deleted all the snapshots between now and that old snapshot
3) Booted the VM, was able to SSH in, worked great (except my data wasn't there, oh well)
4) Stopped the VM again
5) Created a replication task to replicate snapshots from my backup drive back to the real location
5a) This seemed to go OK. It took a while, progressed slowly, and finished successfully
6) Rolled back to the most recent snapshot (this morning)
6a) Used the default Rollback options since this was the most recent snapshot
6b) Didn't really get a confirmation. I got the TrueNAS modal/spinner thing for a hot second, then it disappeared
6c) Assumed it rolled back the VM?
7) Booted the VM

Here's the problem now: Ubuntu won't boot. Lots of filesystem errors.

Did I do something wrong here?

First thing I can see in VNC viewer:
[Screenshot: Cursor_and_bhyve_-_noVNC.jpg]


Then eventually:
[Screenshot: truenas2.jpg]
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
You can just send and receive the latest (or an intermediate) snapshot from your backup system to a new zvol on your production system. Then configure that as the virtual disk and you should be able to boot and run just fine.

I cannot spot what in particular went wrong in your case, but I would not bother with a replication task for a restore when a shell one-liner will do.
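As a rough illustration of the kind of one-liner meant here, the following Python sketch only builds the command string and does not run it. The snapshot name, target zvol name, and SSH host are hypothetical placeholders, and the sketch assumes a plain `zfs send` piped into `zfs receive` is sufficient for the restore; extra flags may be needed depending on the setup.

```python
import shlex
from typing import Optional


def build_restore_pipeline(
    snapshot: str, target: str, ssh_host: Optional[str] = None
) -> str:
    """Return a shell one-liner that sends `snapshot` into a new zvol `target`.

    All names passed in are placeholders chosen by the caller; nothing is
    executed here.
    """
    if "@" not in snapshot:
        raise ValueError("expected a snapshot name like pool/dataset@snap")
    send = ["zfs", "send", snapshot]
    if ssh_host is not None:
        # Assumption: the backup lives on another machine reachable over SSH,
        # so the send side runs there.
        send = ["ssh", ssh_host] + send
    receive = ["zfs", "receive", target]
    return f"{shlex.join(send)} | {shlex.join(receive)}"


if __name__ == "__main__":
    # Hypothetical names: a backup pool on the same host, restored to a new zvol.
    print(
        build_restore_pipeline(
            "backup/vms/ubuntu@auto-2023-01-01", "tank/vms/ubuntu-restored"
        )
    )
```

The resulting zvol would then be attached to the VM as its virtual disk, as described above.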

Keep in mind, though, that unless you always shut down your VM first, your replicated snapshots are images of your VM equivalent to pulling the power from a live machine. So minor data loss and the need for a filesystem check, and possibly a database check etc., are to be expected.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Patrick M. Hausen said:
"Keep in mind though, that your replicated snapshots are images of your VM equivalent to pulling the power from a live machine. Unless you always shut down your VM first. So minor data loss and the need for a filesystem check and possibly database check etc. is to be expected."
This is one of the most overlooked things about VMs and snapshots taken at the layer above the VM. Both VMware and ZFS zvols can have this problem. Unless the VM and the VM server (VMware or ZFS) are writing everything in perfect order, corruption of the VM is always possible.

The only way to be absolutely certain that a ZFS snapshot of a zvol with a VM in it will work perfectly is to shut down the VM gracefully, take the ZFS snapshot, then boot the VM.

At my day job, we have a VMware RHEL VM with a huge amount of storage. Turns out we can't use VMware snapshots to get a stable copy of that VM's storage. The data is just too volatile. If we shut that VM down, took the snapshot and then booted the VM, it would probably be okay. But, during recovery exercises, we had to develop special methods for recovery because we can't shut that VM down every night for VMware snapshots.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The key is understanding the difference between a "crash-consistent" and "quiesced filesystem" backup. The TrueNAS VMware integration is able to request quiesced snaps from vCenter, but this process can take a long time, or even time out on high-IO machines.

@Arwen I had to do similar operations for a nightly backup of a large RHEL machine - fsfreeze was invaluable there.
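For readers unfamiliar with the pattern, here is a minimal Python sketch of wrapping a backup step in `fsfreeze`, so the filesystem is quiesced rather than merely crash-consistent. It is not the procedure used in either post. The mountpoint `/srv/data` and the `take_snapshot` step are hypothetical, and the sketch assumes it runs as root inside the guest on a data filesystem (freezing the root filesystem from a script that lives on it is risky).

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


@contextmanager
def frozen(mountpoint: str, run: Runner = run_checked) -> Iterator[None]:
    """Freeze `mountpoint` for the duration of the with-block.

    The unfreeze runs in a finally clause so the filesystem is thawed
    even if the snapshot step fails.
    """
    run(["fsfreeze", "--freeze", mountpoint])
    try:
        yield
    finally:
        run(["fsfreeze", "--unfreeze", mountpoint])


# Usage sketch (hypothetical mountpoint and snapshot step):
#
#     with frozen("/srv/data"):
#         take_snapshot()  # e.g. ask the storage host to snapshot the zvol
```

The `run` parameter exists only so the command sequence can be exercised without root privileges. The snapshot itself has to be triggered while the filesystem is frozen, which usually means coordinating between the guest and the storage host.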
 

jaaassh

Dabbler
Joined
Apr 23, 2015
Messages
49
The issue was that, for whatever reason, replicating my snapshot from my backup back to production left the zvol in readonly mode, and I overlooked that.

All the errors above are "normal" if you try to boot Ubuntu on a readonly filesystem. It fails pretty hard.

The fix was to edit the zvol and uncheck the readonly box.
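The same check can be done from a shell instead of the web UI. The Python sketch below wraps the `zfs get` and `zfs set` commands for the `readonly` property; the dataset name in the usage comment is a hypothetical placeholder, and the helper names are made up for this illustration.

```python
import subprocess


def parse_readonly(output: str) -> bool:
    """Interpret the output of `zfs get -H -o value readonly <dataset>`."""
    value = output.strip()
    if value not in ("on", "off"):
        raise ValueError(f"unexpected readonly value: {value!r}")
    return value == "on"


def is_readonly(dataset: str) -> bool:
    """Ask ZFS whether `dataset` currently has readonly=on."""
    result = subprocess.run(
        ["zfs", "get", "-H", "-o", "value", "readonly", dataset],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_readonly(result.stdout)


def clear_readonly(dataset: str) -> None:
    """Set readonly=off on `dataset` (needs sufficient privileges)."""
    subprocess.run(["zfs", "set", "readonly=off", dataset], check=True)


# Usage sketch (hypothetical zvol name), run with the VM stopped:
#
#     if is_readonly("tank/vms/ubuntu"):
#         clear_readonly("tank/vms/ubuntu")
```

Checking this property right after a restore, before booting the VM, would have caught the problem described in this thread.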
 