Stopped at kdb_enter+0x3b: movq $0,0xaf01d2(%rip) on boot

rogerjmw

Cadet
Joined
Dec 13, 2014
Messages
2
I have FreeNAS 9.2.1.8 running in a virtual machine under VMware Workstation 10.0.4.
HW configuration is:
Motherboard: ASUS P9X79 WS
CPU: 4 virtual cores (from an Intel i7-4930K)
RAM: 12GB
System disk: 10GB virtual disk
Storage: 8 × 2TB Seagate NAS SATA disks connected via an LSI SAS 9207-8i HBA. These are passed to the VM directly as physical disks and configured as RAID-Z2 (6+2).

All this has been running fine for several months with about 4.3TB of the available 10.1TB in use.

Yesterday, we had a power cut and the server lost power. On rebooting this VM now, I get a panic message:

Screenshot here:
https://app.box.com/s/clrmwtqdskedbnya9y5w

This occurs just after the "Mounting local file systems" prompt. I have tried reinstalling/upgrading FreeNAS and it still fails with the same error. Nothing is logged at the host OS level (Ubuntu 14.04), which I would have expected if this were any kind of hardware issue.

Suggestions, please, as I really don't want to have to rebuild this from scratch. I do have a backup copy of the data, but restoring it would take a long time.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
If I'm reading this right, the pool is a goner. New pool time.

ZFS does not take kindly to cut power. The virtualized environment doesn't help, but shouldn't be at fault (directly) in this case.

Seriously consider a UPS, as well as a Xeon E5 (should work in your motherboard) and ECC RAM.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
Install a new, clean 9.2.1.9 VM. Add the existing pool drives. Import the pool, first with auto-import, then manually. Post the output of zpool import if it pukes.
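
For the manual attempt, something along these lines from a shell should do it. This is a rough sketch; "tank" is just a placeholder for whatever your pool is called, and -f may be needed because the pool was last in use by the old VM:

zpool import                    # with no arguments: list pools the system can see and their state
zpool import -f tank            # force-import the pool by name
zpool status -v tank            # check the result and any reported errors

If the plain import panics again, stop there and post the bare zpool import listing.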

This looks like an RDM-style hack rather than HBA passthrough, but your wording isn't definitive.
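
One quick way to narrow that down (just a sanity check, not a definitive diagnosis) is to look at what the FreeNAS guest actually sees:

camcontrol devlist              # list the disks as the guest sees them
pciconf -lv | grep -A3 mps      # look for the LSI HBA itself inside the guest

With real HBA passthrough (VT-d) you'd expect to see the mps controller and the actual Seagate model strings; with raw-mapped physical disks the drives typically appear behind VMware's virtual storage controller instead.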
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ericloewe said:
If I'm reading this right, the pool is a goner. New pool time.

ZFS does not take kindly to cut power. The virtualized environment doesn't help, but shouldn't be at fault (directly) in this case.

Actually, the fact it's virtualized is more than likely the actual problem.
 

rogerjmw

Cadet
Joined
Dec 13, 2014
Messages
2
Oh dear, it looks like the pool is trash.

I tried building a new 9.2.1.9 VM and importing the zpool both automatically and directly from the command line with zpool import; both ways it goes straight into a kernel panic.

At this point, I think I will give up on trying to get this zpool back and just go ahead and build a new one, chalking this one up to experience.

For the record, I do have an APC SmartUPS 2200 but I am unable to use it since the house was rewired - after a few seconds of being plugged in, it always trips the power circuit RCD. I am loath to invest in another heavyweight UPS if it's liable to suffer from the same issue. :(

I'm interested in why you think the virtualized environment is more susceptible to this sort of corruption. I have disabled write caching on all the data disks, which are essentially just a JBOD. Is there anything else I should be doing to reduce the risk of this corruption happening again, assuming I keep the same virtualized hardware configuration? I was hoping the use of RAID-Z2 would make the risk of total data loss due to hardware issues vanishingly small, but it seems a simple power outage can corrupt the whole zpool and render it useless. As it stands, my confidence in ZFS's ability to keep my data safe has taken rather a knock.
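
As a side note, the drive-level setting can be checked from the FreeNAS shell roughly like this (a sketch, with da0 as an example device name, and with the caveat that the guest may only see whatever VMware presents rather than the real drive):

camcontrol devlist                  # find the device names for the pool disks
camcontrol modepage da0 -m 0x08     # caching mode page; WCE: 0 means the write cache is disabled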
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
The odds of zpool corruption during power loss are the same as for any other file system. Less, if you consider that ZFS is almost always multi-disk redundant and others are not. That may not hold in a virtualized scenario, since there are more moving parts, but those extra parts affect every filesystem equally. Anecdotally, I have a virtualized install that routinely gets its power kicked and sometimes hard-crashes... the pools are always happy. I've been trying to find a natural corruption scenario... and frankly it won't happen.

I'd take a shot at a read-only mount, and at some of the more invasive commands, before I gave up. Unfortunately I doubt you'll get a walkthrough, as the odds of success are low, so it would require reading and research on your part.
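
For anyone finding this later, the less destructive attempts would look roughly like this (a sketch, with "tank" as a placeholder pool name; the rewind options are last resorts and can discard the most recent transactions):

zpool import -o readonly=on -f tank     # read-only import, nothing is written to the pool
zpool import -F -n -f tank              # dry run: report whether a rewind recovery looks possible
zpool import -F -f tank                 # actually rewind to an earlier transaction group

If the kernel panics even on import, setting vfs.zfs.recover=1 and vfs.zfs.debug=1 at the loader prompt before booting is the usual next step, but at that point you're firmly in read-the-old-threads territory.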

The problem with virtualization is that it often ends just like this. In a perfect world, RDM would actually play nice, but the reality is that dozens and dozens of users get burnt. There are too many variables to do much more than speculate as to the cause, so I'd consider any explanation pretty much an educated guess. In my head, it's along these lines: Ubuntu is caching, VMware is caching, FreeNAS is caching, and ZFS is expecting direct access... something doesn't get written correctly, so we panic in order to keep from corrupting data. Or one of the other moving parts glitches and the pool is stuck in an unusable state. ZFS is pretty much built on the premise that the platform/hardware is reliable and COW is infallible. Unfortunately the guys who could fix it have much larger fish to fry, so no real progress has been made, imho. The best workaround is VT-d on proper server gear, a la jgreco, which allows for bare-metal recovery scenarios.

There must be 1000 posts on here begging and pleading with people not to virtualize FreeNAS, as it often ends poorly, so the sympathy factor is quite small. I find the VM problems more interesting than inane install questions, but no one has time to get into deep ZFS recovery on a virtualized platform that has been warned against over and over. The truth is, if recovery is possible, it is almost always accomplished easily.

Two bits. Mine is a pretty moderate opinion.
 