iSCSI Write Error causing VMFS corruption

laser47 · Nov 16, 2011

I've been running ESXi at home for a while now, and have set up my datastores using a Linux mdadm / VMFS / iSCSI system. One of my drives was beginning to go bad in a mirror, and I decided to give FreeNAS a try to take advantage of ZFS.

I set up a new FreeNAS box, set up a new volume and shared out some zvolumes with iSCSI. My intention was to get them the guest machines to this box temporarily, then rebuild the 12-slot server I was moving them from and imprt the volumes there. I began to move some systems over using the VSphere console, and everything seemed to be working without an issue.

After getting most of the smaller systems migrated over the course of a week, I began to copy my large file store (about 2TB total) and let it run for a while. At some point, it failed and I began to get messages on the console:

Nov 16 00:40:33 freenas istgt[7437]: istgt_lu_disk.c:3960:istgt_lu_disk_lbwrite: ***ERROR*** lu_disk_write() failed
Nov 16 00:40:33 freenas istgt[7437]: istgt_lu_disk.c:6051:istgt_lu_disk_execute: ***ERROR*** lu_disk_lbwrite() failed
Nov 16 00:40:33 freenas istgt[7437]: istgt_lu_disk.c:3960:istgt_lu_disk_lbwrite: ***ERROR*** lu_disk_write() failed
Nov 16 00:40:33 freenas istgt[7437]: istgt_lu_disk.c:6051:istgt_lu_disk_execute: ***ERROR*** lu_disk_lbwrite() failed

I would get a few of these every minute, so I decided to restart the ESX host to reset the iSCSI connection to see if that would help. It had quite the opposite affect, and I began to get several of these every second. Some of my guest machines also became unavailable.

VMware has logs that it is trying to playback a journal, but that is failing. After trying several reboots and diconnects/reconnects I was able to see all but 2 of the VM guests. I've copied those out to a USB drive, and I'm pushing them back into my old Linux LVM box.

I found a page that suggested to scrub my ZFS volume, and that completed without errors. Also, S.M.A.R.T tests found nothing wrong with the drives. This doesn't seem to be a hardware issue with the disk subsystem, but something wrong with the istgt / VMware connection.

One of the guest machines that has disappeared has data on it that I don't have a recent backup for. When I look at the disk free pie chart in VMware, it shows much more used than I can view in the datastore browser. I think if I can get this write error to go away, it may replay the journal and I can get access to that machine long enough to copy it off.

My initial setup before the failure was:

Install FreeNAS 8.0.2-RELEASE x86 - this was a temporary machine
Create 4 x 2TB RAIDZ2 zpool
Create 3 x 1TB zvolumes
Configure iSCSI initiator / target / device extents (1 target, 3LUNs)
Attach ESXi initiator
Use VSphere console to move guest machines

After the failure, I tried:

Rebooting the ESX server
Shutting down the ESX and rebooting the NAS
Scrubing the zpool with ESX offline (15 hours, and no corruption found)
Exporting the zpool, rebuilding using 8.0.2-RELEASE-amd64 and re-importing the volume
Re-configuring the iSCSI settings to have 3 targets, each with only 1 LUN (VMware didn't recognize the datastore and I had to revert back)

Any suggestion on how to fix this? I'm at the point of scrapping FreeNAS and going back to Linux LVM, but don't want to give up on the orphan machine until I have to.

jgreco · Nov 19, 2011

You trying to export the zvolumes directly? I wasn't finding any love for that between ESXi and FreeNAS as of pre-8.0R, so we've been doing file-based extents here and have had no problems to speak of. It works, and works well enough that our next round of ESXi boxes will each be sporting a FreeNAS VM to export their local resources into the network.

I was seeing >1Gbps read performance via iSCSI between a host and a FreeNAS VM hosted on it just today... very cool.

laser47 · Nov 19, 2011

jgreco, thanks for the reply

Yes, I exported the zvolumes using device extents. I followed the documentation at http://doc.freenas.org/index.php/ISCSI#Device_Extents, and follwed the device extent advice. If there's issues with external systems, it would be good for the documentation team to mention it there.

It worked fine for a week until I put it under load

The performance was better than Linux / LVM, but the "lack of love" as you describe is disheartening. Is there a way to covert the zvolumes that I have now

On VMware, the "used" space shows a larger amount used than I can actually access. I'm thinking that if I can mount these LUNs on another system, and byte-for-byte copy them to another set of LUNs on a Linux / LVM system, I might be able to exose them and journaling may be able to recover the data. I just need to be able to buy 4 more 2TB drives :(

jgreco · Nov 20, 2011

The thing that's generally spooked me about istgt configuration is that the configuration and the ground truth don't necessarily turn out to be the same thing. It bothers me, for example, that you can create an extent of a given size and the configuration can specify a different size, which wouldn't be dangerous if it's smaller, but is potentially a disaster if you're relying on being able to write to your entire device... I've avoided some complications this way. I think.

It isn't necessarily clear to me that ZFS is a winner at this sort of thing, though I expect specific configurations will be. I've had great performance with a SSD-based UFS system on limited memory, and I could mirror that and still have reliability. But I also expect that some of the problems I was running into involved the fact that FreeNAS was real BETA at the time. I'm old-school, though, I prefer my disk devices simple, and ZFS with its memory- and CPU-hunger kind of feel wrong anyways, so my punting to file-based extents should be taken with appropriate salt.

laser47 · Nov 27, 2011

I hope I managed to pull out of this, and wanted to explain what I had done in case anybody else Google stumbles across the thread.

I set up some temporary space on my linux LVM box using 1 of the 2TB drives out of the RAIDZ2 + some 500Gigs that I was replacing. I set this up as a single 3.5TB LVM Volume Group, and created 3 new Logical Volumes. When I created them, I used the volsize parameter from zfs get -p all (1099511627776 bytes i my case) and used that to create the Linux LV - i.e. sudo lvcreate -L1099511627776b -n LUN0 iSCSI-1-Temp. I wanted to make sure that there wan't going to be a difference that would haunt me again later.

I then set the iSCSI target on the FreeNAS box to read only - just to make sure that Linux didn't do something stupid with it - and configured the open-iscsi initiator on the Linux LVM box to connect to FreeNAS. Then I copied the 3 drives using dd, one at a time - i.e. sudo dd if=/dev/sda of=/dev/iSCSI-1-Temp/LUN0.

After 3 days of copying, I the re-configured the /etc/iet.confd on the Linux system to create a new iSCSI target, and connected my ESXi system to it. It saw the drives, and I was able to import the filesystem. :D I see the .vmdk files now, and my first step is to copy them out to my USB external drive - again, to make sure nothing corrupts them and I have to copy everything again. It's copying now, and going to take another few hours to complete.

This began as a write error, and the ESX system was unsucessfuly trying to replay a journal. By getting the data onto a writeable LUN, it was able to perform the replay and effectivly fsck the datastore.

I agree with jgreco that a difference between what is presented to the iSCSI initiator, and the real size of the device needs to line up or bad things happen. It's possible that I was experiencing something similar to http://forums.freenas.org/archive/index.php/t-1468.html?. I didn't have compression or dedupe on, but my used size was larger than the volsize. Here's some zfs command outputs if anybody's interested: View attachment ZFS Info.txt .

I plan on going back to my Linux / LVM model (now with backup goodness!) and not use FreeNAS for the forseeable future. I really do like the ZFS model and wish this hadn't been the case. If this is found to be a bug or someone suggests a better way that this could have been done, I may look back into it. :(

jgreco · Dec 1, 2011

Awesome followup and fix. The whole iSCSI thing is a bit complicated, and this just underlines that you really need to grasp what's going on - and even if you think you do, you can still end up needing to go a bit further to fix things.

I've been getting ready to transition some of our datastores onto SSD, and I've really been giving the whole iSCSI thing some serious thought and experimentation these last few weeks in order to get the most I can out of these expensive devices. Our typical OS image is in the 5-10GB range, and so I ought to be able to fit a bunch of VM's onto a pair of mirrored 240GB SSD's. I've been waffling about whether to go UFS or not, the UFS method clearly works just fine, but ZFS offers checksumming of your data... since the trick is that I want each VMware host to self-host its own NAS, I'd prefer to keep the memory requirements minimal (strike against untuned ZFS) and the annoyance of a crash (caused by under-resourced ZFS) is such that it would offset any convenience gains of checksummed data storage. ZFS is definitely more complex in the unintended side effects department, and threads like this give me insomnia. ;-)

James · Dec 2, 2011

Please create a ticket at support.freenas.org so the root cause can be determined and, if workarounds are required, they can be documented.

Important Announcement for the TrueNAS Community.

iSCSI Write Error causing VMFS corruption

laser47

Cadet

jgreco

Resident Grinch

laser47

Cadet

jgreco

Resident Grinch

laser47

Cadet

jgreco

Resident Grinch

James

Guest

Similar threads

Important Announcement for the TrueNAS Community.

iSCSI Write Error causing VMFS corruption

laser47

Cadet

jgreco

Resident Grinch

laser47

Cadet

jgreco

Resident Grinch

laser47

Cadet

jgreco

Resident Grinch

James

Guest

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "iSCSI Write Error causing VMFS corruption"

Similar threads