possible resource starving of iscsi target

Status
Not open for further replies.

featherly

Cadet
Joined
Oct 2, 2012
Messages
3
FreeNAS box is:
Build - FreeNAS-8.2.0-RELEASE-p1-x64 (r11950)
Platform - Intel(R) Core(TM) i5-2405S CPU @ 2.50GHz
Memory - 8093 MB
NICs - 2x Intel Pro/1000 PCIe with MTU 9014; iSCSI on one, file sharing on the other
System - 8 GB USB 3.0 flash drive
Data - 2x WD20EARX 2 TB drives in a ZFS mirror


There is a Xen server/cluster connected to an iSCSI target, along with an NFS share, on the single ZFS volume. When there is a heavy incoming load on the file share (110 to 120 megabytes per second sustained for more than 30 or 40 seconds; whether it arrives via NFS, FTP, or CIFS doesn't matter), the Xen server crashes and reboots. Data arrives at full speed, drops to zero, goes back to full speed, drops to zero again, and so on, with a period of about 15 seconds.

Is it likely a ZFS bug, an iSCSI bug, an OS bug, or a tuning problem?
 

featherly

Cadet
Joined
Oct 2, 2012
Messages
3
Problem isn't in the file sharing leg


Further testing shows the same failure mode when simply using a dd write load to the disk, so the file sharing bits are out of the loop. I'm guessing it happens right after the RAM cache for the disk fills up. I'll try increasing the cache size and see if the behavior changes.
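(For anyone who wants to reproduce it, the load was just a big sequential dd write, something along these lines; the target path is a placeholder for wherever your pool is mounted.)

Code:
# ~20 GB of sequential writes into the pool; /mnt/tank is just an example mount point
dd if=/dev/zero of=/mnt/tank/ddtest bs=1m count=20000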

I also upgraded to 8.3.0-RELEASE and see no change in the failure mode.
 

Yell

Explorer
Joined
Oct 24, 2012
Messages
74
So your iSCSI CLIENT crashed when writing data?
When you have time, set up a simple virtual machine that supplies an iSCSI target and test whether the iSCSI client still crashes
[preferably on a different machine than the FreeNAS box].

This should rule out whether your FreeNAS box is to blame.

You might also want to add a dedicated ZIL mirror to decrease the seek times of the data drives under heavy load.
(you should see a HUGE performance gain for sync writes like NFS)
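(Adding one to an existing pool looks roughly like this; the pool and device names are only examples, use whatever your SSDs show up as.)

Code:
# attach a mirrored SLOG (dedicated ZIL) to a pool named "tank"
zpool add tank log mirror da2 da3
# confirm the new layout
zpool status tank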
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, this is a known issue. See bug #1531. ZFS needs to be carefully tuned for heavy write loads if you don't want the system to go catatonic for (short?) periods. ZFS likes to pile up writes into a "transaction group" (TXG) and then dump it out to storage all at once; if you are piling up data to be written more quickly than your I/O subsystem can actually handle, the ZFS subsystem will helpfully make your applications (in this case, iSCSI or dd) wait until it is flushed. The interactions are nonobvious unless you've worked with this for a while.

First, the raw speed of your drives is much higher than the rate at which ZFS may actually be able to commit data, especially if you're using something like RAIDZ or RAIDZ2, or, to a much lesser extent, mirroring. For example, on a 4-drive RAIDZ2 setup with drives capable of writing at about 70MB/sec each, the pool exhibited horrifying behaviour trying to write data at just 70MB/sec: two of the four drives' worth of bandwidth go to parity, and metadata and txg overhead eat into what's left, so the pool's real commit rate is nowhere near the sum of the individual drives' speeds.

Second, more memory can make the problem worse, because ZFS sizes its transaction groups in part based on memory size. This can be tuned manually, however, so reducing memory is not necessarily a good solution. You don't want to increase the cache size, you want to DECREASE it.

Third, fixing this problem is pretty much impossible due to the design of ZFS, at least as far as I can tell. It can be made tolerable, however, which effectively means reducing the performance of the pool to the point where it isn't trying to cram more data out to the pool than the pool can cope with, AND making the values for frequency-of-flush and size-of-flush fit into an iSCSI-compatible performance envelope. As you noticed, many clients freak when their SCSI devices don't work in a generally reasonable manner.
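To make that concrete: on this generation of FreeBSD the knobs are sysctls along these lines. The values here are only placeholders; the whole point of the testing in 1531 is to find numbers your particular pool can actually live with.

Code:
# size-of-flush: cap each transaction group, in bytes (256MB here, purely illustrative)
sysctl vfs.zfs.write_limit_override=268435456
# frequency-of-flush: how often a txg gets pushed out, in seconds
sysctl vfs.zfs.txg.timeout=5

If your build only exposes those as boot-time tunables rather than runtime sysctls, set them that way instead; the idea is the same.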

Fourth, the problem probably gets much more tolerable as the number of devices in a pool increases, because of course the number of IOPS increases dramatically. A mirror is the best you can get for IOPS with data protection, as far as I can tell, so you're already in a good place there.

Fifth, for gigabit-level file service, the limiting speed of gigabit (on the order of 110-120MB/sec in practice) will serve as a choke point that artificially enforces a limit on how much data ZFS is being asked to write. Once your pool is able to sustain that level, you've effectively mitigated the problem. This is not the same thing as fixing the problem, however. I suspect there are lots of ZFS pools out there that "work fine" despite having an unreasonably large txg size, and it's because there is some other aspect of the system limiting the actual txg size. Those systems might melt under a dd. You'll want to remember that if you "mitigate" the problem.

Sixth, iSCSI extents... ZFS is a copy-on-write filesystem. That means that if you write blocks 0, 1, 2, 3, 4, and 5 in your iSCSI extent, and then later rewrite block 3, block 3 will be written elsewhere in the ZFS pool, and your blocks of data are no longer contiguous. You probably want to avoid unnecessary small writes to iSCSI extents on ZFS. You probably also want to keep more free space on your ZFS pool than the average ZFS user (the normal advice is that beyond 75-80% capacity you start hurting); more free space means that ZFS is more likely to be able to allocate space nearby for written blocks, thereby reducing the impact of nonlocality for sequential reads on your iSCSI device.
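(Keeping an eye on that is as simple as the following; "tank" is just an example pool name, and the CAP column is the one to watch.)

Code:
zpool list tank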

Seventh, while ZFS is great and all that, it is worth considering whether ZFS's features are required for what you are trying to accomplish. I've been tending towards thinking that ZFS is not relevant for our iSCSI needs here, or at least not our primary iSCSI needs. FreeNAS supports UFS as well, which seems to work blazingly fast.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Good read, jgreco. I've seen a lot of threads with problems between using iSCSI and ZFS. I'm not an iSCSI user myself, but from reading other threads, and this one, I'm starting to question the validity of ZFS for use as iSCSI extents. Maybe the recommended configuration should be to use UFS.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeah, well, if you've been paying any attention, you know I've been fighting this for a long time. As much as I'd love for FreeNAS and ZFS to be the ideal iSCSI SAN storage solution, it quite possibly isn't.

One of the real big issues is the copy-on-write thing. Quite frankly, ZFS snapshotting of an iSCSI extent is probably not all that useful when compared to the performance hit that results from loss of locality for what-ought-to-be contiguous blocks on the extent. So there's a big fat question mark if you're deploying systems that aren't tuned to avoid superfluous writes to the filesystem (atime and other metadata, for example, or automatic updates just for the sake of having "the latest", or building from source, etc.).

On the other hand, I'm seriously considering putting together an Atom based system with all SSD for iSCSI; this is getting to the point of being reasonable cost. I've generally been an advocate of avoiding the one-size-fits-all storage models so popular in many places, so there's some appeal to right-sizing a solution for VM storage and then one for general purpose file storage, or even more fine-grained than that...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Actually, I haven't been paying attention to the fact that YOU have been fighting this for a long time. I have noted multiple posts (I'm guessing a majority were from you) saying that iSCSI just doesn't work as well with ZFS as we'd like to think. It would be interesting to see someone take their FreeNAS server and do iSCSI on a standard ZFS zpool, then test the same iSCSI drives on SSD, and then on UFS with SSD (and perhaps spinning rust drives too).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Done all of that. iSCSI on SSD with (or without) ZFS: fscking fast and awesome (defined as: limited by gigE). iSCSI on UFS with standard mirrored drives, sequential read/write: speeds were somewhat lower, which I interpreted as being related to latency, but still extremely decent. Don't push for numbers; I don't have them right now. My highly unscientific take, based on experimentation and years of storage experience, is that you probably want your pool to be capable of at least two or three times as much I/O capacity as you expect to need, at least with a small number of disks.

I think the real problem is that ZFS itself doesn't work as well as we'd like to think. As with so many of the things that Sun did with Solaris, there are upsides and downsides to abstraction layers and complicated subsystems. I have very little doubt that ZFS gets more awesome the more drives you can throw at it, as long as the I/O can be spread out across the pool. However, on these smaller systems, some of the "big scale" design decisions hurt little systems. FreeNAS has tried to work around some of that by suggesting large amounts of memory, but memory and cache are only one of the things that I'm talking about. The number of spindles is harder to work around. Bug 1531 and related testing clearly showed that slower disks were substantially more problematic than faster disks, but even faster (non-SSD) disks could be problematic.

The best solution seems to be to trade off performance-in-terms-of-MBytes-per-second for performance-in-terms-of-responsiveness; you can make your FreeNAS+ZFS guaranteed-slower but significantly-more-responsive via tuning.

Or you can ditch ZFS and use UFS, losing all the benefits of ZFS but gaining more traditional I/O subsystem responsiveness. One should note that it is possible to melt UFS into a catatonic state too; it just seems much harder to do.
 

featherly

Cadet
Joined
Oct 2, 2012
Messages
3
Thanks for the info and background on the problem. I've been away for a while, so I'm just now seeing it. Being new to FreeNAS, I did not realize UFS was an option.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
UFS is an option. But if you read the updates to 1531 and follow the steps I outline there, first to characterize your performance capabilities and then to tune writes appropriately, you'll suddenly have a very responsive ZFS as well, I think. ZFS could be completely awesome for many use cases due to the ARC and L2ARC; the ability to have your working set on an SSD is an awesomely tempting thing.
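The characterization part is nothing exotic; it boils down to watching what the pool can actually sustain, something like this (pool name and sizes are placeholders):

Code:
# kick off a sustained sequential write...
dd if=/dev/zero of=/mnt/tank/txgtest bs=1m count=10000 &
# ...and watch what the pool actually absorbs, once per second
zpool iostat tank 1

Whatever steady-state write rate falls out of that is the ceiling your flush size and interval have to live under.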
 