15000 ms response time during iSCSI writes

Status
Not open for further replies.

Astendagan

Cadet
Joined
Feb 6, 2016
Messages
3
I'm testing FreeNAS on an HP MicroServer (N40L). It has 8 GB of ECC RAM (tested OK with memtest86) and 3 HDDs (a WD Red 4 TB, a WD Green 2 TB, and a Samsung 2 TB) set up as a 3-way mirror.

I wrote about 100 GB of data to the pool and everything is fine. Speed is consistently around 90 MB/s (the Samsung 2 TB is the limiting drive here), and a scrub runs at about the same speed and found no problems.

I then created a zvol and shared it with iSCSI. The Windows 10 Pro client connects without problems, but a test with CrystalDiskMark shows serious problems during the write part.

During the read part everything is normal: Windows Task Manager shows constant Ethernet throughput and iSCSI disk transfer rate, and the disk's average response time varies a bit but doesn't go over 20 ms.

During the write part the disk's average response time sits around 15000 ms most of the time, sometimes dropping to "only" 5000 ms, and I've also seen it go up to 30000 ms a few times. Task Manager always shows the disk as 100% busy, but sometimes the Ethernet throughput and disk transfer rate drop to zero for 15-20 seconds and then resume. The Windows Event Viewer shows errors and warnings with event IDs 9, 27, 39, and 129 from the iScsiPrt source (target didn't respond in time, sent command to reset target, ...).

When this happens, pinging FreeNAS is OK (< 1 ms response time), SSH connections are OK, and top never shows the CPU fully busy.
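
If it helps, this is roughly what I've been watching on the FreeNAS side during the write test to see per-disk activity and latency (the pool name is just an example, adjust to yours):

zpool iostat -v tank 1    # per-vdev/per-disk throughput, refreshed every second
gstat -p                  # per-disk busy % and ms/write for physical providers only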

Bad/weird things I've seen:
- Some "ctl_datamove: tag 0xXXXXXXXX on (0:3:0) aborted" at the console and in /var/log/messages, but not always. I can go through a few whole tests with the same transfer stalls without seeing these messages.
- I've seen bsdtar taking 80-90% CPU in top one time?

Any advice?
 

Astendagan

Cadet
Joined
Feb 6, 2016
Messages
3
It's a 3-way mirror, so 2 TB are available. I wrote about 100 GB "locally" that are still there, then created a 1 TB zvol for iSCSI.

I destroyed the zvol and recreated a 100 GB one, same behavior.
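
For reference, recreating the smaller zvol from the CLI looks roughly like this (the pool and zvol names are just examples):

zfs destroy tank/iscsi-test           # remove the old 1 TB zvol
zfs create -V 100G tank/iscsi-test    # recreate it at 100 GB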
 

Astendagan

Cadet
Joined
Feb 6, 2016
Messages
3
I did some more tests and found 2 things:
- The very high response times and the "stalls" of the network transfer and disk writes go away if I completely disable the write cache for the iSCSI disk in Windows, but the performance loss is huge.
- The zvol block size was set to 4k (to match the NTFS cluster size). The Windows Event Viewer errors disappear when I set the zvol block size to 16k (the default) or 128k, but random write performance goes down. With zvol block size = NTFS cluster size, random write performance (if there is a queue) is the same as sequential write performance because of COW. The commands I used are below.
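
For anyone trying the same thing, the block size change looks roughly like this (names are examples; volblocksize can only be set at creation time, so the zvol has to be recreated):

zfs get volblocksize tank/iscsi-test                      # check the current zvol block size
zfs destroy tank/iscsi-test
zfs create -V 100G -o volblocksize=16K tank/iscsi-test    # recreate with 16k blocks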

This makes iSCSI over ZFS less interesting. You lose 50% of the space to mirroring, then 50% again because you can't let a zvol get too full due to the fragmentation caused by COW, and then you take either a performance loss or a space loss (bigger NTFS cluster size) if you want to store small files.
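
To keep an eye on how full and fragmented the pool is getting, something like this should do (pool/zvol names are examples again):

zpool list -o name,size,allocated,free,fragmentation,capacity tank
zfs list -o name,volsize,used,refer tank/iscsi-test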

I'd still be interested to know why I get these errors with a 4k zvol block size, if anyone has an idea.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, not to put too fine a point on it, but basically you've got a way undersized system and you're probably just asking too much of it. 8GB isn't going to be a good starting point for iSCSI. 32GB is kinda the minimum I'd consider, but most iSCSI systems are better off with 64GB or more. (ZFS doesn't really care that your platform tops out at 16GB.)

Disabling the write cache is probably mostly just slowing down the flow of data to something your platform can manage. I'm kinda thinking that you're stressing the system in multiple directions at once, which makes it hard for it to learn the pool characteristics; benchmarks can do that. You might find that it stabilizes if you push it hard for a while.

You probably want to use a zvol block size around 4x the underlying sector size of the ZFS pool, so that's probably 16K. What that's getting you is mostly compression. Glad to see you've identified that sequential and random writes aren't likely to be significantly different speed-wise because of CoW (or more precisely because blocks are allocated the same way regardless). This is basically a situation where the best you can hope for is to not do any one thing very well, but to do most things somewhat awfully, and then optimize to get that bad situation as good as you can get it.
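
If you want to see what the larger block size is buying you compression-wise, check the zvol after you've written some data to it (substitute your own zvol name):

zfs get compression,compressratio,volblocksize tank/iscsi-test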

If you haven't already lowered the transaction group window from 5 seconds to 1, do so.
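
From the CLI that's the vfs.zfs.txg.timeout sysctl; to make it survive a reboot, add it as a sysctl tunable in the GUI:

sysctl vfs.zfs.txg.timeout      # defaults to 5 (seconds)
sysctl vfs.zfs.txg.timeout=1    # takes effect immediately, lost on reboot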

Then get another 4TB disk and set the pool up as two mirror vdevs: a 2TB mirror (the two 2TB drives) and a 4TB mirror (the two 4TB drives).
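
Something like this from the CLI (device names are examples only, and this wipes the disks, so move your 100GB of data off first):

zpool create tank mirror ada0 ada1 mirror ada2 ada3
# ada0/ada1 = the two 4TB drives (4TB mirror vdev), ada2/ada3 = the two 2TB drives (2TB mirror vdev)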
 