High write latency on iSCSI - possibly after 9.3


reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
Hi.

I have been running iSCSI with the old iSCSI stack before 9.3, without an SLOG, and have been fine (because the writes were async).

Now I set up a new rig with 9.3 (using CTL this time, of course), and my ESXi 5.5U2 datastore mounted over iSCSI via a 1 Gbps network (MTU 9000) is showing 169 ms write latency and 6 ms read latency.

Am I missing something? Does CTL do sync writes by default now instead of async? I think the high latency is messing with my backup appliance; that's why I'm asking :)
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
CTL does not change the default. By default the policy is set by ZFS, and writes are still asynchronous. But CTL supports more cache control primitives than istgt did, including the DPO/FUA bits and the Caching mode page. That allows the initiator to request synchronous I/O if it wants to. I am not sure whether VMware uses any of those primitives, but I guess it may pass those flags through from the virtual machine.

If for some reason somebody wishes to override that behavior, they can always set sync=disabled for the specific zvol, but then data consistency is not guaranteed.
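For example, assuming a zvol named tank/vmstore (substitute your own pool and zvol names):

[code]
# Check the current sync policy on the zvol backing the extent
zfs get sync tank/vmstore

# Make all writes asynchronous (faster, but a crash can lose in-flight writes)
zfs set sync=disabled tank/vmstore

# Return to the default behavior (honor whatever the initiator asks for)
zfs set sync=standard tank/vmstore
[/code]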
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Actually, things are much more complex than that.

CTL is kernel-mode iSCSI. It performs better when it is zvol-based and worse when file-based.

The old istgt was userland iSCSI and performed better when file-based.

So if you were doing file-based iSCSI extents, you were seeing the best performance you could get with a pre-9.3 setup. But now that you have upgraded to 9.3 you are in the worst combo, and the best choice is to move to zvol-based extents. Obviously this isn't easy and can't be done without destroying one to create the other.
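As a rough sketch of the zvol side (the name and size here are just placeholders; the GUI does the same thing when you create a zvol and point a device extent at it):

[code]
# Create a sparse (thin) 500G zvol to back a device extent
zfs create -s -V 500G -o volblocksize=16K tank/iscsi-vm01
[/code]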

I'm not saying that's your problem, but there are a bunch of moving parts at work when you go from pre-9.3 to 9.3, so you'll have to narrow them down.

It's very possible your MTUs are causing network problems too. We strongly discourage jumbo frames because they cause far more problems than they deliver in gains. We've even got a sticky where we "stick it to the MTU" because it doesn't matter. It's one of those things where I have to laugh when people use jumbo frames because that was a best practice 10 years ago, but today it's almost a "worst practice".
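If you want to rule the MTU out quickly, don't-fragment pings at jumbo size from both ends will tell you whether the 9000-byte path is actually clean end to end (the addresses below are placeholders); if they fail, put everything back at 1500:

[code]
# From the FreeNAS box: 8972 = 9000 minus 28 bytes of IP/ICMP headers
ping -D -s 8972 192.168.10.20

# From the ESXi host (vmkernel ping with the don't-fragment bit set)
vmkping -d -s 8972 192.168.10.10
[/code]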
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It's one of those things where I have to laugh when people use jumbo frames because that was a best practice 10 years ago, but today it's almost a "worst practice".

That might be a little unfair. However, if I've got 1GbE and want more speed, it is easier to contemplate the jump to 10GbE for about the same amount of annoyance/trouble/etc...
 

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
Interesting remark about jumbo frames; I will be sure to test. I'm not concerned about speed as in throughput, I'm more concerned about latency, because I never saw it this high with my previous iSCSI arrays.

By the way, I am indeed using zvols. I will test with sync disabled on the dataset just to see if VMware could indeed be passing some extra flags to force sync writes. Of course, maybe an SLOG is the way to go here, but I will test before I consider it.
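My rough plan for watching the log device while the test runs (gstat is built in; zilstat ships with FreeNAS as far as I know, so treat the exact invocation as approximate):

[code]
# Per-disk I/O and latency, refreshed every second (the SLOG shows up as its own device)
gstat -I 1s

# Summary of ZIL traffic: 1-second samples, 10 samples
zilstat 1 10
[/code]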
 

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
Okay, I did a test with CrystalDiskMark on a volume mounted to a VM via ESXi and I get 95 MB/s writes... that tells me that sync writes are not happening. That is probably a bad thing, since most people who deploy iSCSI with ESXi and FreeNAS are not going to even look into enabling sync=always on a dataset, and their data may get lost. How come CTL still doesn't respect ESXi's requests to write data as sync, like NFS does?

This still doesn't explain why my latency isn't ideal, but maybe that's because I'm using low-RPM WD Red drives...
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
ESXi requests to write sync over iSCSI? Justify that claim please.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
CTL respects all existing SCSI synchronization primitives (the Caching mode page, SYNCHRONIZE CACHE commands, and the FUA bit). If writes are still not synchronous after that, it means the initiator did not ask for them to be.
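If you want to see what the target advertises, one way (from any FreeBSD machine logged into the target; da1 below is just a placeholder for the iSCSI disk) is to dump the Caching mode page:

[code]
# Mode page 0x08 is the Caching mode page; WCE=1 means the write cache is
# reported as enabled, so the initiator has to use SYNCHRONIZE CACHE or FUA
# if it wants its writes made stable immediately.
camcontrol modepage da1 -m 8
[/code]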
 

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
That's still weird... why would ESXi ask NFS to write sync but not iSCSI? I guess I'll just set sync=always on my important volumes that have an SLOG.
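Concretely, something like this once the SLOG is in (placeholder zvol name):

[code]
# Treat every write on this zvol as synchronous; with a decent SLOG
# the latency hit should be tolerable
zfs set sync=always tank/vmstore
zfs get sync tank/vmstore
[/code]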

About justifying my claim... ESXi doesn't know which writes coming from a VM are supposed to be sync, so it has to treat all writes as sync. Isn't that why you can run Windows on a crappy H300 controller and be fine, but try to run ESXi on it and it's slower than hell without a controller with a BBU cache?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Pretty sure I wrote the sticky on that one.

In any case, my recollection is that a variety of iSCSI gear does all sorts of stupid stuff if an initiator requests sync writes. By comparison, it is hard to screw up sync NFS. There's a tunable to tell ESXi to request sync over iSCSI, I just don't recall the details offhand.
 

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
Pretty sure I wrote the sticky on that one.

In any case, my recollection is that a variety of iSCSI gear does all sorts of stupid stuff if an initiator requests sync writes. By comparison, it is hard to screw up sync NFS. There's a tunable to tell ESXi to request sync over iSCSI, I just don't recall the details offhand.
Interesting... I've never seen that post; I've only seen the post that explains why iSCSI is fast versus NFS, lol.
 

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
Oh, here is another question while I'm troubleshooting this... I hear that large sync writes (over 64K) go direct to the storage and bypass the SLOG/ZIL... is there a way to increase this size?
 

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
Wait, I'm confused. So if I change the block size on the zvol to 256K, for example... then you are saying any writes that are under 256K will still go to the ZIL (SLOG SSD in my case), but write requests that are over 256K will go direct to storage and bypass the ZIL?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Wait, I'm confused. So if I change the block size on the zvol to 256K, for example... then you are saying any writes that are under 256K will still go to the ZIL (SLOG SSD in my case), but write requests that are over 256K will go direct to storage and bypass the ZIL?

I don't even know how you came to that conclusion at all. But not even close (and considering you can't go over 128k block size in ZFS you're already hypothesizing an impossible scenario). :P
 

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
Well, why is she saying to change the block size then! Anyway, I make the impossible possible!!! I'll just go in and modify the ZFS source code and get a 256KB block size going... not that hard ;-)

No, but for real... is that thing I heard about NFS writes larger than 64KB bypassing the ZIL true, or is that something SunOS-related???
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, why is she saying to change the block size then! Anyway, I make the impossible possible!!! I'll just go in and modify the ZFS source code and get a 256KB block size going... not that hard ;-)

No, but for real... is that thing I heard about NFS writes larger than 64KB bypassing the ZIL true, or is that something SunOS-related???

Yes, in certain circumstances, writes that are >32KB and marked as sync writes from NFS will bypass the slog and go straight to the pool.
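If you want to poke at the knobs involved, these are the ones to look at (FreeBSD/FreeNAS 9.3; the exact behavior differs between ZFS versions, so treat this as a pointer rather than gospel, and the zvol name is a placeholder):

[code]
# Size threshold above which ZFS may log a sync write "indirectly":
# the data goes to its final place in the pool and the ZIL only records a pointer
sysctl vfs.zfs.immediate_write_sz

# logbias=latency (the default) favors the SLOG; logbias=throughput
# sends large sync writes straight at the pool disks
zfs get logbias tank/vmstore
[/code]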
 