Write behavior for iSCSI?


rruss

Dabbler
Joined
Sep 14, 2015
Messages
35
I have a general question about iSCSI that I will ask within the context of my setup.

I have a FreeNAS host running the following hardware:
C2750D4I 8-core Atom motherboard
16GB ECC RAM
4x4TB WD RED

The 4x4TB drives are configured to provide an 8TB volume, and under that volume there are various NFS and CIFS shares, as well as an iSCSI extent.

I have an ESXi 6 host which uses the iSCSI extent as its primary datastore for the disk images of the VMs, as well as a collection of Linux physical hosts which mount the NFS shares, and Windows laptops which mount the CIFS shares.

I see what looks to me like strange behavior on the iSCSI extent when data is written to the disk of any of the VMs (either Linux or Windows). As data is written, there is a corresponding amount of data read from the disks (somewhat less, since the reads are spread across the 4 drives). The data being written comes from a dedicated disk in another machine on the network; it does not originate on the NFS shares or anywhere else on the FreeNAS host.

So, in short, my question is: why is an iSCSI write generating so much read traffic on the disks? Is this normal? This behavior seems to cause the system to write at only 400Mb/s, while it can read over iSCSI at 800Mb/s.

For NFS, read and write speeds are about equal at around 800Mb/s.

It might be important to note that we saw this same behavior when the NFS/iSCSI traffic was all on the same network interface, and it continues to behave the same with the iSCSI traffic split off onto a dedicated NIC connected directly between the ESXi host and the FreeNAS host.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
while it can read over iSCSI at 800Mb/s.
Reading is fast if you have a relatively small set of frequently-accessed blocks that ZFS can cache in RAM.
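
One quick way to check whether reads are being served from the ARC rather than the disks is to watch the hit/miss counters during a test. A rough sketch from the FreeNAS shell (these are standard FreeBSD ZFS sysctls, and the counters are cumulative):

  # Cumulative ARC hits vs. misses; a high hit ratio means reads are coming from RAM, not disk
  sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses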
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
This is akin to saying you have an 8 cylinder engine, but we don't know if it's a Straight 8 or a V-8.

Are the 4x4TB drives in a RAIDz2 volume or a stripe of mirrors? The latter is preferable for use with iSCSI and NFS.

The 4x4TB drives are configured to provide an 8TB volume, and under that volume there are various NFS and CIFS shares, as well as an iSCSI extent.
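
For reference, the pool's vdev list shows which layout is in use; a quick sketch from the FreeNAS shell, with "tank" standing in for the actual pool name:

  # "mirror-0", "mirror-1", ... under the pool indicates a stripe of mirrors;
  # a "raidz2-0" entry would indicate a RAIDZ2 vdev
  zpool status tank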
 

rruss

Dabbler
Joined
Sep 14, 2015
Messages
35
The array is a stripe of mirrors. Sorry - I should have noted that.

I'm intentionally moving around a variety of files that are much larger than physical memory so that I can see the sustained read/write levels.

Remember, when I read or write to NFS, it seems to be mostly limited by network bandwidth to around 800Mb/s. iSCSI is the same on reads, but half that on writes. The difference I'm seeing between NFS writes and iSCSI writes is the nature of the disk activity during the write. For an NFS write, the disk activity is purely writes. For iSCSI writes, there are continuous reads from the disks during the write. All the data is coming from outside the FreeNAS host, so there is no reason for read activity on these disks unless it is part of the write process for iSCSI.

Does anyone else see this combined write/read activity on their drives during an iSCSI write? Is this some inherent write/read/verify behavior of iSCSI?
 

rruss

Dabbler
Joined
Sep 14, 2015
Messages
35
Here is the system performance over 5 different scenarios. Both NICs are Intel. I'm showing just one drive, but ada2/3/4/5 are identical and show the same traffic profiles. igb0 is the general network link (carrying all NFS traffic); igb1 is the dedicated link to the ESXi host for iSCSI only. All operations are on CentOS 6.7 Linux hosts.

I've labeled the 5 scenarios below. I show data moved to/from NFS mounts, to/from iSCSI mounts, and to/from a "local disk" on a physical host (networked, but not part of the ESXi or FreeNAS systems; call it drive X below).

A - copy from drive X to iSCSI disk (notice the reads on ada2 even though we're only writing to the disk). Network igb1 limps along at 400Mbps.
B - copy from drive X to NFS (notice that ada2 is purely writes). Network at full speed, > 900Mbps.
C - copy from iSCSI disk to drive X (network still not at full speed, but faster than writes, at 600Mbps).
D - copy from NFS to another NFS share (network at 600Mbps read and write simultaneously; the drives can sustain writes while reading, well above the levels in case A).
E - copy from NFS to drive X (network at about 800Mbps).
F (not shown) - copy from drive X on one host to drive Y on another physical host runs at > 900Mbps, same as B, so drive X can easily read/write at full network speed and isn't a bottleneck in any of these tests.

My observations: the CPU is never heavily loaded, so that doesn't seem to be the bottleneck for iSCSI. The biggest mystery is still the "unnecessary" reads that happen during an iSCSI write.

[Attachment: transfers.png]
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
What is the recordsize/blocksize of your ZVOL? One easy way to get reads is if the iSCSI initiator is sending write requests that are smaller than the blocksize. ZFS would then have to read the existing block so it can rewrite it with the smaller change applied. iSCSI being SCSI, the writes are most likely 512 bytes. That could also add network overhead vs. NFS.

Maybe temporarily set your ZVOL sync=disabled and see what happens.
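
For anyone wanting to try that, both checks can be done from the FreeNAS shell. A rough sketch, with "tank/iscsi-zvol" standing in for the actual pool/zvol name:

  # Check the zvol's block size (fixed at creation time) and its current sync setting
  zfs get volblocksize,sync tank/iscsi-zvol

  # Temporarily disable sync writes for the experiment (not for production use)
  zfs set sync=disabled tank/iscsi-zvol

  # Put it back to the default when done testing
  zfs set sync=standard tank/iscsi-zvol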
 

rruss

Dabbler
Joined
Sep 14, 2015
Messages
35
rs225 - thank you, that is very helpful insight.

The ZVOL appears to have a 16k block size, and the iSCSI is indeed 512 bytes. If the block-level iSCSI behavior is to read a 16k block in order to modify a 512-byte portion and re-write the 16k block, that could certainly explain things. And framed this way, I'm now finding other threads on the forum that discuss this. From what I'm reading, however, it would appear that iSCSI is asynchronous by default... so I'm confused by the thought that my read/write iSCSI behavior could be a side effect of sync. I can tell you that prior to implementing iSCSI, I had the VM datastore on FreeNAS mounted to the ESXi host over NFS, and it was orders of magnitude slower*. It was completely unusable. This system is perfectly usable; it just appears to be running at about half the speed of NFS writes, and I was hoping to understand why.

* Edit - I went back and ran a VM with its storage on NFS, and did a copy from a networked disk to the local disk (which is over NFS), and the write ran at 50Mbps. So not orders of magnitude slower, just one order of magnitude slower than the current iSCSI performance.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I'm confused by the thought that my read/write iSCSI behavior could be a side effect of sync.

Probably isn't, if you know your iSCSI is async.

And for other readers, async iSCSI or NFS risks corruption of the virtual disks in case of ZFS panic/crash/powerfail. That's why ESXi defaults to 'slow/safe' NFS. The proper solution is a ZIL on SSD or other flash.
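
For context, a SLOG goes into the pool as a dedicated log vdev; a minimal sketch from the shell, assuming a hypothetical pool "tank" and a power-loss-protected SSD at ada6:

  # Add the SSD as a separate intent log (SLOG) device
  zpool add tank log /dev/ada6

  # A log vdev can later be removed again if needed
  zpool remove tank ada6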
 

rruss

Dabbler
Joined
Sep 14, 2015
Messages
35
I don't know that the iSCSI is async; I'm making assumptions here based on information garnered from other posts on the forum.

In general, people seem to find NFS to be very slow with ESXi due to sync writes, and iSCSI seems to alleviate that for many people by being async. I'm not sure which side of the link is causing it to be sync. Is this something set on the ESX side?

So, I was assuming based on those discussions that my iSCSI would be async, which then made me question whether sync could be what is causing the 'read during write' behavior that I'm seeing.

I've looked everywhere in the FreeNAS GUI and I can't see where to set sync=disabled, and I've searched the forum and documentation as well, but clearly I'm overlooking something. If anyone can point me in the right direction here, I'd appreciate it. And as rs225 and others have said, I'm not suggesting that sync be turned off on a production system!! I just want to experiment to understand what is causing certain behaviors on my system so I can tune them WITH sync turned on.

Thanks for this help, and for the rest of the information I've been reading on this forum over the last few months while bringing my FreeNAS system online.
 

rruss

Dabbler
Joined
Sep 14, 2015
Messages
35
OK, I apologize for the rapid-fire questions here... but I think I'm converging on what I need to fix; I just can't quite figure out the process to do it.

I've found that my zvol has a block size of 16k (default), and the iSCSI mapped to it is using a block size of 512.

Is there a way to change the block size of my zvol without destroying it and starting over? I can shut down the ESXi system that uses it and turn off iSCSI on FreeNAS to make the zvol idle. If there is a way to restructure the zvol without destroying the data, that would be ideal. Otherwise, I need to move all of my data to another location, recreate the zvol with a 512-byte block size, rebuild the iSCSI configuration, and then copy my data back. All possible, but it's certainly the long way around.
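
In case it does come down to recreating it (my understanding from other threads is that volblocksize is fixed at creation time), the new zvol would need to be made with the desired value up front. A rough sketch from the CLI, with the name, size, and block size all placeholders:

  # Create a replacement zvol with an explicit block size (values shown are examples only)
  zfs create -o volblocksize=4K -V 2T tank/vmstore-new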

Thanks in advance for your continued help.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I don't know that the iSCSI is async; I'm making assumptions here based on information garnered from other posts on the forum.

In general, people seem to find NFS to be very slow with ESXi due to sync writes, and iSCSI seems to alleviate that for many people by being async. I'm not sure which side of the link is causing it to be sync. Is this something set on the ESX side?

ESX? Or ESXi? Different things...

ESXi requests sync for NFS writes because it is the paranoid and correct thing to do.

There's no good way for a client to request sync with iSCSI that doesn't screw up someone's implementation somewhere, so it isn't done.

Of course, you can disable sync for NFS within FreeNAS (from the CLI) or enable sync for iSCSI as appropriate.
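
Roughly, from the FreeNAS CLI, with the dataset and zvol names as placeholders:

  # Disable sync on a dataset shared over NFS (faster, but unsafe on crash/power loss)
  zfs set sync=disabled tank/nfs-dataset

  # Force sync on a zvol exported over iSCSI (safer, slower without a good SLOG)
  zfs set sync=always tank/iscsi-zvol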
 

rruss

Dabbler
Joined
Sep 14, 2015
Messages
35
ESXi 6 (I'm new to VMware as well)

Now that I've gone back and seen that NFS with sync limits me to 50Mbps, the fact that I'm limited to 400Mbps with iSCSI makes me think that sync is not my issue at all, and that my iSCSI is indeed running without sync. I'm now thinking the block size mismatch might be the culprit, causing continuous reads during my writes and slowing me down to about half speed.

I'd like to get the iSCSI and zvol block sizes set to the same 512 bytes (or 1024, 2048, or 4096 if those make more sense). Right now the zvol is 16k because that was the default, and 16k isn't an option in the iSCSI configuration, so I have no way to match them without changing the underlying zvol.

Once I get that worked out, I'd like to turn on sync for iSCSI... and if that slows it way down then I'd like to explore the ZIL on SSD.

Does this sound like a reasonable path to explore?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Sorry, things are a bit busy here, so I don't have a lot more help for you. Experiment to identify what works best. Remember that benchmarks aren't always representative of performance in production, especially over time. Also be aware that adding a poorly selected SLOG device (i.e., one without a supercap) is probably worse than simply omitting the SLOG and running without sync, because it fools you into believing you're protected from an unlikely event while substantially reducing your throughput and not actually providing useful protection.
 

rruss

Dabbler
Joined
Sep 14, 2015
Messages
35
Thanks for the insight.

Rather than trying to migrate my existing data from a 16k block size to 512 bytes, I created a new zvol with the block size set to 512 bytes and created another iSCSI target on that zvol. (I fought with ESXi not seeing the new target until I realized that the ESXi firewall was blocking iSCSI on ports other than 3260!) Once I had the iSCSI target visible, I built a new Linux VM with a disk on it. I tried my copy from a non-FreeNAS networked drive onto the VM's local file system to test iSCSI writes, and with the write chugging along at about 25MB/s, before I could even check the disk I/O graph to see whether it was generating any read traffic on the disks... FreeNAS crashed. (I won't bog this thread down with debugging the crash. I've had this issue for some time, despite what should be a stable platform, Intel NICs, etc. Once I work through that, I'll come back to this issue.)
 