iSCSI Performance Problems


Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
I set up some network attached iSCSI drives over a 10Gbit network and the performance of it looks like this:
[Screenshot: transfer rate during the copy]

From what I have tested, it copies about 6~8GB at full speed before the rate crashes like this. For the remainder of the transfer it jumps between 100 and 300MB/s.

The physical connection from my computer to the server goes:
Broadcom BCM57810S > SFP+ transceiver > OM4 fiber > SFP+ transceiver > Ubiquiti Networks US-16-XG-US > SFP+ transceiver > OM4 fiber > SFP+ transceiver > Broadcom BCM57810S

I don't know what could be causing this issue. Ideas?

Other Details:
The physical connection is actually double (I've run 2 fiber cables on both devices to the switch) because it's a dual port card on both ends with the goal being to enable multipath and acquire 20Gbit. The caveat of this is that multipath for whatever reason appears to be broken on Windows 10 Pro. Regardless it is enabled.

I set the MTU on the client NICs to 9014 and on the server's NICs to 9014 to take load off the CPU, which was causing a bottleneck on the server. Jumbo frames were also enabled on the switch.
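For context on why jumbo frames reduce CPU load, here is a rough sketch of the per-frame arithmetic at 10Gbit/s. It assumes the convention many Windows NIC drivers use, where a "Jumbo Packet" value of 9014 includes the 14-byte Ethernet header and so corresponds to an IP MTU of 9000 (an MTU set on the FreeBSD-based FreeNAS side normally excludes that header). It also assumes IPv4 and TCP headers with no options. The function names are illustrative only.

```python
# Per-frame overhead on 10GbE for a standard vs. a jumbo MTU (sketch).
# Assumes IPv4 + TCP with no options; MTU here is the IP MTU (e.g. 9000),
# not the Windows "Jumbo Packet" value (e.g. 9014) that adds the header.
ETH_HEADER = 14      # destination MAC + source MAC + EtherType
ETH_FCS = 4          # frame check sequence
PREAMBLE_SFD = 8     # preamble + start-of-frame delimiter
IFG = 12             # minimum inter-frame gap
IPV4_HEADER = 20
TCP_HEADER = 20

LINK_BPS = 10_000_000_000  # 10 Gbit/s

def wire_bytes(mtu: int) -> int:
    """Bytes one full-size frame occupies on the wire, including the gap."""
    return PREAMBLE_SFD + ETH_HEADER + mtu + ETH_FCS + IFG

def tcp_payload(mtu: int) -> int:
    """TCP payload bytes carried by one full-size frame."""
    return mtu - IPV4_HEADER - TCP_HEADER

def frames_per_second(mtu: int) -> float:
    """Full-size frames per second at line rate."""
    return LINK_BPS / (wire_bytes(mtu) * 8)

def payload_mb_per_s(mtu: int) -> float:
    """Best-case TCP payload rate in MB/s (1 MB = 10**6 bytes)."""
    return frames_per_second(mtu) * tcp_payload(mtu) / 1e6

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_per_second(mtu):,.0f} frames/s, "
          f"{payload_mb_per_s(mtu):,.0f} MB/s max TCP payload")
```

At line rate, the 9000-byte MTU cuts the number of frames the CPUs have to handle by roughly 5.9x (about 812,744 vs. 138,305 frames/s), while raising the best-case TCP payload rate only from about 1,187 to 1,239 MB/s.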

I set the zvol block size (volblocksize) to 128K and the iSCSI logical block size to 4096 bytes.
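As a rough illustration of what that pairing means: each 128K zvol block holds 32 of the 4096-byte logical blocks the initiator sees, and a write that covers only part of a zvol block makes ZFS read the rest of that block (if it isn't already cached) and write a complete new 128K block. The helpers below are a hypothetical sketch of that worst-case amplification; they ignore compression, caching and RAIDZ parity. This mostly matters for small random writes, since a large sequential copy tends to fill whole blocks.

```python
# Worst-case write amplification when writes are smaller than volblocksize
# (sketch). Ignores compression, ARC caching and RAIDZ parity.

def blocks_touched(offset: int, length: int, volblocksize: int) -> int:
    """Number of zvol blocks overlapped by a write of `length` bytes at `offset`."""
    first = offset // volblocksize
    last = (offset + length - 1) // volblocksize
    return last - first + 1

def worst_case_write_amplification(offset: int, length: int,
                                   volblocksize: int) -> float:
    """Bytes rewritten per byte written if every touched block is rewritten in full."""
    return blocks_touched(offset, length, volblocksize) * volblocksize / length

VOLBLOCK = 128 * 1024   # zvol volblocksize from the post
LOGICAL = 4096          # iSCSI logical block size from the post

print(VOLBLOCK // LOGICAL)                                               # 4K blocks per zvol block
print(worst_case_write_amplification(0, LOGICAL, VOLBLOCK))              # single 4K write
print(worst_case_write_amplification(120 * 1024, 16 * 1024, VOLBLOCK))   # 16K write straddling two blocks
```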
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Other Details:
The physical connection is actually double (I've run 2 fiber cables on both devices to the switch) because it's a dual port card on both ends with the goal being to enable multipath and acquire 20Gbit. The caveat of this is that multipath for whatever reason appears to be broken on Windows 10 Pro. Regardless it is enabled.
For the time being, take that configuration out of the entire system on both ends. Let's get one working first. Walk before we run.
I set the MTU on the client NICs to 9014 and on the server's NICs to 9014 to take load off the CPU, which was causing a bottleneck on the server. Jumbo frames were also enabled on the switch.
Why don't you tell us all about the hardware in the FreeNAS system because that is most likely the source of your issues. No detail is too small.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
For the time being, take that configuration out of the entire system on both ends. Let's get one working first. Walk before we run.
I've removed the 2nd NIC from the configuration on both ends and rebooted both machines.

Why don't you tell us all about the hardware in the FreeNAS system because that is most likely the source of your issues. No detail is too small.
Motherboard: ASRock EP2C602-4L/D16
CPUs: Intel Xeon E5-2670s
RAM: 16x8GB ECC UDIMM from Kingston
HBAs: 3x LSI 9207-8i
NIC: Broadcom BCM57810S
SSDs: 12x 960GB Intel DC S4500
HDDs: 8x WD gold 2TB

The SSDs are an encrypted 12-drive raidz2.
The HDDs are an encrypted 8-drive raidz2.
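To see roughly how a 128K zvol block lands on these two RAIDZ2 vdevs, here is a sketch of the commonly described RAIDZ allocation arithmetic: data sectors are spread across the data columns, each row gets `parity` parity sectors, and the total is padded up to a multiple of `parity + 1`. It assumes ashift=12 (4 KiB sectors), which the thread doesn't state, and ignores compression; treat the output as an estimate of space efficiency, not a measured figure.

```python
# Estimated RAIDZ allocation for one block (sketch, assumes 4 KiB sectors).

def ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def raidz_sectors(block_bytes: int, disks: int, parity: int,
                  sector: int = 4096) -> int:
    """Sectors allocated for one block on a single RAIDZ vdev."""
    data = ceil_div(block_bytes, sector)          # data sectors
    rows = ceil_div(data, disks - parity)         # rows across the data columns
    total = data + rows * parity                  # plus parity sectors per row
    return ceil_div(total, parity + 1) * (parity + 1)  # pad to a multiple of p+1

def efficiency(block_bytes: int, disks: int, parity: int,
               sector: int = 4096) -> float:
    """Fraction of allocated sectors that hold data."""
    return ceil_div(block_bytes, sector) / raidz_sectors(block_bytes, disks, parity, sector)

BLOCK = 128 * 1024
for name, disks in (("SSD pool (12 disks)", 12), ("HDD pool (8 disks)", 8)):
    print(f"{name}: {raidz_sectors(BLOCK, disks, 2)} sectors per 128K block, "
          f"{efficiency(BLOCK, disks, 2):.1%} usable "
          f"(nominal {(disks - 2) / disks:.1%})")
```

Under these assumptions the 12-wide vdev allocates 42 sectors per 128K block (about 76% usable vs. a nominal 83%) and the 8-wide vdev allocates 45 (about 71% vs. a nominal 75%).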
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Which pool are you targeting with that copy? The SSDs should be able to keep up, but the HDDs could struggle if free space is badly fragmented, at a premium, or you've got a 10Gbps link and might honestly just be throwing too many bits at them. ;)

zpool list -v on both, please. Also, are you using compression (should be "yes, LZ4") or dedup (should be "no, of course not")?
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
Which pool are you targeting with that copy? The SSDs should be able to keep up, but the HDDs could struggle if free space is badly fragmented, at a premium, or you've got a 10Gbps link and might honestly just be throwing too many bits at them. ;)

zpool list -v on both, please. Also, are you using compression (should be "yes, LZ4") or dedup (should be "no, of course not")?
The SSD array is in the example. LZ4 is enabled. It might be worth noting that the performance over iSCSI is identical on both arrays; the same issue occurred, so I don't think the array is at fault.

storage 10.4T 2.43T 8.00T - 1% 23% 1.00x ONLINE /mnt
raidz2 10.4T 2.43T 8.00T - 1% 23%
gptid/9089f1a9-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/90e389c1-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/9141df16-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/91a08384-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/91ffdd22-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/92652bb7-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/92cedf75-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/933b3a12-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/93a85b3d-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/9415324a-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/948b30a2-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
gptid/9501608c-66bc-11e8-b6e5-d05099c1f0d5.eli - - - - - -
storage-backup 14.5T 2.60T 11.9T - 0% 17% 1.00x ONLINE /mnt
raidz2 14.5T 2.60T 11.9T - 0% 17%
gptid/e60f23c5-bb7b-11e8-b222-d05099c1f0d5.eli - - - - - -
gptid/e6f7094b-bb7b-11e8-b222-d05099c1f0d5.eli - - - - - -
gptid/e7eba46c-bb7b-11e8-b222-d05099c1f0d5.eli - - - - - -
gptid/e8e96192-bb7b-11e8-b222-d05099c1f0d5.eli - - - - - -
gptid/e9d316fb-bb7b-11e8-b222-d05099c1f0d5.eli - - - - - -
gptid/eab9c94a-bb7b-11e8-b222-d05099c1f0d5.eli - - - - - -
gptid/ebb79bf0-bb7b-11e8-b222-d05099c1f0d5.eli - - - - - -
gptid/ec9f9a7b-bb7b-11e8-b222-d05099c1f0d5.eli - - - - - -
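A small sketch for pulling out the two numbers HoneyBadger asked about (fragmentation and capacity) from output like the above. It assumes the `zpool list -v` column order of this FreeNAS release (NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT), which matches the pool lines pasted here; newer ZFS releases add a CKPOINT column, so check the header on your own system. The warning thresholds are illustrative, not official limits.

```python
from typing import Dict

# Assumed column order for pool rows (matches the output pasted above).
COLUMNS = ["NAME", "SIZE", "ALLOC", "FREE", "EXPANDSZ",
           "FRAG", "CAP", "DEDUP", "HEALTH", "ALTROOT"]

def parse_pools(output: str) -> Dict[str, Dict[str, str]]:
    """Return top-level pool rows keyed by pool name.

    Pool rows are the ones with all ten columns; vdev and disk rows
    (raidz2, gptid/...) have fewer fields and are skipped.
    """
    pools = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and fields[0] != "NAME":
            row = dict(zip(COLUMNS, fields))
            pools[row["NAME"]] = row
    return pools

def percent(value: str) -> int:
    return int(value.rstrip("%"))

# Excerpt of the output posted above.
sample = """\
storage 10.4T 2.43T 8.00T - 1% 23% 1.00x ONLINE /mnt
raidz2 10.4T 2.43T 8.00T - 1% 23%
storage-backup 14.5T 2.60T 11.9T - 0% 17% 1.00x ONLINE /mnt
raidz2 14.5T 2.60T 11.9T - 0% 17%
"""

for name, row in parse_pools(sample).items():
    frag, cap = percent(row["FRAG"]), percent(row["CAP"])
    warn = cap >= 80 or frag >= 50   # illustrative thresholds only
    print(f"{name}: FRAG {frag}%, CAP {cap}%, dedup {row['DEDUP']}"
          + ("  <-- check" if warn else ""))
```

On the numbers posted, both pools are under a quarter full with 0-1% fragmentation and no dedup, so neither a full pool nor fragmented free space looks like the cause here.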
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I'm going to bring in the old DTrace script here again because the "transfer rate drops off a cliff" is a pretty strong indicator of "your pool can't keep up."

Create a file dirty.d using vi or nano and dump this code in there:
Code:
txg-syncing
{
		this->dp = (dsl_pool_t *)arg0;
}

txg-syncing
/this->dp->dp_spa->spa_name == $$1/
{
		printf("%4dMB of %4dMB used", this->dp->dp_dirty_total / 1024 / 1024,
			`zfs_dirty_data_max / 1024 / 1024);
}


Then from a shell do dtrace -s dirty.d YourPool and wait. You'll see a bunch of lines that look like:

Code:
dtrace: script 'dirty.d' matched 2 probes
CPU	 ID					FUNCTION:NAME
  4  56342				 none:txg-syncing   62MB of 4096MB used
  4  56342				 none:txg-syncing   64MB of 4096MB used
  5  56342				 none:txg-syncing   64MB of 4096MB used


Start up a big copy and watch the first number.
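While the copy runs, the question is how close the first number gets to the second. As a hedged sketch, the code below parses lines in the format shown above and reports dirty data as a percentage of `zfs_dirty_data_max`, flagging samples at or above 60%. That figure is the usual default of `zfs_delay_min_dirty_percent`, the point where the OpenZFS write throttle starts delaying writes; on FreeBSD-based systems it is normally exposed as the `vfs.zfs.delay_min_dirty_percent` sysctl, but check the value on your own release.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the printf part of each dirty.d line, e.g.
# "none:txg-syncing   62MB of 4096MB used"
LINE_RE = re.compile(r"txg-syncing\s*(\d+)MB of\s*(\d+)MB used")

# Assumed default of zfs_delay_min_dirty_percent; verify on your system.
THROTTLE_PCT = 60

def dirty_samples(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (dirty_mb, max_mb) for every txg-syncing line; skip header lines."""
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield int(m.group(1)), int(m.group(2))

def report(lines: Iterable[str]) -> None:
    """Print each sample as a percentage of the dirty-data limit."""
    for dirty, limit in dirty_samples(lines):
        pct = 100 * dirty / limit
        flag = "  <-- past throttle threshold" if pct >= THROTTLE_PCT else ""
        print(f"{dirty:5d}MB of {limit}MB ({pct:5.1f}%){flag}")

# Lines copied from the example output above.
sample = [
    "dtrace: script 'dirty.d' matched 2 probes",
    "CPU     ID                    FUNCTION:NAME",
    "  4  56342                 none:txg-syncing   62MB of 4096MB used",
    "  4  56342                 none:txg-syncing   64MB of 4096MB used",
    "  5  56342                 none:txg-syncing   64MB of 4096MB used",
]
report(sample)
```

To use it live, the dtrace output could be piped into a small script (hypothetically named watch_dirty.py) that calls `report(sys.stdin)`. Readings that sit near the limit while the copy slows down would support the "pool can't keep up" theory; readings that stay low would point elsewhere.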
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It would seem there was an error: dtrace: failed to compile script dirty.d: line 3: syntax error near ")"

Works here even on an old FN 9.10 box; what version of FreeNAS are you running?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Did you paste the code directly into an SSH session, or into a Notepad/shell session launched from the GUI?

I'm wondering if the formatting somehow got messed up, because that exact code runs just fine here. Maybe UNIX vs. DOS line endings (LF vs. CR/LF).

My suspicion is that your 10Gbps link is still too much for your drives to absorb; for the spinning disks that's not a big surprise, but I'd have thought your SSDs would be up to the task. (They are all in a single Z2 though.)
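A back-of-the-envelope model of why a pool that is slower than the link can look fine for the first few GB and then fall off: incoming writes first land in RAM as dirty data, and only once that buffer fills does the sender get held to the pool's real write speed. The sketch below uses a single buffer and constant rates; the rates are hypothetical placeholders, not measurements from this system, and the real ZFS write throttle starts delaying writes gradually before the buffer is completely full, so treat the result as a rough estimate.

```python
def gb_before_slowdown(ingest_mb_s: float, drain_mb_s: float,
                       buffer_mb: float) -> float:
    """GB (1 GB = 1024 MB) accepted at full speed before the buffer fills.

    Simplified model: dirty data grows at (ingest - drain) MB/s until it
    reaches buffer_mb; after that, throughput drops to drain_mb_s.
    """
    if ingest_mb_s <= drain_mb_s:
        return float("inf")  # the pool keeps up, so no slowdown is expected
    # time to fill = buffer / (ingest - drain); data accepted = ingest * time
    return ingest_mb_s * buffer_mb / (ingest_mb_s - drain_mb_s) / 1024

# Hypothetical numbers: ~1100 MB/s arriving over 10GbE and a 4096 MB
# dirty-data limit (the figure in the example dtrace output above).
for drain in (300, 550, 900, 1200):
    print(f"pool writes {drain:4d} MB/s -> slowdown after "
          f"~{gb_before_slowdown(1100, drain, 4096):.1f} GB")
```

With these made-up rates, a pool absorbing about half the link speed would start slowing the copy after roughly 8 GB, the same order as the 6~8GB reported at the top of the thread. That is consistent with the theory but doesn't prove it; the dtrace numbers are what would confirm or rule it out.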
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
Did you paste the code directly into an SSH session, or into a Notepad/shell session launched from the GUI?
First I tried copying it directly into the file using vim over PuTTY but it kept chopping off the first couple characters. I saved the file and then doubled back using SFTP in WinSCP and used their file editor to copy paste the data correctly. Within vim this is what I see right now:
[Screenshot: dirty.d as it appears in vim]

I did change the permissions on the file with chmod 700 dirty.d so it was executable by root or a user with sudo permission. Perhaps I shouldn't have done that?

I can sustain writes of around 900MB/s to my SSD pool using Z2, not using iSCSI though, just SMB.
This is another interesting thing I've run into. With SMB and no tuning via auxiliary parameters, I only get around 500MB/s on both the HDD array and the SSD array. With some tuning I peak around 700MB/s, but most of the transfer sits around 600MB/s. I've never seen full NIC utilization with SMB. iSCSI maxes out the NIC, but it crashes after about 8GB. I will say, though, that SMB usually doesn't crash.

I have experimented with other protocols. I'm willing to ditch iSCSI if something else can saturate 10Gbit stably; the only other protocol I haven't gotten working at all is NFS on Windows.
 
Joined
May 10, 2017
Messages
838
I've never seen full NIC utilization w/ SMB.

I get full 10GbE with jumbo frames and the 10GbE tweaks I found somewhere on the forum: transfers start at 1.1GB/s and then drop to around 900MB/s, which appears to be the maximum write speed of my SSD pool.



The HDD pool starts at the same 1.1GB/s but then drops to around 400MB/s; when it was emptier and less fragmented, it sustained around 600MB/s.

 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
Build is in my signature, CPU is a Pentium G4560.
Yeah, my chips are almost a full GHz slower per core, the only difference being that I have 4x as many cores per socket. I'm wondering if the CPU clock has anything to do with the bottleneck. To be honest I'm leaning towards no, but you have to start the process of elimination somewhere.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The dtrace script will help narrow it down, but I'm still at a loss as to why it won't run. That's the exact script I have, that runs on my machine - is it 234 bytes exactly in an ls? If it's not, there might be an extra batch of characters.
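One way to check for an "extra batch of characters" or DOS line endings without trusting an editor's display is to inspect the raw bytes. The sketch below assumes Python 3 is available wherever the file is checked (for example a copy pulled back over WinSCP); it prints the byte count and lists every carriage return and non-ASCII byte by line and column, which would catch things such as a UTF-8 byte-order mark or non-breaking spaces picked up when copying from a browser.

```python
import sys
from typing import List, Tuple

def suspicious_bytes(data: bytes) -> List[Tuple[int, int, str]]:
    """Return (line_no, column, description) for CRs and non-ASCII bytes."""
    problems = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte == 0x0D:
                problems.append((line_no, col, "carriage return (DOS line ending)"))
            elif byte > 0x7F:
                problems.append((line_no, col, f"non-ASCII byte 0x{byte:02X}"))
    return problems

def check(path: str) -> None:
    """Print the file size in bytes and any suspicious bytes found."""
    with open(path, "rb") as f:
        data = f.read()
    print(f"{path}: {len(data)} bytes")
    for line_no, col, what in suspicious_bytes(data):
        print(f"  line {line_no}, col {col}: {what}")

if __name__ == "__main__" and len(sys.argv) > 1:
    check(sys.argv[1])
```

Saved under a hypothetical name such as check_bytes.py, it would be run as `python3 check_bytes.py dirty.d`. A clean file prints only the size; anything listed on line 3 would be a good candidate for the syntax error.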

Edit: I'm updating an extra machine from 11.1-U5 to 11.1-U6, once it's done I'll make sure the dtrace script still runs there.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
The dtrace script will help narrow it down, but I'm still at a loss as to why it won't run. That's the exact script I have, that runs on my machine - is it 234 bytes exactly in an ls? If it's not, there might be an extra batch of characters.
Ditto, mine is also 234 bytes. I'll delete and re-copy/paste the file from scratch, but this time I won't touch the permissions or anything else. We'll see what happens.

Update: It did exactly the same thing. Claims there's a syntax error on line 3.
 