
64TB iSCSI targets on Storinator

Status
Not open for further replies.

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,412
OK Thanks.
What's the best card I can get for the amount of drives?
Depends on your expander. You can find more card examples in the hardware guide.

Sent from my Nexus 5X using Tapatalk
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
943
Forget the iSCSI bit for the moment, that matters not because you're testing on the local box. Get the performance as tested by dd to rock, then you can start looking down the line. At this point, you should be concerned about only what's in the box.

Just out of random curiosity, try this from the command line and post results. Obviously replace /dev/somedrive with a path to one of your pool drives.
camcontrol modepage /dev/somedrive -m 0x08
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
943
Huh. That's odd. You should get a report back including a bunch of data, the key point being WCE - write cache enabled. I got a batch of older drives a while back that were 528-byte-sector NetApp drives with write cache disabled, and the performance was dismal. All of my drives... SAS, SATA, magnetic or SSD... respond to that command.
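To check that flag across a batch of drives, the WCE line can be parsed out of the modepage output. A minimal sketch, using a hypothetical sample string in place of a live disk (on real hardware you'd pipe `camcontrol modepage da0 -m 0x08` into the parser instead):

```shell
# Stand-in for `camcontrol modepage da0 -m 0x08` output; on a FreeNAS box,
# substitute the real command for the sample. WCE: 1 means the drive's
# write cache is enabled.
sample="WCE:  1
RCD:  0"
wce=$(printf '%s\n' "$sample" | awk '$1 == "WCE:" {print $2}')
if [ "$wce" = "1" ]; then
    echo "write cache enabled"
else
    echo "write cache disabled"
fi
```

Looping that over da0..daN would quickly flag any drive that shipped with the cache disabled, like the NetApp pulls mentioned above.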
 

Holt Andrei Tiberiu

Contributor
Joined
Jan 13, 2016
Messages
129
On the second HBA, which has 30 disks, can you do a RAID 10 (striped mirrors) and leave lz4 compression on?
 

RAIDTester

Dabbler
Joined
Jan 23, 2017
Messages
45
On the second HBA, which has 30 disks, can you do a RAID 10 (striped mirrors) and leave lz4 compression on?
We've done it without lz4. It wasn't that great. See my posts earlier in the thread.

Vendor agreed to send us LSI controllers. We'll see what happens with that.

Weird thing is, I saw the pool pushing over 900MB/s. Checked it out - it's running a scrub. So it seems that the controller can actually push data at more than acceptable speed for a scrub, but not for iSCSI. Starting to suspect it's actually NOT the controller.

Did anyone ever see anything like this?
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
What model are the Intel NICs? Do you have multiple interfaces on the FreeNAS target and Windows initiator, or are you configured to use only one 10GbE interface on each box?
 

RAIDTester

Dabbler
Joined
Jan 23, 2017
Messages
45
What model are the Intel NICs? Do you have multiple interfaces on the FreeNAS target and Windows initiator, or are you configured to use only one 10GbE interface on each box?

We're using Intel X540-T2 10GbE adapters
2 interfaces on the FN box going to 2 interfaces on the Windows box, with MPIO
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
There is another thread with a user having iSCSI issues with those same NICs... maybe a coincidence, but interesting nonetheless. From an elevated command prompt on the initiator, what is the output of "mpclaim -s -d" and then "mpclaim -s -d num", where num is the disk number from the previous command, i.e. "mpclaim -s -d 1" for Disk1.
 

RAIDTester

Dabbler
Joined
Jan 23, 2017
Messages
45
There is another thread with a user having iSCSI issues with those same NICs... maybe a coincidence, but interesting nonetheless. From an elevated command prompt on the initiator, what is the output of "mpclaim -s -d" and then "mpclaim -s -d num", where num is the disk number from the previous command, i.e. "mpclaim -s -d 1" for Disk1.

MPIO Disk1: 02 Paths, Round Robin, Symmetric Access
Controlling DSM: Microsoft DSM
SN: 6589CFC00000011FE0FA73E244CCB84F
Supported Load Balance Policies: FOO RR RRWS LQD WP LB

Path ID State SCSI Address Weight
---------------------------------------------------------------------------
0000000077010002 Active/Optimized 001|000|002|000 0
TPG_State : Active/Optimized , TPG_Id: 1, : 3

0000000077010000 Active/Optimized 001|000|000|000 0
TPG_State : Active/Optimized , TPG_Id: 1, : 3
 
Joined
Feb 2, 2016
Messages
573
We're using Intel X540-T2 10GbE adapters

I wouldn't even worry about the network until you get reasonable performance locally. If dd is slow to read and write, nothing at the network side is going to make it better.

What SSDs are you using? If you create a pool without SSDs is it still slow?

My guess is that the Highpoint Rockets are the bottleneck. Never used them with FreeNAS because I had so many problems with them before FreeNAS. I know some people swear by them but I've pretty much only sworn at them.

Cheers,
Matt
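A local dd pass along the lines Matt describes might look like this. The path and size are illustrative placeholders so the sketch runs anywhere; on the real pool, point TESTFILE at a dataset path (e.g. under /mnt), use a file well beyond RAM size so ARC caching doesn't inflate the read number, and disable compression first or the zero-fill will compress away:

```shell
# Write test, then read test, then clean up. Replace the defaults with a
# path on the pool and a much larger count for a meaningful result.
TESTFILE=${TESTFILE:-/tmp/ddtest}
dd if=/dev/zero of="$TESTFILE" bs=1M count=64 2>&1 | tail -n 1
dd if="$TESTFILE" of=/dev/null bs=1M 2>&1 | tail -n 1
rm -f "$TESTFILE"
```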
 

RAIDTester

Dabbler
Joined
Jan 23, 2017
Messages
45
I wouldn't even worry about the network until you get reasonable performance locally. If dd is slow to read and write, nothing at the network side is going to make it better.

What SSDs are you using? If you create a pool without SSDs is it still slow?

My guess is that the Highpoint Rockets are the bottleneck. Never used them with FreeNAS because I had so many problems with them before FreeNAS. I know some people swear by them but I've pretty much only sworn at them.

Cheers,
Matt


That's been the thinking up to now. LSI replacements are on the way. I just can't believe a scrub was hitting 1GB/s+ yet I can't get dd performance anywhere near that!
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
MPIO Disk1: 02 Paths, Round Robin, Symmetric Access
Controlling DSM: Microsoft DSM
SN: 6589CFC00000011FE0FA73E244CCB84F
Supported Load Balance Policies: FOO RR RRWS LQD WP LB

Path ID State SCSI Address Weight
---------------------------------------------------------------------------
0000000077010002 Active/Optimized 001|000|002|000 0
TPG_State : Active/Optimized , TPG_Id: 1, : 3

0000000077010000 Active/Optimized 001|000|000|000 0
TPG_State : Active/Optimized , TPG_Id: 1, : 3
MPIO settings look good. Strange that the scrub was seeing good performance but not the dd test.
 

RAIDTester

Dabbler
Joined
Jan 23, 2017
Messages
45
Got the LSI cards installed. Performance does not seem to have increased.
Single disk performance doesn't even hit 7MB/s
disk.png


Cache seems warm
zfs1.png

zfs2.png
 

Evi Vanoost

Explorer
Joined
Aug 4, 2016
Messages
91
a) So what is the current configuration of your pool? (15 vdevs of mirrored 6TB pairs?) A printout of zpool status, perhaps? Is the pool empty when you do these tests?

b) What is the iostat of ALL pool(s) on the controller(s) "zpool iostat 1"

c) Have you checked the SMART status of all your drives? Even if you don't get any reported errors, a single drive can tank the performance of an entire system.

d) You get an ARC hit ratio of 50% and an L2ARC hit ratio of 3%. Is the pool in use? If you don't intend to use the caches for whatever reason (lots of streaming), you should perhaps reduce their size.

e) What is the fill rate of your pool, if it's over 70%, your pool may be heavily fragmented. Get the relevant statistics using zfs.

A scrub may be able to hit a high IO because it's just sequentially reading everything while your actual data may be jumbled all over the place.

Another thing: you're using 6TB drives, which should be 4k-capable, yet your controller (HighPoint) only supports 512/512e on PCIe 2.0. That slashes your individual drive performance significantly, since each sub-4k write forces the disk to read-modify-write an entire 4k physical sector. Your maximum throughput will theoretically peak at 4GB/s shared across all 30 (or 60) drives, and each write to your pool results in 2 writes (since they are mirrors), so you're looking at roughly 50MB/s per vdev.
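The capacity and fragmentation numbers asked about in (e) can be read straight off zpool's default columns. A sketch (the command -v guard just lets it no-op on a box without ZFS):

```shell
# CAP is percent of capacity used, FRAG is the pool's fragmentation estimate;
# past roughly 70% CAP (or with high FRAG) pool performance tends to tail off
if command -v zpool >/dev/null 2>&1; then
    zpool list -o name,size,allocated,free,fragmentation,capacity
fi
```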
 
Last edited:

RAIDTester

Dabbler
Joined
Jan 23, 2017
Messages
45
Sorry for the delay, was waiting for a replace & resilver to finish

a)
NAME                                            STATE   READ WRITE CKSUM
zvol1                                           ONLINE     0     0     0
  raidz2-0                                      ONLINE     0     0     0
    gptid/125aaa00-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/12b10cfc-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/12fcf46f-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1346ef47-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/139ce72d-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/13f54a70-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1455c170-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/14b2eec5-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1510de5d-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/15679b47-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
  raidz2-1                                      ONLINE     0     0     0
    gptid/15c0f293-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1622bf66-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/16763412-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/16c9eeb5-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1722b462-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1774c38f-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/17d48eee-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/182af0e7-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/82c510b0-ecc1-11e6-ab01-0cc47a7693ea  ONLINE     0     0     0
    gptid/18db0382-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
  raidz2-2                                      ONLINE     0     0     0
    gptid/194cacce-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/19d716cf-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1a2a52db-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1a7e23fb-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1ad00704-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1b204880-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1b74faa1-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1bc6a1b9-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1c1a4cf6-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
    gptid/1c6f6bc3-8354-11e5-9d44-a0369f607ad0  ONLINE     0     0     0
logs
  gptid/7736759b-c903-11e5-8446-a0369f607ad0    ONLINE     0     0     0
cache
  gptid/b7c75247-e297-11e6-8ec9-a0369f607ad0    ONLINE     0     0     0


b)

              capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
------------ -----  -----  -----  -----  -----  -----
freenas-boot 1.46G   118G      0      0      0      0
r10           200G  43.3T      0      0      0      0
zvol1        43.2T   120T  1.71K      0  53.5M      0
------------ -----  -----  -----  -----  -----  -----

r10 is the test volume with mirrored vdevs (RAID 10)

c) SMART is fine

d) We're using iSCSI formatted as NTFS. We are doing backup file consolidations. I don't know if that qualifies as streaming, but I've experimented with turning the caches to metadata only. Do you suggest reducing the size as well?

e) Pool is only using 26% of capacity (pretty empty), 22% fragmented

We got replacement LSI controllers, but didn't see much of a difference in testing.
Read performance currently barely hitting 60MB/s - it needs to be 8-10x this in order to be usable for us.


Totally stumped.
 

RAIDTester

Dabbler
Joined
Jan 23, 2017
Messages
45
THAT'S IT!!!
I FOUND IT!!! :D:D:D
ONE SETTING!
THE ONLY ONE I DIDN'T TOUCH :rolleyes:

15 sets of mirrored vdevs on Highpoint Rocket 750 HBA

zfs set recordsize=1M r10

#iozone -t 1 -i 0 -i 1 -r 1M -s 200G -+n -e -w


Children see throughput for 1 initial writers = 1232264.62 KB/sec
Parent sees throughput for 1 initial writers = 1232251.57 KB/sec
Min throughput per process = 1232264.62 KB/sec
Max throughput per process = 1232264.62 KB/sec
Avg throughput per process = 1232264.62 KB/sec
Min xfer = 209715200.00 KB

Children see throughput for 1 readers = 2137824.25 KB/sec
Parent sees throughput for 1 readers = 2137751.54 KB/sec
Min throughput per process = 2137824.25 KB/sec
Max throughput per process = 2137824.25 KB/sec
Avg throughput per process = 2137824.25 KB/sec
Min xfer = 209715200.00 KB
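In friendlier units, those iozone figures convert like so (simple integer division, 1 MB = 1024 KB):

```shell
# Convert the iozone KB/sec results above to MB/s
echo "write: $(( 1232264 / 1024 )) MB/s"   # → write: 1203 MB/s
echo "read:  $(( 2137824 / 1024 )) MB/s"   # → read:  2087 MB/s
```

Roughly 1.2 GB/s write and 2.1 GB/s read, locally.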
 

RAIDTester

Dabbler
Joined
Jan 23, 2017
Messages
45
Now I just need to get similar speeds over iSCSI over 2x 10GbE
Over iSCSI, read speed is about 10-15% of what it is locally
In round robin, iSCSI is only doing about 2Gb/s per port for sequential reads (a little over 280MB/s)
Are there any posts for tuning iSCSI/NIC for this type of performance?
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Did you ever test your 10Gb connection using iperf to see if you can get the proper speeds out of the link? Are the interface IPs being used for iSCSI on different subnets (post the IPs for all of your iSCSI interfaces)? Because your zvol block size is large, you should also make sure that you have physical block size reporting turned off on your iSCSI extent in FreeNAS. I can't find any documentation to validate that, but I highly doubt Windows likes such a large physical block size.
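A sketch of that iperf pass, with placeholder addresses (one run per interface pair, each on its own subnet; the server side runs on the FreeNAS target and the client on the Windows initiator, so the commands are shown here rather than meant to run on one box):

```shell
# On the FreeNAS target:
#   iperf -s
# On the Windows initiator, one 10GbE interface at a time
# (30-second run, 4 parallel streams; 192.168.10.1 is a placeholder IP):
#   iperf -c 192.168.10.1 -t 30 -P 4
```

If each link can't sustain near line rate here, no amount of iSCSI tuning will get there.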
 