Slow ZFS read performance in TrueNAS

alex711

Dabbler
Joined
Feb 21, 2021
Messages
13
I am getting incredibly slow read speeds from magnetic drives with TrueNAS. Shouldn't I see better than 314-400 MB/s read performance from 6 mirror vdevs with a total of 12 3TB drives? I have tested different drives, and the only thing that does not exhibit this problem is SSDs: when testing with SSDs I get good read and write performance -- 10 Gbps line rate in both directions with just a single 4-drive RAIDZ1 vdev.

I have tested different HBAs and different cables (I am using an LSI 9200-16e in the pictures, but have also tested this with an LSI 9206-16e); both had the same results.
I have reinstalled TrueNAS from scratch (vanilla); the problem persists.
I have tested all of the drives individually with HDD Sentinel -- none of the drives had any issues, and read and write speeds averaged ~140 MB/s per drive (testing them all at the same time, I saw 2 GB/s aggregate at peak).
I have tested this using fio and SMB. If I'm reading from ARC, things are at 10 Gbps line rate; otherwise, read speeds are a third of write speeds or worse.
I've created smaller pools and get about the same results in terms of read speed.
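
For reference, a minimal fio sequential-read sketch along the lines of what's described above (the directory path and test size are placeholders rather than the exact command used; a test size well above the 128 GB of RAM keeps ARC from serving most of the reads):

Code:
# hypothetical pool/dataset path and test size; adjust to the actual layout
fio --name=seqread --directory=/mnt/tank/fio-test --rw=read --bs=1M \
    --size=256G --numjobs=1 --ioengine=posixaio --group_reporting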

System specs:
dual E5-2667 CPUs
128 GB DDR3 ECC memory
LSI 9200-16e
LSI 9206-16e
separate power supplies for drives (tested 2 different PSUs)

Read speeds:
[Screenshot: read speeds with and without ARC]


Write speeds are fine:
[Screenshot: SMB write speeds]
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What motherboard are you using to host those two CPUs?
Also, what ashift value are you using in your pool, and ZFS record size in your datasets?
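
For reference, both can be checked from the shell; the pool and dataset names below are placeholders:

Code:
zpool get ashift tank            # placeholder pool name
zfs get recordsize tank/dataset  # placeholder dataset name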
 

alex711

Dabbler
Joined
Feb 21, 2021
Messages
13
Seagate Constellation ES.3 drives... I think this has something to do with the drives themselves, but how, I cannot explain.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Are these the standard ES.3 or the self-encrypting model?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What slots are your HBAs in? According to the X9DRi-LN4F+ manual, slots 1-3 are controlled by the front CPU, and slots 4-6 by the rear CPU.
 

alex711

Dabbler
Joined
Feb 21, 2021
Messages
13
Ah yes -- I looked at the block diagram earlier today and tested by isolating the cards to a single NUMA node, i.e. I placed two HBAs (9200-16e) in slots 1 and 2, and placed the 10 GbE NIC in slot 3. I also tested by putting the cards in the second set of PCIe slots. The overall performance never changes.

The thing that makes me scratch my head is this: if I put 16 SSDs on the LSI 9206-16e and test, the performance is great -- I can saturate 10 GbE all day in both directions with only a handful (4-5) of the SSDs... The moment I put the ES.3s on that same card, without changing any of the configuration, I start to see an artificial ceiling of about 500 MB/s no matter how many drives I add... and at some point it almost seems like the more drives I add, the slower the performance gets, going from say 8 to 12 to 16...

I wish I had some other magnetic drives to test with, but I only have four 6TB drives, and when I test with those I get about the same performance as I do with four of the 3TB drives.
 

alex711

Dabbler
Joined
Feb 21, 2021
Messages
13
I didn't try that troubleshooting before today because the QPI link between NUMA nodes is supposed to be quite a bit faster than the speeds I'm seeing, and I can't imagine it adds enough latency to slow things down at these data rates... maybe I'm wrong?
 

alex711

Dabbler
Joined
Feb 21, 2021
Messages
13
Additionally -- the copy to /dev/zero seems to be as fast as I'd expect from 12 of these drives; scrubbing is incredibly fast as well...
 

alex711

Dabbler
Joined
Feb 21, 2021
Messages
13
It only takes 3 SSDs to get me 1.1 GB/s read and write in a stripe... It doesn't matter whether I stripe or mirror these drives -- 8, 10, or 12 of the ES.3 drives only nets me 500 MB/s read.

 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
With striped mirrors, you're optimized for IOPS, so with a single client you wouldn't necessarily expect all conditions to produce the maximum possible throughput (it depends on which, and how many, vdevs and underlying disks are involved in serving the requested blocks -- demonstrated well by your screenshots in the first post).

If you have multiple clients reading data from a better spread of files (which will ideally even out the IOPS across more or all of the vdevs), you may see results closer to the SSDs in terms of overall throughput.

Don't forget, if you're comparing IOPS here: 3 SSDs (let's estimate conservatively at 10K per device, so 30,000 IOPS with only 3 disks of spread) vs. 12 HDDs (again conservatively 100 per device, so 1,200 IOPS). I suggest your bottleneck is IOPS.
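
As a rough back-of-envelope (assuming the default 128 KiB recordsize and that every record read costs one I/O): 1,200 IOPS × 128 KiB ≈ 150 MiB/s for the 12 HDDs, versus 30,000 IOPS × 128 KiB ≈ 3.7 GiB/s for the 3 SSDs. Real sequential reads with prefetch do better than the HDD worst case, but the gap illustrates why the SSDs never look IOPS-limited here.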
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Repeated runs with the same source file /mnt/mpools1/test1/day1/test can inflate the results of the dd test because there are potentially pieces of the file in the ~128 GB ARC.

Is it possible to reboot or drop caches and then immediately test with the dd command?

As @sretalla suggests, I think you might be facing an IOPS/mechanical bottleneck. The ES.3s are fast for spinning disks but still limited by their need to seek across the platter.
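
For example (a rough sketch, not necessarily the exact procedure -- the file path is the one from your earlier dd run, and exporting/importing the pool is just one way to flush its cached data; a reboot works too):

Code:
zpool export mpools1 && zpool import mpools1    # drops this pool's cached data from ARC
dd if=/mnt/mpools1/test1/day1/test of=/dev/null bs=1M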
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,909
To somewhat illustrate the consequences of the huge difference in IOPS: my last company-provided notebook with a mechanical hard disk (Win 7 with McAfee and loads of background stuff) needed about 25 minutes for a complete restart. Going to an SSD changed that to less than 2 minutes.
 

alex711

Dabbler
Joined
Feb 21, 2021
Messages
13
@sretalla Thanks for the reply! I think I had too high expectations -- coupled with a few tests I did while running Windows Server, striping the disks with Disk Management, using the same data set, and getting 10 Gbps in both directions... I don't know how to explain that result...
 

alex711

Dabbler
Joined
Feb 21, 2021
Messages
13
Something about this just doesn't sit right with me... We are literally talking about 20 drives here -- can someone help me understand how 10 mirror vdevs can only produce 400-500 MB/s read speeds? Why wouldn't the read speed improve at all with an additional 8 drives?

 

alex711

Dabbler
Joined
Feb 21, 2021
Messages
13
I retract my previous statement. The single-threaded nature of Windows Explorer for file copies is terrible, and I knew that before doing these tests, so shame on me. Thanks for everyone's help/input.
 