Unexpectedly Slow Read Performance with RaidZ (Scale)

im.thatoneguy

Dabbler
Joined
Nov 4, 2022
Messages
37
Intel Xeon Silver 4314 24MB 16-core 2.4GHz
256 GB ECC RAM
Supermicro X12SPI-TF Motherboard
Broadcom 3808 IT Mode HBA
Intel X550 10GbE (but the slowness is local, not even over SMB)
21x Western Digital WDC HC550 WUH721816ALE6L4
3x 7-wide RAIDZ2 vdevs

I've got three 7-drive RAIDZ2 vdevs and a 16-core processor, but I'm only seeing read speeds of ~600MB/s. Write speeds are actually higher, about 900MB/s, which is surprising since, if anything, the RAIDZ2 writes should be the ones getting bogged down by parity calculations?

Process: run the test once to create the data file, then read a decoy file so it gets fed into ARC (pushing the original out), then run the read test again on the original file.

Code:
sudo fio --filename=/mnt/pool/dataset/file.dat --rw=read --direct=1 --bs=1M --ioengine=libaio --numjobs=1 --group_reporting --name=seq_write --iodepth=32 --size=128G

sudo fio --filename=/mnt/pool/dataset/decoy.dat --rw=read --direct=1 --bs=1M --ioengine=libaio --numjobs=1 --group_reporting --name=seq_write --iodepth=32 --size=128G

sudo fio --filename=/mnt/pool/dataset/file.dat --rw=read --direct=1 --bs=1M --ioengine=libaio --numjobs=1 --group_reporting --name=seq_write --iodepth=32 --size=128G
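
(The matching sequential write test is just the same command with --rw=write swapped in, for anyone who wants to reproduce the ~900MB/s figure:)
Code:
sudo fio --filename=/mnt/pool/dataset/file.dat --rw=write --direct=1 --bs=1M --ioengine=libaio --numjobs=1 --group_reporting --name=seq_write --iodepth=32 --size=128G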


I also had zpool iostat 5 running in the background to confirm what's actually being read from the disks.
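
The verbose form gives a per-vdev/per-disk breakdown, which makes it easier to see whether reads are spread evenly across the three RAIDZ2 vdevs (pool name below is a placeholder):
Code:
zpool iostat -v pool 5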

If I let it read from ARC it hits 3,300MB/s, so the CPU doesn't seem to have any inherent issues.
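
Another thing I could try to take ARC completely out of the picture (instead of relying on the decoy read) is caching metadata only on the test dataset while benchmarking, then reverting afterwards:
Code:
sudo zfs set primarycache=metadata pool/dataset
# ... run the fio reads ...
sudo zfs inherit primarycache pool/dataset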

Am I just testing wrong? I'm not sure where to even start troubleshooting. My goal for the system was ~10Gb/s read/write, i.e. saturating the 10GbE link.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
OK, well I have a pool similar to yours (8 disks per VDEV rather than 7).

The read test I usually run looks like this:

Code:
fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=read --size=50g --io_size=1500g --blocksize=1M --iodepth=16 --direct=1 --numjobs=16 --runtime=120 --group_reporting

That assumes you have no compression on the dataset and have set its recordsize to 1M.
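
You can check both in one go (the dataset path is just an example):
Code:
zfs get compression,recordsize pool/dataset
zfs set recordsize=1M pool/dataset   # only applies to files written after the change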

What I see out of that is:
Code:
bw (  MiB/s): min=  319, max=16477, per=100.00%, avg=11145.28, stdev=206.48, samples=3808
iops        : min=  304, max=16464, avg=11138.68, stdev=206.56, samples=3808


direct=1 (which I see you also used) should avoid ARC (by requesting unbuffered I/O), and I see all the disks' activity lights almost fully lit the whole time, so I'm relatively confident that's what's happening. I guess that makes the size of your test file and the decoy read a little overkill.
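
If you want harder evidence than activity lights, watching the ARC hit/miss counters while the test runs is a decent cross-check (arcstat should be available on SCALE as part of OpenZFS; the argument is the sampling interval in seconds):
Code:
arcstat 1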
 

im.thatoneguy

Dabbler
Joined
Nov 4, 2022
Messages
37
Well you're getting 11GiB/s so you're definitely just testing your RAM. :D

Someone on Discord also randomly ran into the same thing and posted last night:

BloodyGent — Yesterday at 7:24 PM
Did a quick benchmark for large file transfers using SMB internally, to and from an NVMe pool. These are some quick results. Interesting that reads are so horrible with RAIDZ, below single-drive performance.

SMB large file transfer bandwidth:
Z1 (3+1) vdev: write 360-400 MB/s | read 80-110 MB/s
Z1 (2+1) vdev + hot spare: write 100-250 MB/s | read 80-110 MB/s
Z2 (2+2) vdev: write 340-380 MB/s | read UNTESTED
2+2 striped mirror vdevs: write 380-420 MB/s | read 580-650 MB/s

These are fast WD HC550 16TB HDDs with 512MB cache. Their peak bandwidth is around 260MB/s according to the spec sheet.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Well you're getting 11GiB/s so you're definitely just testing your RAM. :D
At least partially true...

The setting that's different between us (and that matters) is numjobs.

At 16, I get 16 "clients" requesting data, which gives at least a little (and in the case of 16, a lot of) benefit from data already sitting in the disks' cache, so I'm able to push at more or less the speed of the controller (which is 12Gb/s SAS).

If I push that number down to 1, I get more like 500MB/s; at 2, I get 1200MB/s.
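
If you want to see where that curve flattens out on your pool, a quick sweep over numjobs does it (file name and sizes here are just examples, trimmed down so the sweep doesn't take forever):
Code:
for j in 1 2 4 8 16; do
  fio --name=TEST_j${j} --eta-newline=5s --filename=fio-tempfile.dat --rw=read --size=50g --io_size=200g --blocksize=1M --iodepth=16 --direct=1 --numjobs=${j} --runtime=60 --group_reporting | grep 'READ: bw'
done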

I can confirm that ARC isn't coming into play: I ran your decoy process, and even with 16 jobs I get 10, 9 and 8GB/s across the 3 runs.
 