Set up a 6x SSD RAIDz1 array ... and it is SLOW!

TrumanHW

Contributor
Joined
Apr 17, 2018
Messages
197
Perhaps the T320 is just too slow for it ... but that seems unlikely, because my SPINNING drives get faster rates.

I'm getting

~650 MB/s READ
~300 MB/s WRITE

(Granted, the speeds are VERY consistent ... but I could get that performance with a single drive.)

The RAIDz1 array is composed of 6x Samsung 870 EVO 4TB SATA SSDs ....
As in, Read is just barely faster than a single drive's performance.
And Write is half the performance of a single drive.

All transfers were of large (1GB+) video files.

Any idea what the bottleneck might be ...?

If I can't get faster than this with a SATA Flash array .....
Seems like a waste to try to build an NVMe Flash array.

I have an R730 I could retry this on, but I wasn't CPU limited.

Any suggestions on what I could look into would be appreciated.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Try a pool of 3 mirror vdevs for comparison.
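For reference, the layout described here would look roughly like this from the shell; this is only a sketch (pool and disk names are made up, the TrueNAS GUI is the usual way to build it, and zpool create wipes whatever is on those disks):

zpool create testpool mirror da0 da1 mirror da2 da3 mirror da4 da5   # three 2-way mirror vdevs, striped together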
 

Morris

Contributor
Joined
Nov 21, 2020
Messages
120
If you want a bit more storage, you can do 2 x 3-drive Z1 vdevs striped. A RAIDz1 vdev with many drives can become CPU limited.
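The 2 x 3-wide RAIDz1 layout would look like this, again only as a sketch with made-up device names (normally built through the GUI):

zpool create testpool raidz1 da0 da1 da2 raidz1 da3 da4 da5   # two 3-disk RAIDz1 vdevs, striped together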
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
The others have put you on the right track here. The key is to understand how the data gets handled with changes in pool geometry. The RAIDz1 case you chose typically presents a write speed equivalent to the slowest device in the pool, and a slight bump in read speed. Other pool geometries permit multiple data transactions to be "in-flight" in parallel. In the case of mirrors, the reads can be issued to devices in round-robin fashion, yielding a read rate that gets close to the sum of the individual device rates. Writes have to be issued to both devices, so a single mirror pair's write rate stays at single-device write speed. Adding additional mirror pairs acts as a multiplier. Similar performance improvements can be obtained from parity geometries by adding vdevs. A 3-device RAIDz1 striped with another 3-device RAIDz1 should have a write rate roughly 2x a single device.
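To put rough numbers on that reasoning, here is a back-of-envelope sketch; it assumes each SATA SSD sustains about 500 MB/s sequential (an assumption, not a measurement from this thread) and ignores CPU, controller, and protocol overhead entirely:

DEV=500   # assumed MB/s per SATA SSD (idealized)
echo "3x mirror pairs, read:   ~$((6 * DEV)) MB/s"      # reads fan out across all six disks
echo "3x mirror pairs, write:  ~$((3 * DEV)) MB/s"      # each pair writes one copy; three pairs in parallel
echo "2x 3-wide RAIDz1, write: ~$((2 * 2 * DEV)) MB/s"  # ~2 data disks per vdev, two vdevs
echo "1x 6-wide RAIDz1, write: ~$((1 * DEV)) MB/s"      # roughly the slowest-device figure described above

Real pools land well below these ceilings, but the relative ordering is the point.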

But once you get up into SSD territory, the CPU & memory also have to be able to keep up with the demands of calculating the parity and checksums, organizing & managing the ARC, and the bulk operations as they're split up across each device. There are bottlenecks at each step. How many trips through memory does the data make to get that 300 MB/sec? A T320 is DDR3 and probably not the later DDR3-1600. Make sure you're maximizing your memory interleave, and have the DIMMs installed in the correct sequence. You have 6 SATA devices: are they all on the same SATA controller? Are they all negotiating 6Gb/sec? Does the controller itself have 36Gb/sec of PCIe bandwidth? Can you move half the drives to another SAS/SATA controller? If so, are that extra SAS/SATA controller's PCIe lanes serviced by the same CPU socket, or does it incur a NUMA access penalty, force the kernel to schedule the device driver on the other socket, etc.? Remember, in a NUMA system free CPUs don't necessarily translate to the CPU having access to the data or the devices needed to contribute to the effort.
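A few quick checks along those lines on TrueNAS CORE / FreeBSD (the commands and device names here are examples, and SCALE uses different tooling):

camcontrol devlist -v                            # which controller/bus each drive actually sits on
camcontrol identify ada0 | head -n 3             # header shows the negotiated link, e.g. "600.000MB/s transfers (SATA 3.x ...)"
pciconf -lvc                                     # find the HBA and check the PCIe link width/speed it negotiated
dmidecode -t memory | grep -iE 'locator|speed'   # DIMM population and clocks, for the interleave question (if dmidecode is present)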
 

ikue1966

Cadet
Joined
Sep 24, 2013
Messages
6
Hi all, just out of curiosity: are there other limitations when swapping HDDs for SSDs?
I tried to replace the HDDs one by one, and the newly installed SSDs were kicked out after about 30 minutes during resilver (status: Detached).
The setup:
- homelab built for lowest energy consumption
- ASRock ITX J4125, 16 GB RAM, PCIe to 6x SATA controller
- RAIDZ2 with 6x 3TB HDD (to be replaced by SSDs)
The individual SSDs work fine standalone as external drives via USB.
 

Attachments

  • IMG_9897.jpg (278 KB)

rvassar

Guru
Joined
May 2, 2018
Messages
972
Hi all, just out of curiosity: are there other limitations when swapping HDDs for SSDs?
I tried to replace the HDDs one by one, and the newly installed SSDs were kicked out after about 30 minutes during resilver (status: Detached).
The setup:
- homelab built for lowest energy consumption
- ASRock ITX J4125, 16 GB RAM, PCIe to 6x SATA controller
- RAIDZ2 with 6x 3TB HDD (to be replaced by SSDs)
The individual SSDs work fine standalone as external drives via USB.

The pic you included seems to indicate some kind of drive controller issue. ZFS is kind of a beast, and has been known to expose the flaws in many retail SSD offerings with poorly engineered or feature-deficient controllers. You should start a new thread, as this is unrelated to the performance discussion here. Include the drive models, both the drives being replaced as well as the SSDs.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
You should read the following resource.
 

TrumanHW

Contributor
Joined
Apr 17, 2018
Messages
197
Okay, just to go to the absolute extreme ... I just made it into a STRIPED array ... no parity ... to see the limit of the performance I could get out of it. As I have two identical systems, one with spinning drives and another that's flash ... I tested both.

I did dd tests on both (random and ostensibly sequential) ... and copied some video crap a friend of mine watches.

Note 1: Compression is disabled so it doesn't muddy the results.
Note 2: I'm too stupid to figure out the fio man page, and no one provides a diagrammed explanation of what each argument in the commands I've seen actually does, so I used dd:
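For reference, a fio run roughly equivalent to the dd tests below might look like the following; this is only a sketch (the dataset path is an assumption, and compression is assumed off on it), with each argument spelled out:

#   --directory        where fio creates its test files (assumed path)
#   --rw=write         sequential writes; use --rw=read for a read pass
#   --bs=1M            1 MiB blocks, matching the dd runs
#   --size=10g         10 GiB of data per job
#   --numjobs=4        four parallel streams, so more than one disk stays busy
#   --group_reporting  print one combined throughput figure
fio --name=seqwrite --directory=/mnt/tank/test --rw=write --bs=1M --size=10g --numjobs=4 --group_reporting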

Results:

6x SSDs (STRIPED) performing a dd test - RANDOM:

246.3 MB/s  dd if=/dev/random of=tmp.dat bs=1M count=10k
236.4 MB/s  dd if=/dev/random of=tmp.dat bs=1M count=10k
240.7 MB/s  dd if=/dev/random of=tmp.dat bs=1M count=10k

6x SSDs (STRIPED) performing a dd test - SEQUENTIAL:

3,614.0 MB/s  dd if=/dev/zero of=tmp.dat bs=1m count=10k
2,301.5 MB/s  dd if=/dev/zero of=tmp.dat bs=1m count=10k
2,641.1 MB/s  dd if=/dev/zero of=tmp.dat bs=1m count=10k



8x 7200-rpm RAIDz2 performing a dd test - SEQUENTIAL:

1,839.8 MB/s  dd if=/dev/zero of=tmp.dat bs=1m count=10k
1,733.4 MB/s  dd if=/dev/zero of=tmp.dat bs=1m count=10k


8x 7200-rpm RAIDz2 performing a dd test - RANDOM:

60.9 MB/s  dd if=/dev/random of=tmp.dat bs=1M count=1k
38.8 MB/s  dd if=/dev/random of=tmp.dat bs=1m count=1k



And in "real world" tests ...??

The striped SSD array gets barely more than the speed of a single drive when READING:

6x striped Evo 870 "READ": ~650 MB/s (very consistently)
6x striped Evo 870 "WRITE": ~345 MB/s (quite consistently)

An identical unit, which holds my 8x RAIDz2 (spinning) array ....

READS at 203–265 MB/s
WRITES at about 545–650 MB/s
As in ...
- The Striped SSD array allows me to DL data almost 2x as fast as it'll upload ...
- The Striped SSD array writes data almost half as fast as the spinning drives do.

Is it not odd that a spinning array seems to outperform the SSD read performance..?

And the SSD array cannot even exceed the bandwidth of 10GbE ...?


If this is a reasonable result for STRIPED SSDs ... I am definitely going to stick with spinning drives, perhaps adding some disk shelves, as this is far from impressive.

That said, I have a slightly newer Dell I could try this out on (not that CPU utilization ever exceeded 20%) ... it's the R730xd ...

I was about to purchase an R7415 for use with NVMe SSDs ... but it seems like there's a real drop-off between money spent and performance gained (if my experience is any indication of what to expect).
 

TrumanHW

Contributor
Joined
Apr 17, 2018
Messages
197
You should read the following resource.

I have. It doesn't (and I can't imagine anything does) explain why faster drives in an identical system reverse the R/W performance..?

Another idea I have (aside from testing it in another machine) ...? Maybe trying it on TrueNAS SCALE ..?

This is utterly paradoxical behavior.
The 7200-rpm drives are SAS and the SSD drives are SATA ..??

But we're talking (spinning) drives which AT BEST ... barely break 220 MB/s (on the outer tracks)
vs. SSDs which reliably (not just for the first 20GB or something) get over 400 MB/s.

The consistency is great ... but I'm sorry, a 6-SSD STRIPED VOLUME that can't write at the speed a SINGLE drive does..?
And reads at only 1.2x ..? Just doesn't justify the price per TB for me.

I really appreciate the time and attention everyone's offered ...
And I'm sorry I'm frustrated; this is just some unbelievably confusing hardware behavior.

I mean, striped with no protection (RAID-0), it doesn't even give the performance I'd expect from RAIDz1!
Spinning drives outperform it in some scenarios (I get up to 1 GB/s with my spinning RAIDz2 array).

This just makes NO sense. (Certainly not economic sense at that).
 

TrumanHW

Contributor
Joined
Apr 17, 2018
Messages
197
The pic you included seems to indicate some kind of drive controller issue. ZFS is kind of a beast, and has been known to expose the flaws in many retail SSD offerings with poorly engineered or feature-deficient controllers. You should start a new thread, as this is unrelated to the performance discussion here. Include the drive models, both the drives being replaced as well as the SSDs.

Not replacing the drives ... I have two identical systems (and a 3rd system I could pop them in).

When you say "controller" ... you mean on the SSD ..? Or you mean like the SAS controller I'm using ..?

I have a BUNCH of SAS controllers. In fact, I could throw a bunch of NVMe drives in this with my U.2 drives...

Or even use my R730xd (I just need to order the RAM for it) ...
 

TrumanHW

Contributor
Joined
Apr 17, 2018
Messages
197
Try a pool of 3 mirror vdevs for comparison.


I went straight to striping the entire set ... I mean, if that doesn't "speed it up" all the other variations won't ..... right..?

And ... it didn't (as you can see). Obviously I have a few more ideas I can try, and I'll report back after trying them. Thanks again.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Thoughts:

1. /dev/random blocks on the entropy pool. The /dev/random device is supposed to provide a source of randomness suitable for cryptographic purposes. When the pool does not contain enough randomness, it blocks on read until the random number generator can pick up enough entropy to provide more. Hence, the more you read from it, the more it blocks. So throw out all your /dev/random results, they are invalid from the start. Try /dev/urandom as a non-blocking pseudo-random source.

2. All you've identified is there's another bottleneck. You need to figure out if it's CPU, memory or SAS/SATA/AHCI controller(s).

And I'm going to repeat myself a bit more bluntly... You're using 2020 disks in a 2011 server. There is going to be a bottleneck somewhere.
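Following up on point 1: a minimal way to take the RNG out of the write test is to stage the random data once and then time only the copy onto the pool. Paths and sizes below are illustrative, and on TrueNAS /tmp may not be RAM-backed, so treat the staging location as an assumption:

dd if=/dev/urandom of=/tmp/rand.bin bs=1M count=4096     # build ~4 GiB of incompressible data up front
dd if=/tmp/rand.bin of=/mnt/tank/test/tmp.dat bs=1M      # this rate reflects the pool, not the RNG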
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Disable compression on a test dataset and use /dev/zero.
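Something like this, as a sketch (pool and dataset names are assumed; the GUI can do the same):

zfs create -o compression=off tank/ddtest                        # throwaway dataset with compression off
dd if=/dev/zero of=/mnt/tank/ddtest/tmp.dat bs=1M count=10240    # ~10 GiB sequential write test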
 

ikue1966

Cadet
Joined
Sep 24, 2013
Messages
6
The pic you included seems to indicate some kind of drive controller issue. ZFS is kind of a beast, and has been known to expose the flaws in many retail SSD offerings with poorly engineered or feature-deficient controllers. You should start a new thread, as this is unrelated to the performance discussion here. Include the drive models, both the drives being replaced as well as the SSDs.
Will do so right away.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Using the solnet array test will exclude testing-methodology issues.
 

TrumanHW

Contributor
Joined
Apr 17, 2018
Messages
197
Thoughts:

1. /dev/random blocks on the entropy pool. The /dev/random device is supposed to provide a source of randomness suitable for cryptographic purposes. When the pool does not contain enough randomness, it blocks on read until the random number generator can pick up enough entropy to provide more. Hence, the more you read from it, the more it blocks. So throw out all your /dev/random results, they are invalid from the start. Try /dev/urandom as a non-blocking pseudo-random source.

2. All you've identified is there's another bottleneck. You need to figure out if it's CPU, memory or SAS/SATA/AHCI controller(s).

And I'm going to repeat myself a bit more bluntly... You're using 2020 disks in a 2011 server. There is going to be a bottleneck somewhere.


Results with dd if=/dev/urandom of=tmp.dat bs=1M count=1k: 1073741824 bytes/sec, or ~1073.74 MB/s

Results with dd if=/dev/urandom of=tmp.dat bs=1M count=10k: ~241.14 MB/s

Basically...?? Dogshit.
Disable compression on a test dataset and use /dev/zero.

Yup, I disabled it when I created the dataset. I knew there was no point introducing a variable when testing performance and obscuring the actual metrics. And I agree ... I'm not sure it's worth figuring out wtf the bottleneck is here ... though my gut says it really should NOT be an issue. Anyway, I found some good deals on hardware that let me just sell off the 4TB SATA SSDs and make this whole issue moot.


Grabbed a Dell R7415 with 24x SFF // NVMe + 256GB RAM for $2k

And found 8 Micron 7300 7.68TB drives for around $425 ea.

If an EPYC system with support for 24 NVMe drives can't handle ≤ 12 NVMe SSDs and reliably saturate SFP28, I'm gonna flip a bit. :cool:
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
/dev/urandom cannot sustain a constant data rate. Please try with /dev/zero and compression disabled.
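For the read side, one possible sketch (dataset and file names assumed): write the file from /dev/zero on the compression-off dataset, then read it back. To keep ARC from serving the reads straight out of RAM, either use a file much larger than RAM or temporarily set primarycache=metadata on the test dataset:

zfs set primarycache=metadata tank/ddtest                        # optional: keep file data out of ARC for this test
dd if=/dev/zero of=/mnt/tank/ddtest/tmp.dat bs=1M count=20480    # ~20 GiB write
dd if=/mnt/tank/ddtest/tmp.dat of=/dev/null bs=1M                # read-back rate from the pool
zfs inherit primarycache tank/ddtest                             # restore the default afterwards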
 