ZFS performance testing with SAS SSDs

Diff

Dabbler
Joined
May 9, 2020
Messages
33
This has probably been discussed many times, and I have spent 3 days reading different articles, but I am still not sure I have a good sense of how to properly test my setup or how to design the ZFS pool in the most efficient way. So I have to ask for help and advice.

Here is my setup:

Code:
TrueNAS-12.0-U2.1
Dell R730xd
CPU: 2 x Intel(R) Xeon(R) CPU E5-2650L v3 @ 1.80GHz
Memory: 128GB ECC
Dell HBA330 mini (IT mode)
Disks:
  2 x 250GB 12Gb/s SAS SSD
  14 x 1.9TB 12Gb/s SAS SSD
  2 x 512GB NVMe SSD
Network:
  LACP - 2 x 40Gbe Mellanox MT27500 [ConnectX-3]


For the zpool config I am trying the following (a rough command sketch follows the list):
- TrueNAS installation on mirror 2 x 250GB 12Gb/s SAS SSD
- Z2 with 12 x 1.9TB 12Gb/s SAS SSD
- spares 2 x 1.9TB 12Gb/s SAS SSD
- SLOG (log vdev) mirror 2 x 512GB NVMe SSD
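
Roughly, that layout expressed as a zpool command would be something like this (daX / nvdX are placeholder device names, not my actual disks; in practice the TrueNAS UI builds the pool):

Code:
# rough sketch only -- placeholder device names
zpool create z2 \
  raidz2 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 da12 da13 \
  spare da14 da15 \
  log mirror nvd0 nvd1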

Before I configured everything as ZFS, I booted off a CentOS 8 Live CD and ran the Phoronix Test Suite against multiple individual drives (some did not even end up in this system). For a specific individual drive of the type I have in the Z2 (1.9TB 12Gb/s SAS SSD), I got:

Read: 3,303MB/s

[attachment: read benchmark screenshot]


Write: 378MB/s

[attachment: write benchmark screenshot]


A more extensive report is here: https://openbenchmarking.org/result/2005053-NI-R730XDMUL59

After I built TrueNAS and ZFS, I created a jail, loaded the Phoronix Test Suite, and ran the IOzone test on a ZFS mount inside iocage, mounted from the Z2 pool. The result is more than puzzling to me:

[attachment: IOzone results screenshot]


The report is here: https://openbenchmarking.org/result/2104096-HA-12XSSDZ2Z42

READ is slightly better than a single drive, while WRITE is 10x worse. It seems like I am doing something wrong here, or this testing methodology is not right. I was reading online about testing with `dd`, but that mostly refers to spinning disks, not SAS SSDs.

Here are some results of `dd` as well.

Write: ~532MB/s

Code:
dd if=/dev/zero of=/mnt/z2/files/test/file.out bs=4096 count=1000000 oflag=direct
1000000+0 records in
1000000+0 records out
4096000000 bytes transferred in 7.339412 secs (558082826 bytes/sec)


Read: ~216MB/s

Code:
dd if=/mnt/z2/files/test/file.out of=/dev/null bs=4096
1000000+0 records in
1000000+0 records out
4096000000 bytes transferred in 18.042795 secs (227015825 bytes/sec)


At this point I am looking for any advice or pointers to relevant materials to read.
 
Last edited:

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
The following points come to mind:
  • How are your disks connected? A RAID controller does not work; you need an HBA in IT mode.
  • Several people who upgraded from FreeNAS 11 to TrueNAS 12 have reported that performance was considerably worse for them with TrueNAS 12. This seems to be a bit of an edge case, but perhaps it affects you.
  • For high performance requirements, the general recommendation is to use mirrors and not RAIDZ
  • I am not an expert on FreeBSD jails, but I don't think they are suitable for performance testing. At the very least you are comparing apples with pears (container vs. bare metal). As to the performance degradation imposed, I have to leave that to others.
Not related to the performance difference, but your CPUs have an awfully low clock speed. Fewer cores with higher clock speed are recommended for pure file sharing. 4 cores at 3+ GHz are typically sufficient, although that may be too little for your performance requirements. If you want to do something like Plex live transcoding, you need to get a bigger CPU.
 
Last edited:

Diff

Dabbler
Joined
May 9, 2020
Messages
33
  • How are your disks connected? A RAID controller does not work; you need an HBA in IT mode.

Yes, connected via the Dell HBA330 mini, so TrueNAS/FreeBSD has direct access to the drives.

Code:
smartctl -i /dev/da3
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SAMSUNG
Product:              MZILS1T9HCHP/003
Revision:             TT00
Compliance:           SPC-4
User Capacity:        1,920,383,410,176 bytes [1.92 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      0x5002538a05a0ee90
Serial number:        S2B3NAAGA00142
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Apr 13 00:41:50 2021 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled



  • For high performance requirements, the general recommendation is to use mirrors and not RAIDZ

Do you mean to do something like this?
- VDEV
- Z1 with 6 x 1.9TB 12Gb/s SAS SSD
- Z1 with 6 x 1.9TB 12Gb/s SAS SSD

It seems like this is not the first time I am bumping into the opinion that 11.x could perform better than 12.x.
I am not really worried about Plex and transcoding; this is mostly for my home lab (file shares for backup, and powering an xcp-ng pool with a bunch of VMs and Kubernetes).
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Do you mean to do something like this?
- VDEV
- Z1 with 6 x 1.9TB 12Gb/s SAS SSD
- Z1 with 6 x 1.9TB 12Gb/s SAS SSD

Not Z1 - no RAIDZn at all. Mirrors! :wink:
- mirror with 2x 1.9 TB
- mirror with 2x 1.9 TB
- mirror with 2x 1.9 TB
- mirror with 2x 1.9 TB
- mirror with 2x 1.9 TB
- mirror with 2x 1.9 TB
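
In zpool terms that would be roughly the following (placeholder device names; the GUI does the equivalent):

Code:
# sketch only -- a pool of six 2-way mirror vdevs, placeholder names
zpool create tank \
  mirror da2 da3 \
  mirror da4 da5 \
  mirror da6 da7 \
  mirror da8 da9 \
  mirror da10 da11 \
  mirror da12 da13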
 

Diff

Dabbler
Joined
May 9, 2020
Messages
33
Interesting idea; I can probably try it after I recover TrueNAS from its reboot loop.

In the meantime I still wonder:

1. Even with the current configuration (Z2 over 12 x SAS SSDs), the numbers I am seeing do look slow, right?
2. What is a good methodology to properly test the performance of a pool based on fast SAS SSDs?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Interesting idea
Excuse me? This is the standard well documented way to set up a ZFS pool if you need IOPS. Don't even think about RAIDZ if performance is your main concern.
 

Diff

Dabbler
Joined
May 9, 2020
Messages
33
Excuse me? This is the standard well documented way to set up a ZFS pool if you need IOPS. Don't even think about RAIDZ if performance is your main concern.

Understood, Patrick, sorry; maybe I should have clarified that my goal is not to squeeze out top IOPS. I would like to find a balanced solution for durability, with resilience to failed drives (considering I use this as the main storage for my home network and home lab). I do not have an unlimited budget, so most of this hardware is used, off eBay, and the potential for failures is quite high.
So my hope was to lean more toward resilience/redundancy with reasonable performance.

Sorry, maybe being new to ZFS I am too naive here.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Understood. Sorry if I was a bit snarky.

IMHO a RAIDZ1 with 6 disks has a higher probability of failure than 3 mirror pairs. I would not run RAIDZ1 at all if resilience was my prime concern. RAIDZ2 minimum. Everything else is a tradeoff. 6 mirrors will give you the IOPS of 6 disks in parallel while two RAIDZ vdevs will give you the performance of 2 disks at best.

If you can afford to experiment, do try the mirror configuration.
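
While a benchmark runs you can also watch how the load spreads across the vdevs, e.g. (assuming your pool is named z2):

Code:
# per-vdev throughput and IOPS, refreshed every 5 seconds
zpool iostat -v z2 5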
 

Diff

Dabbler
Joined
May 9, 2020
Messages
33
Understood. Sorry if I was a bit snarky.

No worries :rolleyes:

IMHO a RAIDZ1 with 6 disks has a higher probability of failure than 3 mirror pairs. I would not run RAIDZ1 at all if resilience was my prime concern. RAIDZ2 minimum. Everything else is a tradeoff. 6 mirrors will give you the IOPS of 6 disks in parallel while two RAIDZ vdevs will give you the performance of 2 disks at best.

Makes sense. Just to clarify, the current setup is Z2, like this:

For zpool config I am trying:
- TrueNAS installation on mirror 2 x 250GB 12Gb/s SAS SSD
- Z2 with 12 x 1.9TB 12Gb/s SAS SSD
- spares 2 x 1.9TB 12Gb/s SAS SSD
- SLOG (log vdev) mirror 2 x 512GB NVMe SSD


If you can afford to experiment, do try the mirror configuration.

As soon as I recover it; it is completely down right now.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
IMHO a RAIDZ1 with 6 disks has a higher probability of failure than 3 mirror pairs. I would not run RAIDZ1 at all if resilience was my prime concern. RAIDZ2 minimum. Everything else is a tradeoff. 6 mirrors will give you the IOPS of 6 disks in parallel while two RAIDZ vdevs will give you the performance of 2 disks at best.
Just out of interest:

See below for the outcome of 2 disks failed
Red = Pool loss
Yellow = Degraded pool/VDEV

So if in the first example of RAIDZ1 disk A fails followed by disk B, your pool is dead. Whereas the example of RAIDZ2 A failing followed by B is a degraded (but still alive) pool.

With 30 possible ordered ways for 2 of the 6 drives to fail (6 x 5 = 30), that means:

RAIDZ1 30/30 = 100% chance of pool loss with 2 disks failed

RAIDZ2 0/30 = 0% chance of pool loss with 2 disks failed

Mirrors 6/30 = 20% chance of pool loss with 2 disks failed

[attachment: table of 2-disk failure outcomes for RAIDZ1, RAIDZ2, and mirrors]
 
Last edited:

Moc

Cadet
Joined
Apr 24, 2021
Messages
1
I've been using TrueNAS for a short while, and my experience has been good, except that I have always found it slow... So I decided to build a new system with everything relatively recent, and it is still slow!

It is not an all-SSD setup like the one here... But is 200MB/sec sequential write considered OK on 12 disks, in a stripe of 6 two-way SSD mirrors? I mean, I have a 10+ year old server with a 12-disk RAID6 doing over 500MB/sec writes...

I have 5 x 18TB EXOS drives in a Z2 setup, and I get about 120MB/sec write, which I consider hyper slow! I can see it writing about 40MB/sec per disk, and the disks are rated at 200MB/sec sequential. I saw these speeds from the first 1GB written to the pool! It also has a 2TB NVMe cache drive (which I have read only helps reads), but I only see writes to it at the same rate as the other drives... so it is not even writing the full data to it... And when I read back the data I had just written (while the pool was still under 1TB of data), it would just read it from all the drives. (The system is a Ryzen 3600 with 32GB ECC memory.)

So am I to understand that others here consider 200MB/sec a normal write speed for those 12 SSDs?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Just out of interest:

See below for the outcome of 2 disks failed
Red = Pool loss
Yellow = Degraded pool/VDEV

So if in the first example of RAIDZ1 disk A fails followed by disk B, your pool is dead. Whereas the example of RAIDZ2 A failing followed by B is a degraded (but still alive) pool.

With 30 possible ordered ways for 2 of the 6 drives to fail (6 x 5 = 30), that means:

RAIDZ1 30/30 = 100% chance of pool loss with 2 disks failed

RAIDZ2 0/30 = 0% chance of pool loss with 2 disks failed

Mirrors 6/30 = 20% chance of pool loss with 2 disks failed

[attachment: table of 2-disk failure outcomes for RAIDZ1, RAIDZ2, and mirrors]
It's a good analysis... on the basis that 2 drives fail simultaneously. In reality, drives fail independently most of the time. Where there are simultaneous failures (fire, sprinklers), you may also lose more than 2 drives.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
This has probably been discussed many times, and I have spent 3 days reading different articles, but I am still not sure I have a good sense of how to properly test my setup or how to design the ZFS pool in the most efficient way. So I have to ask for help and advice.

Here is my setup:

Code:
TrueNAS-12.0-U2.1
Dell R730xd
CPU: 2 x Intel(R) Xeon(R) CPU E5-2650L v3 @ 1.80GHz
Memory: 128GB ECC
Dell HBA330 mini (IT mode)
Disks:
  2 x 250GB 12Gb/s SAS SSD
  14 x 1.9TB 12Gb/s SAS SSD
  2 x 512GB NVMe SSD
Network:
  LACP - 2 x 40Gbe Mellanox MT27500 [ConnectX-3]


For zpool config I am trying:
- TrueNAS installation on mirror 2 x 250GB 12Gb/s SAS SSD
- Z2 with 12 x 1.9TB 12Gb/s SAS SSD
- spares 2 x 1.9TB 12Gb/s SAS SSD
- SLOG (log vdev) mirror 2 x 512GB NVMe SSD



Here are some results of `dd` as well.

Write: ~532MB/s

Code:
dd if=/dev/zero of=/mnt/z2/files/test/file.out bs=4096 count=1000000 oflag=direct
1000000+0 records in
1000000+0 records out
4096000000 bytes transferred in 7.339412 secs (558082826 bytes/sec)


Read: ~216MB/s

Code:
dd if=/mnt/z2/files/test/file.out of=/dev/null bs=4096
1000000+0 records in
1000000+0 records out
4096000000 bytes transferred in 18.042795 secs (227015825 bytes/sec)


At this point I am looking for any advice or pointers to relevant materials to read.


The problem is probably that you are testing with dd and a block size of 4kB.

dd doesn't have a queue depth and so is very latency-sensitive.

216MB/s at a 4K block size is over 50K IOPS... and it is probably worse than that, because the dataset recordsize is probably not tuned for 4K accesses, and the RAIDZ2 is certainly not tuned for 4K I/O.

If you want bandwidth, test with a 128K - 1M I/O size. fio is a performance test tool; dd is a simple copy tool.
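
As a rough, untuned example (path, size and job count are just placeholders), a sequential-write bandwidth test with fio could look like this:

Code:
# 1M sequential writes with a real queue depth; end_fsync flushes data before reporting
fio --name=seqwrite --directory=/mnt/z2/files/test \
    --rw=write --bs=1M --size=16g --numjobs=4 \
    --ioengine=posixaio --iodepth=16 --end_fsync=1 --group_reporting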

I haven't used IOzone, so I don't know what its issues are when running inside a jail. Things to check: are the writes sync or async? What is the queue depth? What is the recordsize for the file/dataset?
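
The dataset settings can be checked directly, e.g. (the dataset name here is a guess based on the mount path):

Code:
# show recordsize, sync setting and compression for the dataset under test
zfs get recordsize,sync,compression z2/files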

If you need max performance, mirrors are faster. They require fewer disk I/Os.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I've been using TrueNAS for a short while, and my experience has been good, except that I have always found it slow... So I decided to build a new system with everything relatively recent, and it is still slow!

It is not an all-SSD setup like the one here... But is 200MB/sec sequential write considered OK on 12 disks, in a stripe of 6 two-way SSD mirrors? I mean, I have a 10+ year old server with a 12-disk RAID6 doing over 500MB/sec writes...

I have 5 x 18TB EXOS drives in a Z2 setup, and I get about 120MB/sec write, which I consider hyper slow! I can see it writing about 40MB/sec per disk, and the disks are rated at 200MB/sec sequential. I saw these speeds from the first 1GB written to the pool! It also has a 2TB NVMe cache drive (which I have read only helps reads), but I only see writes to it at the same rate as the other drives... so it is not even writing the full data to it... And when I read back the data I had just written (while the pool was still under 1TB of data), it would just read it from all the drives. (The system is a Ryzen 3600 with 32GB ECC memory.)

So am I to understand that others here consider 200MB/sec a normal write speed for those 12 SSDs?

No, it is not normal, but it is very much a function of the configuration and how you test. Typically, a 6-SSD system can yield over 1GB/s.

Single-client tests are very sensitive to the client software and how it uses the NAS. For max bandwidth you need larger I/Os and a reasonable queue depth; without a queue depth the drives cannot be kept busy.

How do you test your system? Internally, we use vdbench and fio https://fio.readthedocs.io/en/latest/fio_doc.html
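
As a rough starting point (the directory and parameters are only placeholders, not a tuned profile), a random-read test with a real queue depth looks something like this; a bs=1M sequential run does the same for bandwidth:

Code:
# 4k random reads at queue depth 32, time-based so the result is not dominated by ramp-up
fio --name=randread --directory=/mnt/tank/test \
    --rw=randread --bs=4k --size=8g --numjobs=4 \
    --ioengine=posixaio --iodepth=32 --runtime=60 --time_based --group_reporting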
 

Kailee71

Contributor
Joined
Jul 8, 2018
Messages
110
One more argument for the stripe of mirrors is that resilvers are much quicker after a disk failure, so the time spent in a degraded state is much reduced, further reducing the likelihood of a dead pool. And lastly, with a stripe of mirrors, increasing total capacity is as simple as replacing one mirror half, resilvering, then replacing the other mirror half. Done.
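
Roughly like this (placeholder device names; each resilver must finish before the next swap):

Code:
# grow one mirror vdev by swapping in larger disks, one half at a time
zpool set autoexpand=on tank
zpool replace tank da2 da20   # wait for the resilver to complete
zpool replace tank da3 da21   # the vdev grows once both halves are larger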
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
and it only takes 2 disks to add another vdev.
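
E.g. (placeholder names):

Code:
# extend the pool with one more 2-way mirror vdev
zpool add tank mirror da16 da17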
 

TrumanHW

Contributor
Joined
Apr 17, 2018
Messages
197
You know -- I always see people make this argument that SSDs need the same parity considerations that a spinning array does.

Granted -- being able to sustain only 1 failure is riding on the edge, but part of the premise of RAIDZ2 is "what if another drive fails during the looong resilver time of spinning (and huge) volumes"...

1. A decent SSD can write 2TB in a couple of hours, which partially mitigates the risk.
2. dRAID (when it gets released later this year) will actually improve recovery time and efficiency even more (kinda can't wait).

You could use a pair of pools, one of fast SSDs with minimal parity (RAIDZ1) ...
and keep differential (incremental) replications to a second pool so that the time in which they're out of sync is minimized.

I'm thinking of setting up something along those lines, though replication tasks have always been harder to set up than I'd like to admit.
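
What I have in mind is basically incremental snapshots plus send/receive, something like this (pool and dataset names are made up):

Code:
# hypothetical names: "fast" is the SSD pool, "slow" is the spinning backup pool
zfs snapshot fast/data@2021-04-17
zfs send -i fast/data@2021-04-16 fast/data@2021-04-17 | zfs receive -F slow/data-backup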

I think (as morganL mentions above) dd's low queue depth (and the fact that it was writing 4k blocks) is a great point, because I cannot imagine a RAIDZ1 or RAIDZ2 of all SAS drives (assuming they can actually exceed the SATA-3 ~500MB/s for large files) being limited to ~400MB/s; as another person mentions, their SPINNING drives exceed that. My plan is to set up a RAIDZ1 NVMe array ... which backs up to a spinning array.

Again, anything short of ultra-expensive drives is going to be slow when you get down to 4k files and low queue depths.
Just the way it is unless you're using an Optane array, as I bet you already know.

I hope nothing I said came across as arrogant, because honestly, I think you likely know more than I do (and wrote very clearly).
Best of luck.
 