Misadventures with ZFS + SSDs (both SATA and NVMe)

TrumanHW · Contributor · Joined Apr 17, 2018 · Messages: 197
What do I buy (or what do I do) ... to get NVMe levels of performance in TrueNAS..?

What kind of performance can I expect from each drive? 200MB/s per SSD doesn't feel like good value.

I want to see over 1GB/s per drive ... or maybe 700-800 MB/s per drive is fine in aggregate, as long as IOPS stay high
... but lower than that is just not what I was expecting.
Granted, if it were actually pegging the CPU at 100%..? I'd understand. But this is not that.


I purchased a Dell R7415 (Epyc) sold with 24 slots wired for NVMe
Thinking it'd at least support 16 NVMe no problem...& should be fast ... right??

The previous owner said he was getting about 3GB/s with much, much slower SSDs than I was going to use. But alas, whether I use 4 or 8 NVMe drives (each of which does 2GB/s minimum, up to 3.2GB/s, on its own), the machine just doesn't get the 2-3GB/s out of the group that each drive individually should deliver.

Hell, even using 1 drive in TNC or TNS it's dismal ... and I'm getting 165MB/s per drive in a 4-drive RAIDz1 pool. This is literally the same performance I get with 8 spinning drives.

When I tested with 6 SATA SSD (Evo 870) I got about the same 500MB/s W and 600MB/s R.

This just DOES NOT warrant the cost of NVMe gear!!

Now, I see [potential] issues ... such as the fact that the R7415 only has 32 PCIe lanes connected to its 24 NVMe slots!

But I get this same exact performance whether I have 2 drives in bank 0 (which has 16 lanes) and 2 in bank 1 (another 16) ... or all 4 in either bank.
Yes, 16 lanes is enough for 4 NVMe drives. But I'm looking for explanations as to why they're getting x1-level performance.
And even the SATA SSDs, which are connected to an HBA330, get a much smaller percentage of each drive's available performance than spinning drives do.
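
(For what it's worth, here's how the negotiated link width could be checked from a Linux boot; the PCI address and nvme0 name below are just examples, I haven't pasted my actual output:)

Code:
# list the NVMe controllers and their PCI addresses
lspci | grep -i "non-volatile memory"

# negotiated link width/speed for one of them (address is an example)
sudo lspci -vv -s 41:00.0 | grep -E "LnkCap|LnkSta"

# or via sysfs (device name is an example)
cat /sys/class/nvme/nvme0/device/current_link_speed
cat /sys/class/nvme/nvme0/device/current_link_width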

And THAT is the real question:

Why do I get such a tiny fraction of each drive's available performance compared to spinning arrays..? And is this just the way it is??
As in ... is the actual benefit of NVMe drives and SATA SSDs simply that they have an excellent performance floor, which they keep regardless of how small the files are and thus how high the IOPS get..?

Or is this an issue related to Epyc (AMD)..?
Or is this actually an issue with the topology (lanes allocated) ..?

I ask bc I'm open to buying an R7525 (which has two Epyc CPUs ... so apparently only once you have 256 lanes will Dell waste a mere 130 or so and actually "wire up" the 96 lanes the NVMe slots need).

And if the issue is that AMD just doesn't give great throughput per drive ... would a Dell R750 be better !?

Bc as TERRIBLE as 165MB/s is ... when I connect 8 drives instead of 4 (and faster drives at that: 3.2GB/s vs 2.2GB/s), per-drive performance DROPS DOWN to about 87MB/s.

And I cannot use a different NVMe controller ... (nor should I have to really) bc there's no way I could wire it to the backplane. This is what's available.

What's really pathetic..? Using cheap consumer gear that I have (an i7-8700K with a HighPoint SSD7120), I've gotten 9GB/s. Obviously that's local performance ... but I tested everything locally in this case as well, and it's literally identical. I installed Ubuntu and benchmarked individual drives, as well as RAID-5 arrays of 3 drives, 4 drives, etc. As a RAID array it was only about 100MB/s faster than it was inside ZFS. BUT ... in Ubuntu a single NVMe drive got the full 3GB/s ... whereas in TNS..?
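
(If anyone wants to reproduce the raw-drive numbers, something like this fio run against the bare device should do it; the device name is just an example, and the write variant destroys whatever is on that drive:)

Code:
# sequential read straight from the raw device (non-destructive)
sudo fio --name=rawread --filename=/dev/nvme0n1 --rw=read --bs=1M \
  --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --time_based

# sequential write straight to the raw device (DESTROYS data on that drive)
sudo fio --name=rawwrite --filename=/dev/nvme0n1 --rw=write --bs=1M \
  --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --time_based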



Testing a single NVMe drive..? 520MB/s Write

Testing a mirror of 2 NVMe drives..? 555MB/s Write


With 8 drives it can get up to a WHOLE
- 700MB/s Write ...
- 800MB/s Read

And of course ... I bet it would do this even if it were tiny files (high IOPs) and would be very consistent (unlike spinning arrays).

But is that what you guys expect from the costs of NVMe..?? These drives are THOUSANDS if you buy them from Dell ... etc.
Yet we're supposed to be satiated with spinning drive performance?

If I'm doing something wrong (aside from my "unreasonable expectations") ... please, LMK what I should do.
Or whether maybe one of the other machines I mentioned would help.

Thanks
 

NickF · Guru · Joined Jun 12, 2014 · Messages: 763
Can we baseline this conversation?
What are you trying to do with this system? Is it for fun, is it for work, or is it somewhere in between?
Either way, what's the workload, and what are you aiming to accomplish? Let's establish those as the facts of the case.

You're right that some of the limiting factors exist in the hardware itself. How much, I don't really know. But even given that, if you booted that thing into a Windows environment and striped together all of those drives in Disk Manager, you'd likely see higher BIG NUMBER MARKETING STRAIGHT LINE SEQUENTIAL PERFORMANCE. But in reality, that's not going to translate to much. Latency may even matter more to you than you realize. https://www.youtube.com/watch?v=tSUMBeaaiOo
Even still, I'm not sure that your system is performing as well as it should. But I have absolutely no idea what the configuration of anything is, beyond trying to read the word wall. All I've gathered is that you have a Dell EPYC system and you set up a RAID-Z1 of NVMe drives. How are you measuring the performance you have observed? Are you using the same methodology? Are you running other workloads simultaneously? Have you read this? https://static.ixsystems.co/uploads/2020/09/ZFS_Storage_Pool_Layout_White_Paper_2020_WEB.pdf

Now, let's talk about ZFS and NVMe.
There are plenty of famous examples of epic failures of ZFS NVMe deployments. This is the poster child: https://www.youtube.com/watch?v=2n-yJwuVE4k
But even he has corrected his mistakes in the years since: https://www.youtube.com/watch?v=coShLkCriXc&t

What I am trying to convey is that you are trying to perform at cutting-edge speeds, nearly as fast as system memory, on a file system that was invented when the bee's knees was the Seagate Cheetah. There's some tuning and rough edges we need to polish to get you going. What's your ASHIFT? What's your record size? https://arstechnica.com/information...01-understanding-zfs-storage-and-performance/ Maybe, have you seen this? https://www.truenas.com/community/r...ng-to-maximize-your-10g-25g-40g-networks.207/ We haven't even brought up networking, file-sharing protocols or anything here... So please sir. Whatcha tryin to do :tongue:
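
To give one concrete example of the kind of knob I mean (the dataset name is just a placeholder, and only data written after the change picks up the new value): for mostly-large sequential files, a bigger recordsize often helps.

Code:
# example only -- substitute your actual pool/dataset
zfs set recordsize=1M tank/video
zfs get recordsize,compression,atime tank/video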
 

TrumanHW · Contributor · Joined Apr 17, 2018 · Messages: 197
What are you trying to do with this system?
SOHO // business storage for at most 2-3 simultaneous users.

Whats the workload, what are you aiming to accomplish? Let's establish those as the facts of the case.

There's literally NO workload if you mean VMs. The only "work" asked of it is R / W
Eventually I'll buy an LTO8 for it ... but nothing else right now.

But in reality, that's not going translate to much.
Pray tell ... for the love of god, WHY?? lol.
I'd love to complain about how "useless my 10GB/s R-W bandwidth is."

I mean, hell, these drives have no problem exceeding 2GB/s R/W when connected via TB, and even that ceiling is just a TB3 limitation.
The latency..?? OMG, I'd love to have that be a problem. But right now? Why can't I have what I know exists? Throughput.
Which I use most often for video.

Do I really need to resort to spinning drives if I want "more performance" ..??? lol.

I'm not sure that your system is performing as best as it should. But I have absolutely no idea what configuration of anything is, beyond trying to read the word wall.
I know ... it's not that I want to write a novel, but people immediately start off with their mantra or assumptions (reasonable ones, but built on the obligatory presumption that I did 12 things idiotically and withheld the obvious facts they'd need to know, which would make my problem moot).

"It's your CPU!! SMB totally needs CPU ... It's your CPU!"
Without ever noticing that I've already answered that: utilization doesn't even pass 6%!!
And CPU temps are the same as idle.

Right now...as it is...drives that literally have no problem doing 2GB/s are arbitrarily limited to a whole 175MB/s.
Even if they were each getting only 1 PCIe lane (Dell are assholes and wasted the 128 PCIe lanes Epyc has ... assigning only 32 to the 24 NVMe slots)
Even at x1 !! It should still do 600MB/s PER drive ... meaning 2.4GB/s for 4 drives...
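
(Back-of-the-envelope, assuming these slots negotiate PCIe 3.0:)

Code:
# payload ceiling of ONE PCIe 3.0 lane: 8 GT/s with 128b/130b encoding, in GB/s
awk 'BEGIN { print 8 * 128/130 / 8 }'
# ~0.985 GB/s raw per lane; call it 800-900 MB/s of real data after overhead,
# so even four drives at x1 each should manage roughly 3 GB/s aggregate.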

Dell EPYC, ... how'd you test the performance:


ALL TESTS with MICRON 7300 PRO

1 NVMe drive by itself
- TNC + TNS - 500MB/s W
- TNC + TNS - 600MB/s R
- Ubuntu - 2.2GB/s W
- Ubuntu - 3.2GB/s R
- Win 10 - 2.2GB/s W
- Win 10 - 3.2GB/s R

2 drives mirrored (TNC + TNS)
- 550MB/s W
- 650MB/s R

3 drives
- TNC + TNS - 550MB/s W in RAIDz1
- TNC + TNS - 650MB/s R in RAIDz1
- Ubuntu - 700MB/s W - in RAID-5
- Ubuntu - 800MB/s R - in RAID-5

4 drives in RAIDz1
- 550MB/s W TNC + TNS -- in the same 'bank' thus with access to 16 lanes
- 650MB/s R TNC + TNS -- in the same 'bank' thus with access to 16 lanes

- 550MB/s W TNC + TNS -- splitting the 'banks' thus with access to 32 lanes
- 650MB/s R TNC + TNS -- splitting the 'banks' thus with access to 32 lanes


BELOW TESTS done with Micron 9300 Pro

Single NVMe drive tested
- TNC + TNS - 550MB/s W
- TNC + TNS - 650MB/s R
- Ubuntu - 3.2GB/s W
- Ubuntu - 3.2GB/s R
- Win 10 - 3.2GB/s W
- Win 10 - 3.2GB/s R

8x 9300 Pro in RAIDz2 (TNC)
- 750MB/s W
- 850MB/s R

How are you measuring the performance you've observed?
Testing only ONE thing at a time; nothing else running. Nothing to complicate anything. Just the single copy or write task.
Methods:

TrueNAS (Core + Scale)
FIO (confirmed looking at "performance GUI" graphs)​
Actual tests: R/W of identical data // files ≥1GB, which my RAIDz2 of 8x 7200rpm drives in the T320 sustains at ≥550MB/s
Windows 10
Real-world tests and benchmarking application​

Ubuntu
Real-world tests​
Ubuntu benchmark utility​

There are plenty of famous examples of epic failures of ZFS NVME deployments. This is the poster child: https://www.youtube.com/watch?v=2n-yJwuVE4k
But even he has corrected his mistakes in the years since: https://www.youtube.com/watch?v=coShLkCriXc&t

I'll check those out. I knew about the videos but personally find him uber annoying. :)


There's some tuning and rough edges we need to polish to get you going. What's your ASHIFT, Whats your record size?

Record size = 128KiB (that's where IOPS get good for most SSDs, yes..?)
ASHIFT, however, I thought manually setting had become a thing of the past and that it's selected automatically; is that wrong?
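
(I suppose I can at least confirm what the pool actually used; the pool name below is just a placeholder and I haven't pasted my output here:)

Code:
# pool-level ashift property (0 means it was auto-detected)
zpool get ashift tank

# what each vdev actually ended up with (TrueNAS keeps its cache file here)
zdb -U /data/zfs/zpool.cache | grep ashift

# and the dataset recordsize
zfs get recordsize tank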


Yeah, I have a decent level of comprehension of how data works and of RAID. While I have an SFP+ setup right now, I just sold my LOUD SFP28 switch, and once I finish moving I'm picking up a Mikrotik SFP28 again ... but since I'm barely getting HALF of the SFP+ bandwidth, I'm thinking there are other problems first ... ey? And as I said, I did start off testing with SFP28 to ensure it couldn't possibly be the bottleneck (and obviously, at sub-600MB/s, it's not).

We haven't even brought up networking, filesharing protocols or anything here...So please sir. Whatcha tryin to do :tongue:
I'm using SMB ...

I'm trying to R and W data. That's it.
R/W data as fast as possible ... THAT is my use case.
Getting the performance I paid for too. (crazy ey?) lol

Epyc CPU
256GB of DDR4 ECC
SFP28 NIC etc. (temporarily using an SFP+ switch, but I'm pretty sure I'd notice if it hit 1.25GB/s)
I'm also the ONLY USER ON IT right now. :)

I also have
2 905P
-and-
2 P5800x
... but no point until I get this working "right" first.


I'm WILLING to upgrade to either an
R7525
-or-
R750
To actually provide the full 96 lanes I DESERVE dammit!!

But for now??? I'm not even getting x1 performance.

I'm not asking this machine to do ANYTHING but READ and WRITE.
That's it. And yet, it can't even outperform my spinning array !! lol.

And as you saw, when I threw the 8 drives in ... it just made each drive in the array do LESS WORK!
Ensuring there's no risk of it performing anything like what I paid for.

That's it. Simple enough. I'm not complicating it further.
I don't have 30 VMs running. I'm not using TOKEN RING. :)
I'm not withholding any little surprises ...
I'm not secretly using 1GbE but somehow getting 550MB/s.
I'm just trying to be candid, keep it as simple as possible, and get the MAXIMUM R/W performance.


It's like the system is deliberately restricting performance; give it more drives..? It asks each one to do LESS WORK.
As if the goal were to have the drives perform as CRAPPY as possible while barely yielding 700MB/s.

Yes, ashift, good idea.
 

NickF · Guru · Joined Jun 12, 2014 · Messages: 763
So... you have more money than sense. Understood. :wink:
I too can be accused of having more money than sense. https://www.truenas.com/community/threads/truenas-scale-nvme-performance-scaling.104641/
Let's get a performance baseline that we can compare to something roughly equivalent in my dataset in the thread linked above.
fio --bs=128k --direct=1 --directory=/mnt/newprod/ --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=16 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based
That being said, this is a test you have to run on your TrueNAS box itself. This isn't something you should run externally on a computer over an SMB mount; that way there are no external variables. We are trying to baseline the performance of your system, not your network or the file-sharing layer. Please provide the full output.
 

Patrick M. Hausen · Hall of Famer · Joined Nov 25, 2013 · Messages: 7,776
@TrumanHW Your posts are long, difficult to read, and unless I missed something there is little information about the exact pool setup and the commands used to perform the tests. Also "Ubuntu" - does that refer to stock Ubuntu with ZFS or something different altogether?

Again, if I missed something, I apologize. I also fully understand your frustration at not getting anywhere near "SSD speed" from your setup. Yet it's important to focus on a single parameter and proceed step by step, as @NickF is already trying to guide you to do.

Nonetheless I get the impression that you are (a bit) insinuating that there is some inherent limit in TrueNAS that is hitting you. I can assure you there is not and I can back it up with some measurements.

How did I test?
Code:
# preparation
zfs create -ocompression=off <pool>/test
cd /mnt/<pool>/test

# write test
fio --filename=./test --direct=1 --rw=write --bs=1M --iodepth=2 --numjobs=12 --group_reporting --name=test --size=50G

# read test
fio --filename=./test --direct=1 --rw=read --bs=1M --iodepth=2 --numjobs=12 --group_reporting --name=test --size=50G


TrueNAS CORE 13.0-U5.1
Supermicro X10SDV-TLN4F
Supermicro AOC-SLG3-2M
Xeon D-1541
64 GB of memory
2x Samsung SSD 970 EVO Plus 1TB

2-way mirror
Code:
  WRITE: bw=1739MiB/s (1823MB/s), 1739MiB/s-1739MiB/s (1823MB/s-1823MB/s), io=600GiB (644GB), run=353305-353305msec
   READ: bw=2610MiB/s (2737MB/s), 2610MiB/s-2610MiB/s (2737MB/s-2737MB/s), io=600GiB (644GB), run=235409-235409msec


TrueNAS CORE 12.0-U8
Supermicro AS-1113S-WN10RT
EPYC 7401P
256 GB of memory
6x Intel SSDPE2KX010T8

3x 2-way mirror
Code:
  WRITE: bw=8927MiB/s (9361MB/s), 8927MiB/s-8927MiB/s (9361MB/s-9361MB/s), io=600GiB (644GB), run=68823-68823msec
   READ: bw=22.9GiB/s (24.6GB/s), 22.9GiB/s-22.9GiB/s (24.6GB/s-24.6GB/s), io=600GiB (644GB), run=26178-26178msec


As you can see I get completely satisfying throughput on both systems.
 

edge-case · Dabbler · Joined Nov 2, 2019 · Messages: 28
How did I test?
Code:
# preparation
zfs create -ocompression=off <pool>/test
cd /mnt/<pool>/test

# write test
fio --filename=./test --direct=1 --rw=write --bs=1M --iodepth=2 --numjobs=12 --group_reporting --name=test --size=50G

# read test
fio --filename=./test --direct=1 --rw=read --bs=1M --iodepth=2 --numjobs=12 --group_reporting --name=test --size=50G



2-way mirror
Code:
  WRITE: bw=1739MiB/s (1823MB/s), 1739MiB/s-1739MiB/s (1823MB/s-1823MB/s), io=600GiB (644GB), run=353305-353305msec
   READ: bw=2610MiB/s (2737MB/s), 2610MiB/s-2610MiB/s (2737MB/s-2737MB/s), io=600GiB (644GB), run=235409-235409msec


3x 2-way mirror
Code:
  WRITE: bw=8927MiB/s (9361MB/s), 8927MiB/s-8927MiB/s (9361MB/s-9361MB/s), io=600GiB (644GB), run=68823-68823msec
   READ: bw=22.9GiB/s (24.6GB/s), 22.9GiB/s-22.9GiB/s (24.6GB/s-24.6GB/s), io=600GiB (644GB), run=26178-26178msec


As you can see I get completely satisfying throughput on both systems.
Hi, I'm interested in this topic too, as I was about to start purchasing / testing some NVMe SSD storage for potential TrueNAS use... as I also have more money than sense ;-) ... and you and NickF seem to have performed lots of testing.

However, I'm confused by the results of the tests you detailed above when I run them on my systems... are you sure those test results are not being inflated by ZFS ARC caching?

I ask because I just ran it on my HDD systems, expecting to get results in the 100s of MB/s, and got much higher numbers than I expected (on 3 different systems). So, I honestly don't understand my results unless I'm doing something dumb and/or missing something obvious...

With the exact commands you used above...

6 x 18 TB 7200 RPM WD Red Pros / Seagate Exos (in 3 Mirrored VDEVs)
( 6-core Ryzen 5600 and 64 GB DDR4 3200 ECC RAM )
Code:
WRITE: bw=6190MiB/s (6491MB/s), 6190MiB/s-6190MiB/s (6491MB/s-6491MB/s), io=600GiB (644GB), run=99253-99253msec
READ: bw=8031MiB/s (8421MB/s), 8031MiB/s-8031MiB/s (8421MB/s-8421MB/s), io=600GiB (644GB), run=76508-76508msec


8 x 12 TB 7200 RPM WD Reds (in 4 Mirrored VDEVs)
(6-core Ryzen 2600 and 32 GB DDR4 2666 ECC RAM)
Code:
WRITE: bw=5207MiB/s (5460MB/s), 5207MiB/s-5207MiB/s (5460MB/s-5460MB/s), io=600GiB (644GB), run=117992-117992msec
READ: bw=6616MiB/s (6937MB/s), 6616MiB/s-6616MiB/s (6937MB/s-6937MB/s), io=600GiB (644GB), run=92871-92871msec


4 x 14 TB 5400 RPM WD Reds (in 2 Mirrored VDEVs)
Athlon 240GE (2 Ryzen 100x cores and 16 GB DDR4 2400 non-ECC RAM)
Code:
WRITE: bw=2805MiB/s (2941MB/s), 2805MiB/s-2805MiB/s (2941MB/s-2941MB/s), io=600GiB (644GB), run=219030-219030msec
READ: bw=3381MiB/s (3545MB/s), 3381MiB/s-3381MiB/s (3545MB/s-3545MB/s), io=600GiB (644GB), run=181733-181733msec
 

Patrick M. Hausen · Hall of Famer · Joined Nov 25, 2013 · Messages: 7,776
Mmmh ... I copied the test command from someone else in one of the numerous threads with the walls of text by @TrumanHW. I'll look back into it tomorrow. I got the impression that the --direct=1 parameter would take care of that issue. Also, as you probably noticed, I ruled out compression.
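
One way to take the ARC largely out of the read picture would be to benchmark in a throwaway dataset with caching restricted to metadata. I have not re-run my numbers that way, so take this as a sketch (pool name is a placeholder):

Code:
# reads on this dataset will mostly bypass the ARC
zfs create -o compression=off -o primarycache=metadata <pool>/fiotest
cd /mnt/<pool>/fiotest
# then run the same fio read/write commands as above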
 

edge-case · Dabbler · Joined Nov 2, 2019 · Messages: 28
Mmmh ... I copied the test command from someone else in one of the numerous threads with the walls of text by @TrumanHW. I'll look back into it tomorrow. I got the impression that the --direct=1 parameter would take care of that issue. Also as you probably noticed I ruled out compression.
Yep, I noticed that. That's why my results surprised / stumped me...

In the real world, when copying >50GB of multiple large files from a computer with a fast SSD over 10GbE networking to my NAS(es), I get ~200-300 MB/s... so I expected something a bit higher than that from the fio results...

I just re-tested using the command that NickF listed above, and get more "sensible" looking results.
For the same systems as above, using:
Code:
fio --bs=128k --direct=1 --directory=/mnt/newprod/ --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=16 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based

READ: bw=733MiB/s (769MB/s), 733MiB/s-733MiB/s (769MB/s-769MB/s), io=43.0GiB (46.2GB), run=60111-60111msec
WRITE: bw=732MiB/s (768MB/s), 732MiB/s-732MiB/s (768MB/s-768MB/s), io=42.0GiB (46.2GB), run=60111-60111msec

READ: bw=507MiB/s (532MB/s), 507MiB/s-507MiB/s (532MB/s-532MB/s), io=29.8GiB (32.0GB), run=60166-60166msec
WRITE: bw=508MiB/s (533MB/s), 508MiB/s-508MiB/s (533MB/s-533MB/s), io=29.9GiB (32.1GB), run=60166-60166msec

READ: bw=253MiB/s (265MB/s), 253MiB/s-253MiB/s (265MB/s-265MB/s), io=14.9GiB (16.0GB), run=60280-60280msec
WRITE: bw=254MiB/s (267MB/s), 254MiB/s-254MiB/s (267MB/s-267MB/s), io=15.0GiB (16.1GB), run=60280-60280msec

so that makes a lot more sense.
I'm going to do a few quick tests with a couple of NVMe SSDs I have handy, and see how that compares...

Back to the OP, I have no idea what's making his system perform so badly...
 

NickF · Guru · Joined Jun 12, 2014 · Messages: 763
FWIW, here are some additional data points.
This is my HDD pool:
Code:
  pool: excitement
 state: ONLINE
  scan: scrub repaired 0B in 13:56:03 with 0 errors on Sun Jul  2 13:56:05 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        excitement                                ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            2ced9987-6798-4f05-a2e3-9d5351b1f95b  ONLINE       0     0     0
            535b5c44-486a-4977-b7c2-597d5a31bf26  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            c61ef833-f973-4b81-968c-d94a79b1fb90  ONLINE       0     0     0
            51cc15b2-e222-4403-a195-748d66f35880  ONLINE       0     0     0
          mirror-2                                ONLINE       0     0     0
            43d34dbf-87c1-4332-b204-36b47e8ac20f  ONLINE       0     0     0
            96e07847-c29f-46a8-94f2-9e9d891c13e6  ONLINE       0     0     0
          mirror-3                                ONLINE       0     0     0
            0c3b4729-cae5-45ff-9be1-67777dd2820f  ONLINE       0     0     0
            16456181-64fd-432b-82f4-78cf946fb11c  ONLINE       0     0     0
          mirror-4                                ONLINE       0     0     0
            0f7f7a10-b918-4f74-9b81-f5330fcbace6  ONLINE       0     0     0
            60f71496-bb38-416d-a823-856138365923  ONLINE       0     0     0
          mirror-5                                ONLINE       0     0     0
            6039645b-ce3c-4a2e-9d59-6e2169dcae07  ONLINE       0     0     0
            df5b2a9a-8128-4ad5-820e-4c45d57cd167  ONLINE       0     0     0
          mirror-6                                ONLINE       0     0     0
            9dbdc93b-3c27-4726-b3f1-bd97abce20b6  ONLINE       0     0     0
            d0778fd4-13f8-48c9-91dc-a21c7cd044a1  ONLINE       0     0     0
          mirror-8                                ONLINE       0     0     0
            66d9089f-eb61-4989-bd4a-2afb868892da  ONLINE       0     0     0
            48ab80fc-a332-4b9b-a8fa-dc2bcfaae828  ONLINE       0     0     0
          mirror-10                               ONLINE       0     0     0
            fd0f155a-f03f-492d-b30a-389ae96a0b3d  ONLINE       0     0     0
            be8cea69-601f-4296-8274-3829c6474ae6  ONLINE       0     0     0
          mirror-11                               ONLINE       0     0     0
            0b11d083-e60a-4805-ad16-6536b93a5f75  ONLINE       0     0     0
            6d2e2a9d-df1f-44bd-bafe-c3ad1036b9d1  ONLINE       0     0     0
          mirror-12                               ONLINE       0     0     0
            aab70f6c-052b-4b49-a9ca-c4ea443ba551  ONLINE       0     0     0
            a0b556e8-4d22-47dc-a775-2c799abf7d9e  ONLINE       0     0     0
        special
          mirror-7                                ONLINE       0     0     0
            cef86ceb-b53a-430c-a497-16dddb0194fc  ONLINE       0     0     0
            c9942a62-7eb6-4274-a3dd-72b6263e3b24  ONLINE       0     0     0
          mirror-9                                ONLINE       0     0     0
            a0b53303-ebed-4133-959d-b5e08f0b7fc6  ONLINE       0     0     0
            2382a55b-7ffb-41aa-ae20-4e67ee58af89  ONLINE       0     0     0
        cache
          35d21db7-3546-4d0c-9889-dda134351b20    ONLINE       0     0     0
          2527d3b2-032b-4d2c-8469-26d988fced30    ONLINE       0     0     0

errors: No known data errors



This is the same benchmark I ran in the thread linked above. This test is meant to approximate a fairly stressful workload with multiple clients, and also to be as parallelized as possible to ensure greater than QD1 hitting the NVMe. I could be wrong, but that was my intent when I strung it together and started tracking this data.
Code:
root@prod[~]# fio --bs=128k --direct=1 --directory=/mnt/excitement --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=16 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based
randrw: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=32
...
fio-3.25
Starting 16 processes
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
Jobs: 15 (f=15): [E(1),m(15)][100.0%][r=822MiB/s,w=813MiB/s][r=6573,w=6502 IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=16): err= 0: pid=2508587: Tue Jul 11 17:10:32 2023
  read: IOPS=10.5k, BW=1308MiB/s (1372MB/s)(76.8GiB/60108msec)
   bw (  MiB/s): min=  326, max= 2923, per=100.00%, avg=1311.14, stdev=34.78, samples=1917
   iops        : min= 2608, max=23391, avg=10488.41, stdev=278.28, samples=1917
  write: IOPS=10.4k, BW=1306MiB/s (1370MB/s)(76.7GiB/60108msec); 0 zone resets
   bw (  MiB/s): min=  378, max= 2814, per=100.00%, avg=1309.22, stdev=34.20, samples=1917
   iops        : min= 3020, max=22516, avg=10472.88, stdev=273.66, samples=1917
  cpu          : usr=0.47%, sys=0.08%, ctx=304812, majf=1, minf=930
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=7.1%, 16=67.4%, 32=25.4%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.6%, 8=1.5%, 16=1.4%, 32=1.4%, 64=0.0%, >=64=0.0%
     issued rwts: total=628785,627876,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=1308MiB/s (1372MB/s), 1308MiB/s-1308MiB/s (1372MB/s-1372MB/s), io=76.8GiB (82.4GB), run=60108-60108msec
  WRITE: bw=1306MiB/s (1370MB/s), 1306MiB/s-1306MiB/s (1370MB/s-1370MB/s), io=76.7GiB (82.3GB), run=60108-60108msec
root@prod[~]#


So, 8x NVMe is performing 6 or so times faster than my fairly big 22-disk HDD pool (for a homelab anyway). I wouldn't characterize that as "bad" performance, and there is certainly far more to the story here than this artificial testing created by monkeys like us. LATENCY MATTERS TOO: https://www.youtube.com/watch?v=tSUMBeaaiOo

In Patrick's FIO test, that same pool gets this for Read; I agree these numbers are being inflated by the ARC and are probably not useful:
Code:
root@prod[~]# fio --filename=/mnt/excitement/test --direct=1 --rw=read --bs=1M --iodepth=2 --numjobs=12 --group_reporting --name=test --size=50G
test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=2
...
fio-3.25
Starting 12 processes
Jobs: 12 (f=12): [R(12)][100.0%][r=48.8GiB/s][r=49.9k IOPS][eta 00m:00s]
test: (groupid=0, jobs=12): err= 0: pid=2815044: Tue Jul 11 17:18:09 2023
  read: IOPS=49.2k, BW=48.1GiB/s (51.6GB/s)(600GiB/12482msec)
    clat (usec): min=63, max=35376, avg=242.60, stdev=119.31
     lat (usec): min=63, max=35376, avg=242.73, stdev=119.32
    clat percentiles (usec):
     |  1.00th=[  143],  5.00th=[  176], 10.00th=[  190], 20.00th=[  206],
     | 30.00th=[  217], 40.00th=[  227], 50.00th=[  235], 60.00th=[  245],
     | 70.00th=[  258], 80.00th=[  273], 90.00th=[  297], 95.00th=[  326],
     | 99.00th=[  412], 99.50th=[  469], 99.90th=[  717], 99.95th=[  963],
     | 99.99th=[ 2802]
   bw (  MiB/s): min=42818, max=52644, per=14.83%, avg=49240.42, stdev=213.61, samples=288
   iops        : min=42818, max=52644, avg=49240.42, stdev=213.61, samples=288
  lat (usec)   : 100=0.02%, 250=64.38%, 500=35.23%, 750=0.28%, 1000=0.04%
  lat (msec)   : 2=0.03%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=0.66%, sys=96.57%, ctx=142044, majf=0, minf=3222
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=614400,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=2

Run status group 0 (all jobs):
   READ: bw=48.1GiB/s (51.6GB/s), 48.1GiB/s-48.1GiB/s (51.6GB/s-51.6GB/s), io=600GiB (644GB), run=12482-12482msec
root@prod[~]#


In Patrick's FIO test, that same pool gets this for Write:
Code:
root@prod[~]# fio --filename=/mnt/excitement/test --direct=1 --rw=write --bs=1M --iodepth=2 --numjobs=12 --group_reporting --name=test --size=50G
test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=2
...
fio-3.25
Starting 12 processes
test: Laying out IO file (1 file / 51200MiB)
Jobs: 12 (f=12): [W(12)][98.2%][w=2890MiB/s][w=2889 IOPS][eta 00m:01s]
test: (groupid=0, jobs=12): err= 0: pid=2749633: Tue Jul 11 17:17:16 2023
  write: IOPS=11.2k, BW=10.9GiB/s (11.7GB/s)(600GiB/54864msec); 0 zone resets
    clat (usec): min=131, max=196565, avg=993.63, stdev=1814.61
     lat (usec): min=155, max=196597, avg=1066.46, stdev=1821.71
    clat percentiles (usec):
     |  1.00th=[  217],  5.00th=[  269], 10.00th=[  314], 20.00th=[  388],
     | 30.00th=[  457], 40.00th=[  523], 50.00th=[  594], 60.00th=[  685],
     | 70.00th=[  799], 80.00th=[ 1020], 90.00th=[ 1909], 95.00th=[ 3425],
     | 99.00th=[ 6063], 99.50th=[ 7767], 99.90th=[21103], 99.95th=[29230],
     | 99.99th=[57934]
   bw (  MiB/s): min= 1766, max=24343, per=100.00%, avg=11214.07, stdev=497.17, samples=1306
   iops        : min= 1764, max=24343, avg=11213.46, stdev=497.20, samples=1306
  lat (usec)   : 250=3.39%, 500=33.12%, 750=29.62%, 1000=13.29%
  lat (msec)   : 2=11.10%, 4=5.63%, 10=3.56%, 20=0.19%, 50=0.10%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=7.03%, sys=44.57%, ctx=1105339, majf=2, minf=187
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,614400,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=2

Run status group 0 (all jobs):
  WRITE: bw=10.9GiB
 

Davvo · MVP · Joined Jul 12, 2022 · Messages: 3,222
@TrumanHW what is your pool layout?

Have you tried the following script? It might help troubleshooting.

Can you please share the fio command you used? By any chance, are you virtualizing?

Also, did you flash the H330 with the proper firmware?
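
For the pool layout part, something along these lines run from the TrueNAS shell would do (pool name is a placeholder):

Code:
zpool status -v tank
zpool list -v tank
zfs get recordsize,compression,sync,atime tank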
 