Unexpected performance of 1000MB/s Read from 2 striped 3.5 SATA 6Gb/s drives over 10GbE

rigel

Dabbler
Joined
Apr 5, 2023
Messages
19
CPU: AMD EPYC 7282
RAM: 2 x 64 GB Micron 3200 DDR4 Registered ECC
ZFS pool: Striped VDEV with Seagate ST1000VM002 1TB 5900 RPM 64MB Cache + Seagate ST1000DM003 1TB 7200 RPM 64MB Cache
Metadata, Log, Cache or other VDEVs: None
Connection: TrueNAS SCALE-22.12.2 with 10GbE Intel X550T NIC <-> Cat 6 cable <-> Thunderbolt 3 to 10GbE adapter <-> Mac Mini M1

I'm new to TrueNAS. I built my first machine and added a "scratch" ZFS m̶i̶r̶r̶o̶r̶ stripe vdev pool to store some temporary data for pre-processing before moving it to a more resilient ZFS pool. Therefore I do not care about losing data in this m̶i̶r̶r̶o̶r̶ stripe vdev pool.

The "problem" is that I cannot understand how am I getting write speeds of around 750MB/s and read speeds of around 1000MB/s over 10GbE connection. These 3.5 SATA drives individually have about 150MB/s Read/Write speeds so I thought ok since I have 2 of them I will get around 300MB/s in stripe vdev. These HDDs are at least 10 years old and were used for cold archiving before.

Is it because the 128GB of RAM takes the load during read/write operations on this stripe ZFS pool? Does it mean that if I copy folders/files smaller than 128GB, the ZFS ARC will do the heavy lifting and I will basically saturate the 10GbE connection? I have noticed that during testing only about 64GB of RAM is utilized by the ZFS cache, and I copied folders smaller than 50GB.

Or is it because of the way I test it? I'm on a Mac Mini and using the BlackMagic Disk Speed Test app. Neither of the more trusted apps (CrystalDiskMark or ATTO) is available on Mac as far as I know.

Is there a better way to test this setup?
[Screenshot: truenas-10gbe-sata-6gbs-drives-1gb-files.png]
 

rigel

Dabbler
Joined
Apr 5, 2023
Messages
19
As a comparison, I have a raidz1 ZFS pool with 4 x WD_850X 4TB NVMe drives, each rated at 7300MB/s read and 6300MB/s write, and they show basically the same file transfer speeds over 10GbE, obviously saturating the connection. Currently they are overkill, but I'm planning to upgrade the connection between my TrueNAS and Mac Mini to 40GbE or 25GbE.

[Screenshot: truenas-10gbe-wd-850x-drives-1gb-files.png]
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Are you writing compressible data by any chance, e.g. a long string of zeros? It's rather easy for a potato to deliver a mountain of data quickly if it compresses really well.

Try using your real workload, with real files and stuff.
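
A quick way to check is to look at the compression ratio on the dataset right after a benchmark run (the pool/dataset name below is just a placeholder):

zfs get compression,compressratio tank/scratch

If compressratio comes back well above 1.00x, the benchmark was mostly writing zeros and the numbers say very little about the disks.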
 
Joined
Oct 22, 2019
Messages
3,641
Thunderbolt 3 to 10GbE adapter
How has that thing not melted yet? :oops: Did you ever touch it to see how hot it gets when doing a large transfer?


Therefore I do not care about losing data in this mirror vdev pool.
You described a "striped" vdev, not a "mirror". Are you sure it's a mirror vdev? (The GUI would have warned you before creating it if it was a "striped" vdev.)


Not an expert by any means, but my hunch is that you're reading from the ARC (RAM), and your writes are "async" anyway, which means you're seeing the speed of data hitting RAM before it is flushed and committed to persistent storage (the HDDs).
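
If you want to confirm that, watch the ARC hit rate and the actual disk activity while re-running the read test ("scratch" below is just a placeholder pool name):

arcstat 5                 # a hit% near 100 means reads are served from RAM
zpool iostat scratch 5    # shows what the HDDs are actually doing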

EDIT: Obligatory question: For your transfers over the network, you do mean MB/s and not Mb/s?
 

rigel

Dabbler
Joined
Apr 5, 2023
Messages
19
Are you writing compressible data by any chance, e.g. a long string of zeros? It's rather easy for a potato to deliver a mountain of data quickly if it compresses really well.

Try using your real workload, with real files and stuff.

YES, you are absolutely right! I never trusted this BlackMagic disk test app. I was also testing with iperf3 (showing 9.90 Gbits/sec), but as far as I understand it can only test memory-to-memory transfer speed, not real disk-to-disk transfer speed?
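
For reference, this is roughly how I ran it (the IP address below is just an example):

iperf3 -s                  # on the TrueNAS box
iperf3 -c 192.168.1.10     # on the Mac, pointing at the NAS

So it only really proves the network path, not the disks.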

I just copied from the Mac to the TrueNAS zpool in question (2 striped 3.5" SATA drives) a 70GB Final Cut Pro X project file (it is actually a folder with many, many media and metadata files) and I got only between 150 and 200MB/s:
[Screenshot: screenshot-2023-06-10-17-23-09@2x.png]


and here is the same test with the raidz1 ZFS pool of 4 x WD_850X 4TB NVMe drives, averaging around 700MB/s write speed:
[Screenshot: screenshot-2023-06-10-17-23-42@2x.png]


Now I'm wondering why the NVMe ZFS pool is performing so slowly. Each of these 4 drives claims 7300MB/s read and 6300MB/s write speeds; I was expecting to get at least around 1100MB/s. Or is it the collection of many small files inside that Final Cut Pro X project folder that slows down writing to the ZFS pool?

Also, taking into consideration the slow real-file transfer test to the NVMe ZFS pool over 10GbE above, does it mean that even upgrading the connection to 40GbE or 25GbE will not speed up writes to this ZFS pool? In my dreams I thought that with a 25GbE connection I would get 2000MB/s read/write to this NVMe ZFS pool.
 

rigel

Dabbler
Joined
Apr 5, 2023
Messages
19
How has that thing not melted yet? :oops: Did you ever touch it to see how hot it gets when doing a large transfer?



You described a "striped" vdev, not a "mirror". Are you sure it's a mirror vdev? (The GUI would have warned you before creating it if it was a "striped" vdev.)


Not an expert by any means, but my hunch is that you're reading from the ARC (RAM), and your writes are "async" anyway, which means you're seeing the speed of data hitting RAM before it is flushed and committed to persistent storage (the HDDs).

EDIT: Obligatory question: For your transfers over the network, you do mean MB/s and not Mb/s?

Haha, it is actually not even hot; it feels a little warmer than warm, but definitely not hot. As I understand it, there are heatsinks inside this adapter and active airflow.

Yes, thank you for correcting me, I fixed that mistake. I meant "striped", too many new words for me.

Yes, when I talk about transfer speed I mean MB/s. A 10Gb Ethernet connection can theoretically deliver at most about 1250MB/s (10,000Mb/s ÷ 8 bits per byte), before protocol overhead.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Is that over SMB? One thing to try would be two transfers in parallel, to see if throwing more cores at it improves performance. Could also be general protocol overhead (mostly latency-driven) around small files.
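
For example, on the Mac side something like this against the SMB share (paths below are hypothetical) would show whether a second stream helps:

cp -R ~/FCPX/ProjectA /Volumes/scratch/ &
cp -R ~/FCPX/ProjectB /Volumes/scratch/ &
wait

If the two copies together move noticeably more data per second than one alone, the limit is per-connection (a single SMB session and its latency), not the pool.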
 
Joined
Oct 22, 2019
Messages
3,641
Could also be general protocol overhead (mostly latency-driven) around small files.
Most definitely it.

Besides, even those files in the folder might be compressible to some degree.

If you create a massive file filled with random data, then you can "sort of" gauge the speeds over the network. But it's still not always accurate. Real-world usage is what matters.
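
Something like this on the Mac creates a file that compression cannot cheat on (size and path are just examples; BSD dd on macOS takes a lowercase "1m"):

dd if=/dev/urandom of=$HOME/testfile.bin bs=1m count=10240    # roughly 10 GiB of random data

Creating it will be slow, but that part doesn't matter; the interesting part is copying that one file to the share and timing it.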
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
If you create a massive file filled with random data, then you can "sort of" gauge the speeds over the network.
Or disable compression on a test dataset and use zeroes. Using e.g. /dev/random as a source is limited in bandwidth by the entropy pool. /dev/zero just spits out bytes as fast as the machine (CPU and memory) can.

Since the local storage is most of the time slower than the network, I prefer to just test on the NAS and expect to get that order of magnitude over the network, too.
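
As a minimal local test on the NAS along those lines, assuming a pool named "tank" (adjust names and sizes to taste):

zfs create -o compression=off tank/speedtest
dd if=/dev/zero of=/mnt/tank/speedtest/zeros.bin bs=1M count=20480 conv=fdatasync    # sequential write, about 20 GiB
dd if=/mnt/tank/speedtest/zeros.bin of=/dev/null bs=1M                               # sequential read

For the read half, use a file comfortably larger than the ARC (or export and re-import the pool first), otherwise you are mostly measuring RAM.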

If the net is limited and you can saturate e.g. 1 Gbit/s, then there's nothing to worry about concerning the storage.
 

AdrianB1

Dabbler
Joined
Feb 28, 2017
Messages
29
The "problem" is that I cannot understand how am I getting write speeds of around 750MB/s and read speeds of around 1000MB/s over 10GbE connection. These 3.5 SATA drives individually have about 150MB/s Read/Write speeds so I thought ok since I have 2 of them I will get around 300MB/s in stripe vdev. These HDDs are at least 10 years old and were used for cold archiving before.

Is it because the 128GB of RAM takes the load during read/write operations on this stripe ZFS pool? Does it mean that if I copy folders/files smaller than 128GB, the ZFS ARC will do the heavy lifting and I will basically saturate the 10GbE connection? I have noticed that during testing only about 64GB of RAM is utilized by the ZFS cache, and I copied folders smaller than 50GB.
You copy to RAM (cache), not to the disks, so you get close to the network speed. As soon as you fill the RAM, it will work at the disk speed, which you estimated at 300 MB/sec; even lower in my experience.

The reads should be slower, unless you repeat the reads and the data is cached in RAM.
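
You can watch the caching effect from the TrueNAS shell while a long copy runs ("scratch" below is a placeholder pool name):

zpool iostat scratch 5    # prints the pool's actual disk read/write bandwidth every 5 seconds

Early in the copy the client reports network-like speeds while the disks show much less; once the amount of dirty data in RAM hits its limit, the two numbers converge on the real disk speed.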
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
...
Is it because the 128GB of RAM takes the load during read/write operations on this stripe ZFS pool? Does it mean that if I copy folders/files smaller than 128GB, the ZFS ARC will do the heavy lifting and I will basically saturate the 10GbE connection? I have noticed that during testing only about 64GB of RAM is utilized by the ZFS cache, and I copied folders smaller than 50GB.
...
TrueNAS SCALE uses Linux, on which ZFS is generally limited to 1/2 of the RAM for the ARC. This is a limitation of the ZFS on Linux implementation.

TrueNAS Core uses FreeBSD, which has much better memory integration with ZFS and thus can use almost all free memory.
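
You can see the current ceiling on SCALE from the shell; a zfs_arc_max of 0 means "use the default", which on Linux is half of RAM:

arc_summary | head -n 25                      # current / target / max ARC size
cat /sys/module/zfs/parameters/zfs_arc_max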
 

rigel

Dabbler
Joined
Apr 5, 2023
Messages
19
TrueNAS SCALE uses Linux, on which ZFS is generally limited to 1/2 of the RAM for the ARC. This is a limitation of the ZFS on Linux implementation.

TrueNAS Core uses FreeBSD, which has much better memory integration with ZFS and thus can use almost all free memory.

Wow, that is a huge detail! Does anyone know if this limit can be increased somehow using config files? It just does not make sense otherwise: if, for example, you have 256GB of RAM, a couple of ZFS pools, and a couple of VMs with 8GB of dedicated RAM each, then you just end up with 112GB of RAM sitting there unused?

256GB - 128GB - 8GB - 8GB = 112GB

It is interesting, considering all I have heard so far is "throw as much RAM as you can at TrueNAS and it will utilize it all", and nobody mentioned it is only applicable to TrueNAS Scale.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Wow, that is a huge detail! Does anyone know if this limit can be increased somehow using config files?
...
I think there is an option, which I don't have handy. Perhaps someone else can answer that question.

...
It is interesting, considering all I have heard so far is "throw as much RAM as you can at TrueNAS and it will utilize it all", and nobody mentioned it is only applicable to TrueNAS Scale.
I think you meant "TrueNAS Core" for the last... Though since TrueNAS Core (and FreeNAS before it) was based on FreeBSD, there did not have to be any qualifications on that rule of thumb for memory until recently.


Note that iXsystems is looking at a patch to the Linux kernel that may improve ZFS ARC handling, thus supporting more memory. Not as good as what FreeBSD has, but better. This is not expected soon; the estimate is the Cobia version of SCALE.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I think there is an option, which I don't have handy. Perhaps someone else can answer that question.
iirc it was a tunable.
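
The one I'm thinking of is zfs_arc_max (in bytes). As a rough sketch, not official guidance: you can raise it at runtime and then make it persistent with a post-init command (Init/Shutdown Scripts in the GUI):

echo 107374182400 > /sys/module/zfs/parameters/zfs_arc_max    # example value: 100 GiB; applies immediately, lost on reboot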
 