Are reads Single-core but writes Multi-core?

bullerwins

Dabbler
Joined
Mar 15, 2022
Messages
43
Hi!

I had a system with these specs:

TrueNAS Core 13.0-U2

CPU: i7 920
RAM: 24GB DDR3 1066MHz
NIC: Asus 10Gbit
Storage: RAIDZ2 with 4x14TB Exos and a 250GB SSD as L2ARC cache.

It topped out at around 3Gbit/s for both reads and writes with the CPU at 100% on all cores, which I expected, as it was really old hardware.

I upgraded to:

CPU: i7 6700k
RAM: 16GB DDR4 2133MHz
Same NIC
Same storage.

Now for reads I get 6Gbit/s with 1 core at 100% and the rest idle, around 14% total usage.
And for writes I get the full 9+ Gbit/s with 1 core at 100% but more usage from the rest of the cores, at around 30% total usage.

I did these tests with OpenSpeedTest running in a jail.

iperf3 gives similar results:

Mac Studio with 10Gbit NIC as server and Truenas as client: 7Gbit/s
Truenas as server and Mac as client: 9.5Gbit/s

with similar CPU usage as with OpenSpeedTest.
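
For anyone wanting to reproduce the iperf3 runs, they were basically the standard client/server test, something along these lines (the IP is just a placeholder):

iperf3 -s (on the machine acting as server)
iperf3 -c 192.168.1.100 -t 30 (on the client; -R would test the reverse direction without swapping roles)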

Would I just need a more powerful CPU, especially in single-core performance, to avoid this bottleneck?

I delidded the CPU and used liquid metal, so temps are great: 27°C idle, 35°C under load. So maybe overclocking might fix this?
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
It's not clear how you're actually testing. I'm assuming you're using SMB, but nowhere do you specify that.

There are so many variables that go into read/write speeds that it's important to start from the basics.

What kind of performance do you get using direct read/write tests from within TrueNAS (use dd or similar)? That will give you an understanding on what your actual hardware is capable of doing, before you introduce the variables of sharing services, network bottlenecks, etc.
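
For example, something like this from a TrueNAS shell gives a rough baseline (the path is a placeholder; use a dataset with compression disabled, otherwise a zero-filled file will compress away and inflate the numbers):

dd if=/dev/zero of=/mnt/poolname/testfile bs=1M count=20000
dd if=/mnt/poolname/testfile of=/dev/null bs=1M

For the read test, make the file larger than your ARC (or reboot first), or you'll mostly be measuring RAM rather than the disks.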

When write speeds are higher than read speeds, that's usually an indication that writes are being cached in memory. If you have disabled sync writes, then you'll get this behavior. From a data-integrity perspective, that isn't a great step, but it can be appropriate in certain specific use cases.
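
You can check how a dataset is configured with (names are placeholders):

zfs get sync PoolName/DataSetName

The value will be standard, always, or disabled; "disabled" is the case I'm describing.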
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I strongly second what @Nick2253 wrote about the testing.

In addition, your NIC is a suspect. The recommended brands for 10 Gbps are Intel and Chelsio. Do you have a chance to borrow such a NIC for a quick test?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
L2ARC with 24GB of RAM? Usually you want to have at least 64GB before considering it in order to not have a negative impact on performance.
 

bullerwins

Dabbler
Joined
Mar 15, 2022
Messages
43
It's not clear how you're actually testing. I'm assuming you're using SMB, but nowhere do you specify that.

There are so many variables that go into read/write speeds that it's important to start from the basics.

What kind of performance do you get using direct read/write tests from within TrueNAS (use dd or similar)? That will give you an understanding on what your actual hardware is capable of doing, before you introduce the variables of sharing services, network bottlenecks, etc.

When write speeds are higher than read speeds, that's usually an indication that writes are being cached in memory. If you have disabled sync writes, then you'll get this behavior. From a data-integrity perspective, that isn't a great step, but it can be appropriate in certain specific use cases.
I'm using a jail running OpenSpeedTest, installed from the community plugins page. I access the URL that the jail gives you from a Mac Studio and run a couple of speed tests; they are consistent: 6Gbit reads, 1 core at 100%, 14% average usage across 8 threads.

I also tested using iperf3, which is a CLI tool, installed on the Mac Studio. A bit better results, but still "only" 7Gbit reads.

So not really SMB. I don't know what OpenSpeedTest uses under the hood, but I don't think it's SMB.

Iperf3 doesn't use SMB.

I don't know if these tests actually write or read anything to disk; that's why I haven't checked caches/sync, etc. Maybe it's wrong to call it write and read and it should be send and receive.

I strongly second what @Nick2253 wrote about the testing.

In addition, your NIC is a suspect. The recommended brands for 10 Gbps are Intel and Chelsio. Do you have a chance to borrow such a NIC for a quick test?
I have the same card in an Unraid system and I get the full speed. And it's weird that I get the speed on writes but not reads; I guess if there is any driver incompatibility it should affect both directions?
L2ARC with 24GB of RAM? Usually you want to have at least 64GB in order to not have a negative impact on performance.
I have an extra 250GB SSD lying around; would it be better to not even use it and just use the 24GB of RAM?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I have an extra 250GB SSD lying around; would it be better to not even use it and just use the 24GB of RAM?
You could use it as a special metadata vdev.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
What NIC exactly do you have in the TrueNAS machine?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Based on the available information on their website, OpenSpeedTest is an HTML5 browser-based network speed test. It's not going to test the performance of your storage subsystem at all.

We've established that your network is running at "roughly 10Gbps" speeds based on the iperf results.

Using a Mac as an SMB client means you'll potentially be bumping into the strict sync=yes default on your actual application writes - this would choke small files and random write speeds, but might not be an issue if you plan to read and write multi-GB files in a single stream of I/O.

What is your intended use case from the client Mac machine, and your desired performance target (besides the "as fast as possible" that we're all after?)
 

bullerwins

Dabbler
Joined
Mar 15, 2022
Messages
43
You could use it as a special metadata vdev.
But then I worry that with only that one drive as the special metadata vdev, if that fails, I lose the whole pool.
Based on the available information on their website, OpenSpeedTest is an HTML5 browser-based network speed test. It's not going to test the performance of your storage subsystem at all.

We've established that your network is running at "roughly 10Gbps" speeds based on the iperf results.

Using a Mac as an SMB client means you'll potentially be bumping into the strict sync=yes default on your actual application writes - this would choke small files and random write speeds, but might not be an issue if you plan to read and write multi-GB files in a single stream of I/O.

What is your intended use case from the client Mac machine, and your desired performance target (besides the "as fast as possible" that we're all after?)
Totally true, but I thought that first I would need a network connection that reaches its full potential speed before worrying about disk performance. That's why I did the tests with those tools.

Real world test, using a 27GB ISO file, via SMB to a Mac Studio:
Reads: 291.3 MB/s
Writes: 198.4 MB/s

This is with an unused file that is bigger than the L1ARC.

Files under 1GB are instant; 2-4GB files take less than 5s.


The intended use case is to store videos and photos and edit remotely from the Mac. Each file would probably be around 1-10GB. I don't have problems scrubbing in the timeline.

Small files and random writes are not a priority; that was only when populating the pool the first time from other sources (cloud and Unraid). It will host all my photos and videos from my phones though, and it automatically syncs them each time. But I would not use them in batch.
No databases or similar use cases.
What NIC exactly do you have in the TrueNAS machine?

ASUS XG-C100C
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
But then I worry that with only that one drive as the special metadata vdev, if that fails, I lose the whole pool.

And because you have a RAIDZ data vdev, you wouldn't be able to remove the special vdev if you attach it for testing - so don't add one.

Totally true, but I thought that first I would need a network connection that reaches its full potential speed before worrying about disk performance. That's why I did the tests with those tools.

Fair answer; we've eliminated the first bottleneck, on to the next!

Real world test, using a 27GB ISO file, via SMB to a Mac Studio:
Reads: 291.3 MB/s
Writes: 198.4 MB/s

This is with an unused file that is bigger than the L1ARC.

Files under 1GB are instant; 2-4GB files take less than 5s.

With four disks in a RAIDZ2 you only have two of them writing actual data at any given point - so that's probably around the throughput limits of your pool, as that breaks down to roughly 150MB/s reads and 100MB/s writes from each spindle (about 300MB/s and 200MB/s for the pool, which lines up with your numbers). If you want faster, go with two mirrors, but then you run the risk of a double-disk failure killing your pool.
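
If you want to sanity-check that, watch the pool while a transfer is running (pool name is a placeholder):

zpool iostat PoolName 1

That prints read/write bandwidth every second, so you can see whether the spindles really are the ceiling.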

The intended use case is to store videos and photos and edit remotely from the Mac. Each file would probably be around 1-10GB. I don't have problems scrubbing in the timeline.

Small files and random writes are not a priority; that was only when populating the pool the first time from other sources (cloud and Unraid). It will host all my photos and videos from my phones though, and it automatically syncs them each time. But I would not use them in batch.
No databases or similar use cases.

Try going into the SMB Advanced Settings (Services -> SMB -> Little Wrench Icon) and enter set strict sync = no into the advanced parameters. Restart the SMB service and see if that improves the write side of things.

ASUS XG-C100C

Aquantia AQC107 chipset; not the standard by any means. My understanding is that the driver has been improving recently.
 

bullerwins

Dabbler
Joined
Mar 15, 2022
Messages
43
And because you have a RAIDZ data vdev, you wouldn't be able to remove the special vdev if you attach it for testing - so don't add one.
Sure thing. I don't have bad indexing performance at the moment.
Fair answer; we've eliminated the first bottleneck, on to the next!
Well, to be way too picky, I still only get 6Gbit for "reads" on the network test, so if I were to put extremely fast SSDs in a pool, I wouldn't reach 10Gbit speeds for reads, but it seems that I would for writes, right?
With four disks in a RAIDZ2 you only have two of them writing actual data at any given point - so that's probably around the throughput limits of your pool, as that breaks down to roughly 150MB/s reads and 100MB/s writes from each spindle (about 300MB/s and 200MB/s for the pool, which lines up with your numbers). If you want faster, go with two mirrors, but then you run the risk of a double-disk failure killing your pool.
Totally fair; with the HDDs I have, I'm getting the real-world performance I can expect. I was just referring to a theoretical scenario like in the previous quote, if I were to upgrade.
Try going into the SMB Advanced Settings (Services -> SMB -> Little Wrench Icon) and enter set strict sync = no into the advanced parameters. Restart the SMB service and see if that improves the write side of things.
Is that for SCALE? I'm using CORE, and I have a "pen" icon instead of a wrench in the Services panel next to the service name. And typing set strict sync = no into the auxiliary parameters gives an error.
[Attachment: Screenshot 2022-09-14 at 22.55.45.png]

Aquantia AQC107 chipset; not the standard by any means. My understanding is that the driver has been improving recently.
Linux support might be better; that could explain the 10/10Gbit speeds in Unraid. Except for the weird "only" 6 up and 10 down speeds, it's working fine.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
But then I worry that only having that as the special metadata vdev, it that fails, I lose the whole pool.
Yup, which means you should just not use it.
Especially since you only have 16GB of RAM.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Sure thing. I don't have bad indexing performance at the moment.

Even if you do, you'd probably be better served by either forcing metadata to remain in RAM:

https://www.truenas.com/community/threads/zfs-arc-doesnt-seem-that-smart.100423/#post-691965

or adding a metadata-only L2ARC (zfs set secondarycache=metadata PoolName/DataSetName)

Well, to be way too picky, I still only get 6Gbit for "reads" on the network test, so if I were to put extremely fast SSDs in a pool, I wouldn't reach 10Gbit speeds for reads, but it seems that I would for writes, right?

I believe so. SMB is very dependent on single-threaded speed, and it looks like the i7-6700K might be tapping out trying to put that much traffic to a single client. Small thought - disable hyperthreading in the BIOS?

Totally fair; with the HDDs I have, I'm getting the real-world performance I can expect. I was just referring to a theoretical scenario like in the previous quote, if I were to upgrade.

If there wasn't a CPU bottleneck, I'd expect 10Gbps reads from things that are in your ARC. Do you get those speeds with smaller (eg: 8GB) files?
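
If you want to check whether those reads are actually being served from ARC, the arcstats counters should be visible via sysctl on CORE (I believe these OIDs are correct on FreeBSD 13, but double-check on your build):

sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses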

Is that for SCALE? I'm using CORE, and I have a "pen" icon instead of a wrench in the Services panel next to the service name. And typing set strict sync = no into the auxiliary parameters gives an error.

Sorry, leave out the "set" part, so just strict sync = no
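
If you want to confirm it took effect after restarting SMB, testparm from the shell should show the effective config (assuming testparm is on the path, which it normally is on CORE):

testparm -s -v | grep -i "strict sync"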

Linux support might be better; that could explain the 10/10Gbit speeds in Unraid. Except for the weird "only" 6 up and 10 down speeds, it's working fine.

Could be related to the AQC107 driver being more mature under Linux vs. FreeBSD. If you dd a large file from your pool into /dev/null (careful not to do it the other way!), how fast does it go using a large block size (eg: dd if=/mnt/poolname/directory/file of=/dev/null bs=1M)?
 

bullerwins

Dabbler
Joined
Mar 15, 2022
Messages
43
Yup, which means you should just not use it.
Especially since you only have 16GB of RAM.
Thanks! So just to be sure: would I get better performance with just 16GB of RAM as L1ARC and nothing else than with 16GB of L1ARC plus a 250GB SSD as L2ARC? That seemed counterintuitive to me. I guess maybe to use an L2ARC you need to use space in the L1ARC, and with "only" 16GB that would eat most of it?

Even if you do, you'd probably be better served by either forcing metadata to remain in RAM:

https://www.truenas.com/community/threads/zfs-arc-doesnt-seem-that-smart.100423/#post-691965

or adding a metadata-only L2ARC (zfs set secondarycache=metadata PoolName/DataSetName)



I believe so. SMB is very dependent on single-threaded speed, and it looks like the i7-6700K might be tapping out trying to put that much traffic to a single client. Small thought - disable hyperthreading in the BIOS?



If there wasn't a CPU bottleneck, I'd expect 10Gbps reads from things that are in your ARC. Do you get those speeds with smaller (eg: 8GB) files?



Sorry, leave out the "set" part, so just strict sync = no



Could be related to the AQC107 driver being more mature under Linux vs. FreeBSD. If you dd a large file from your pool into /dev/null (careful not to do it the other way!), how fast does it go using a large block size (eg: dd if=/mnt/poolname/directory/file of=/dev/null bs=1M)?
Thanks for all the input! I'll do these tests tomorrow
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I guess maybe to use an L2ARC you need to use space in the L1ARC, and with "only" 16GB that would eat most of it?
Exactly. You can consider @HoneyBadger's suggestion of a metadata-only L2ARC though.
 

bullerwins

Dabbler
Joined
Mar 15, 2022
Messages
43
Exactly. You can consider @HoneyBadger's suggestion of a metadata-only L2ARC though.
Even if you do, you'd probably be better served by either forcing metadata to remain in RAM:

https://www.truenas.com/community/threads/zfs-arc-doesnt-seem-that-smart.100423/#post-691965

or adding a metadata-only L2ARC (zfs set secondarycache=metadata PoolName/DataSetName)
Hi, quick question. Does "zfs set secondarycache=metadata PoolName/DataSetName" work per dataset? Can't I use it for the whole pool, or do I have to type the command for each dataset?
Also, is that a shell command? Do I have to type it once, or do I need to add it to the startup scripts from the GUI?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hi, quick question. Does "zfs set secondarycache=metadata PoolName/DataSetName" work per dataset? Can't I use it for the whole pool, or do I have to type the command for each dataset?
Also, is that a shell command? Do I have to type it once, or do I need to add it to the startup scripts from the GUI?

It's a per-dataset command, but if it's set at the pool level all of the datasets will inherit the pool settings unless overridden.

So you could run

zfs set secondarycache=metadata PoolName

and then

zfs set secondarycache=all PoolName/DataSetThatShouldBeCached

to override the inherit.

You would run this once from a shell or SSH - it will persist at the ZFS level across reboots.
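
If you want to double-check what each dataset ended up with, and whether the value is local or inherited:

zfs get -r secondarycache PoolName

The SOURCE column will show "local" for the override and "inherited from PoolName" for everything else.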
 

bullerwins

Dabbler
Joined
Mar 15, 2022
Messages
43
It's a per-dataset command, but if it's set at the pool level all of the datasets will inherit the pool settings unless overridden.

So you could run

zfs set secondarycache=metadata PoolName

and then

zfs set secondarycache=all PoolName/DataSetThatShouldBeCached

to override the inherit.

You would run this once from a shell or SSH - it will persist at the ZFS level across reboots.
As it's currently caching "all", would I need to restart TrueNAS so it "clears" the data cache and only begins to store metadata on the SSD/L2ARC?
Would that command block the data vdevs from storing metadata now, so it would only go to the SSD/L2ARC and not even to RAM?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Honestly, I've never looked at the size of L2ARC before/after changing the settings in this manner. I assume ZFS would just stop submitting actual data records as L2ARC candidates and over time it would just invalidate the existing contents.

A reboot will of course do the trick if you don't have persistent L2ARC enabled as well.

Would that command block the data vdevs from storing metadata now, so it would only go to the SSD/L2ARC and not even to RAM?

Metadata will always be written back to the data vdevs in this case - this setting causes your L2ARC SSD to only capture metadata and not data records. These records will be read from RAM primarily, and if they are evacuated from RAM will be copied to the L2ARC SSD and read from there instead.
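
If you're curious to watch the cache device drain and refill after the change, per-vdev stats will show it (pool name is a placeholder):

zpool iostat -v PoolName 5

If my assumption above is right, the cache device's alloc should shrink over time as data records get invalidated and only metadata gets written back.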

For metadata to be permanently stored/written to separate SSD vdevs requires the "special" vdev type, but that brings the risks discussed above re: loss of special vdev resulting in an UNAVAIL pool, and being unable to remove them if a RAIDZ data vdev is present.
 

souporman

Explorer
Joined
Feb 3, 2015
Messages
57
It's a per-dataset command, but if it's set at the pool level all of the datasets will inherit the pool settings unless overridden.

So you could run

zfs set secondarycache=metadata PoolName

and then

zfs set secondarycache=all PoolName/DataSetThatShouldBeCached

to override the inherit.

You would run this once from a shell or SSH - it will persist at the ZFS level across reboots.
If one were to run these commands, would the following then be true?
- Metadata for the entire pool will be cached in L2ARC (except the dataset called DataSetThatShouldBeCached)
- PoolName/DataSetThatShouldBeCached would use the L2ARC disk for data caching and not metadata (i.e. this dataset's metadata would only live on spinners)?
 