Sector size for SSDs

xyzzy

Explorer
Joined
Jan 26, 2016
Messages
76
I've got a mixture of Intel SSDs (SATA and NVMe) that I want to use in a new TrueNAS Core server.

According to Intel's MAS CLI:
  • The SATA SSDs are 4K physical / 512 logical.
  • The NVMe SSDs don't report a physical sector size but use 512 logical.
When creating VDEVs on these drives with the current version of TrueNAS, ZFS is using an ashift value of 12.
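
(For reference, a rough way to cross-check this from the TrueNAS Core shell; the device and pool names below are just examples from my box:)

  # SATA: logical/physical sector sizes as reported by the drive
  smartctl -i /dev/ada0
  # NVMe: list the supported LBA formats and which one is currently active
  nvmecontrol identify nvme0ns1
  # ashift actually in use on an existing vdev (cachefile path is where TrueNAS keeps it on my system)
  zdb -U /data/zfs/zpool.cache -C tank | grep ashift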

Question: Is there a performance benefit to configuring these drives to use 4K logical and/or tweaking the ashift value?
 

blanchet

Guru
Joined
Apr 17, 2018
Messages
516
There is no benefit to changing the default values of TrueNAS, except if your NVMe SSD has 8K physical sectors; in that case you have to use ashift=13.

See this excellent article for more details:
 

xyzzy

Explorer
Joined
Jan 26, 2016
Messages
76
There is no benefit to changing the default values of TrueNAS, except if your NVMe SSD has 8K physical sectors; in that case you have to use ashift=13.

See this excellent article for more details:
Unfortunately, I haven't been able to determine the physical sector size on my NVMe SSDs (P5510 and P4801X). It *appears* both can be set to use 512 or 4K logical. Is there any benefit to doing an "NVMe format" that sets them to use 4K logical?
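
(In case it matters, the kind of "NVMe format" I mean is something like this -- it wipes the namespace, and the LBA-format index is only a placeholder that would have to come from the identify output first:)

  # show the supported LBA formats (512 vs 4096 data size) and the current one
  nvmecontrol identify nvme0ns1
  # destructive: reformat the namespace to the 4096-byte LBA format (index 1 is just an example)
  nvmecontrol format -f 1 nvme0ns1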
 

xyzzy

Explorer
Joined
Jan 26, 2016
Messages
76
I found a few docs like this one (https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Hardware.html#sector-size) that indicate NVMe drives should be formatted to use 4096-byte sectors for optimal performance.

I tried this with a single P5510 drive where I changed the logical sector size from 512 bytes to 4096 bytes.

In both cases (before and after the change), TrueNAS automatically created the VDEV with ashift=12.

I ran some fio tests (on a VOL, then a ZVOL) and was surprised that the performance difference between 512- and 4096-byte sectors was negligible.
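
(To give an idea of what I mean by fio tests, one of the random-write runs looks something like this; path, size, and queue depth are illustrative:)

  fio --name=rand4kwrite --filename=/mnt/tank/fiotest --size=8G \
      --rw=randwrite --bs=4k --ioengine=posixaio --iodepth=32 --numjobs=1 \
      --runtime=60 --time_based --group_reporting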

My best guess is that the ashift=12 setting effectively meant ZFS was treating the drive as a 4096-byte-sector drive even when it was set to 512-byte logical sectors.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There is no benefit to changing the default values of TrueNAS, except if your NVMe SSD has 8K physical sectors; in that case you have to use ashift=13.

I haven't seen SSDs with 8K physical sectors; perhaps you mean the page size?

In any case, this isn't just an NVMe thing. If you can find the underlying physical page size for an SSD, it is best to set ashift to match it. This eliminates the read-update-write cycle that 4K or 512b sectors often incur with SSDs. A good explainer is here:

https://arstechnica.com/information...volution-how-solid-state-disks-really-work/3/

If you have an 8K page but a 4K or 512b sector size, you end up burning through the free page pool more quickly and also incur additional erase cycles and all the "tetris-like" shenanigans.
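
To put rough numbers on it (deliberately simplified -- real firmware buffers and coalesces, so treat it as a worst case): with 8K pages and 4K writes, updating 4K of existing data means the controller reads the 8K page it lives in, merges the new 4K with the untouched half, and programs the result to a fresh page, leaving the old page half-stale until garbage collection rewrites whatever else shares its erase block. A million such updates is a million read-merge-program cycles plus the GC traffic behind them. If ZFS only ever issues aligned 8K writes (ashift=13), each update fully replaces a page: no read, no merge, and the old page is entirely invalid, so GC has nothing to salvage from it.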
 

xyzzy

Explorer
Joined
Jan 26, 2016
Messages
76
I haven't seen SSDs with 8K physical sectors; perhaps you mean the page size?

In any case, this isn't just an NVMe thing. If you can find the underlying physical page size for an SSD, it is best to set ashift to match it. This eliminates the read-update-write cycle that 4K or 512b sectors often incur with SSDs. A good explainer is here:

https://arstechnica.com/information...volution-how-solid-state-disks-really-work/3/

If you have an 8K page but a 4K or 512b sector size, you end up burning through the free page pool more quickly and also incur additional erase cycles and all the "tetris-like" shenanigans.
I'd like to give ashift=13 a try, but it appears the GUI may be hard-coded to use ashift=12?

Is there a way to see the *exact* command line(s) the GUI is using to create a pool so I can do those manually in the shell but with ashift set to 13?

I've Googled around a bit and the command line args I've seen listed are pretty extensive. I'd like to keep my setup as "standard" and close to the GUI as possible.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I haven't tried recently, but in the past, yes, the exact ZFS commands being issued are available in the logs somewhere. You have the correct idea about trying to keep it as "standard" as possible IMO. I have always had to create ashift=13 pools manually from the CLI, and then just import them, but I haven't done this in some time.
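
From memory, the shape of it was roughly this -- pool name and gptid devices are placeholders, and you'd want to double-check against whatever properties the current middleware sets:

  # build the pool by hand with the ashift you want
  zpool create -o ashift=13 tank mirror gptid/aaaa gptid/bbbb
  # hand it back to TrueNAS: export it, then use Storage -> Import Pool in the GUI
  zpool export tank
  # (there is also a vfs.zfs.min_auto_ashift sysctl on the FreeBSD side, but I haven't
  #  checked how the current middleware interacts with it)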
 

xyzzy

Explorer
Joined
Jan 26, 2016
Messages
76
I tried searching the logs right after creating a new pool but didn't find the commands anywhere. My guess was that this info would be in /var/log/middlewared.log or /var/log/debug.log.

Maybe I need to set a variable somewhere to enable extra logging by the TrueNAS GUI?
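
(What I tried was basically grepping those logs right after the pool was created, e.g.:)

  grep -iE 'zpool|zfs' /var/log/middlewared.log
  grep -iE 'zpool|zfs' /var/log/debug.log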
 

xyzzy

Explorer
Joined
Jan 26, 2016
Messages
76
Well, I couldn't find any way to get the UI to log what it's doing. However, I *did* find a way to temporarily change the UI's ashift=12 hardcode to ashift=13.

So, I re-ran my fio tests against the VOL and against the ZVOL. Surprisingly, I didn't see any difference across these three setups:

1) Drive using 512 byte logical sectors, TrueNAS using ashift=12
2) Drive using 4096 byte logical sectors, TrueNAS using ashift=12
3) Drive using 4096 byte logical sectors, TrueNAS using ashift=13

Am I doing something wrong?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What sort of differences were you expecting to see with "fio tests"?

You're not likely to see any performance difference until the free page pool is being stressed, which is primarily going to happen when the disk is full (so that all sectors are mapped) and your write activity is high enough that the free page pool becomes depleted. Eliminating the read-update-write cycle where it isn't necessary should improve performance there.

But the real win is the elimination of flash-killing read-update-write cycles.
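
If you want to try to provoke it, the pathological case would look roughly like this -- sizes are placeholders that need tuning to the drive, the idea being to leave very little free space and then write for long enough to chew through the SLC cache and free page pool:

  # 1) map nearly every LBA by filling most of the drive/dataset first
  dd if=/dev/urandom of=/mnt/tank/fill bs=1m count=200000
  # 2) then hammer it with sustained small random writes for long enough to
  #    exhaust the drive's free page pool
  fio --name=sustained --filename=/mnt/tank/fill --rw=randwrite --bs=4k \
      --ioengine=posixaio --iodepth=32 --runtime=1800 --time_based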
 

xyzzy

Explorer
Joined
Jan 26, 2016
Messages
76
Based on a few articles I saw, I was expecting (at least) better write performance, on the assumption that bigger writes mean fewer read-update-write cycles.

I also understand that eliminating read-update-write cycles is a big benefit for drive longevity. But that's not something I can measure; I simply have to trust it's happening.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Most SSD performance issues are hidden behind years of SSD performance optimization tricks involving caches, code, and cleverness. However, at the end of the day, if you understand what is supposed to be going on, you can usually design a pathological case that hits it where it hurts. When worrying about the free pool in an SSD, you really need to be aware of all the implementation details, such as what the reserve size is, and then figure out how to flood it with your hypothetical test.

I once had an insipid argument with Jordan when I asked iX to default to undersizing the SLOG partition in order to get better free pool performance under heavy write loads. Jordan asked me to PROVE that this was the case, and I declined, because I've got better things to do than to do someone else's product improvement proofs for them. Apparently they did eventually get around to figuring this out themselves -- after Jordan was gone.

But one of the things to remember here is that there is a relationship between theory and physical reality. The theory that ashift=13 reduces wear on 8K devices might be right or wrong, because who knows what implementation details might mess with it, but the physical reality is that not-ashift=13 is going to cause partial writes or garbage collection on 8K devices in some (probably many) cases *guaranteed*.
 

xyzzy

Explorer
Joined
Jan 26, 2016
Messages
76
Yep, as a software engineer, I totally get all of that. Plus I realize that a lot of the SSD internals are not publicly known so there's only so far I can reasonably take my optimization efforts.

I just found it surprising that 8 separate fio tests came back with no changes after switching to 4096-byte sectors and ashift=13, and wanted to make sure I wasn't doing anything wrong.

(In case you're wondering, the fio tests I'm running basically mimic the 8 CrystalDiskMark tests when using the "NVMe" setting.)
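
(For example, the SEQ1M Q8T1 read test translates roughly to something like this -- path and size are placeholders:)

  fio --name=seq1m-q8t1 --filename=/mnt/tank/fiotest --size=8G \
      --rw=read --bs=1m --ioengine=posixaio --iodepth=8 --numjobs=1 \
      --runtime=30 --time_based --group_reporting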
 

ByteMan

Dabbler
Joined
Nov 10, 2021
Messages
32
Can SSDs with varying sector sizes (both in terms of what the drive reports and what is actually going on in the background) be used to build a raidz vdev? Is there potential for issues with such a setup?
 