How to do that with NVME drives?
Hi all. According to the documentation, this is the way to use only a set amount of a drive's capacity: https://www.ixsystems.com/documentation/freenas/11.3-U3.2/storage.html#overprovisioning

This is the result when I try that on my system:

root@freenas-pmh[~]# disk_resize nvd0 16GB
Resizing...
I don't know what they're doing in the command.
In theory, if you reset a drive (losing all the mappings), and simply allocate a much smaller SLOG partition up front, and don't use the rest of the drive, you should get the correct behaviour.
I had actually written quite a bit on the topic into the ticket, but the stupid Bugzilla had apparently logged me out by the time I posted it, losing a fairly detailed post.
So here's the idea.
SSDs are actually huge-block devices, sort of like SMR hard drives.

In order to cope with that, they allow writing to smaller "pages", sometimes 8192 bytes (which is where ashift 13 comes from). I still recommend the Ars tutorial on this as a general explainer of how the technology works.
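A quick sanity check of those numbers (the 2 MiB erase-block size is a hypothetical for illustration; real geometry varies by drive):

```python
# ZFS's ashift is the log2 of the sector size it assumes, so ashift 13
# corresponds to 8192-byte pages.
page_size = 2 ** 13
print(page_size)             # 8192

# Flash erase blocks are far larger than pages. Assuming a hypothetical
# 2 MiB erase block, that's 256 of these 8 KiB pages per block.
erase_block = 2 * 1024 * 1024
print(erase_block // page_size)   # 256
```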
So the thing here is that SSDs have a mechanism that consolidates pages into new blocks. This is predicated on the idea that an SSD has lots of user filesystem data on it. The normal workload for an SSD has lots of data being retained long-term while other data gets updated (Windows updates land, you save a spreadsheet, etc.). And the average user who buys a 500GB SSD expects to be able to store ~500GB of stuff. So your typical SSD comes with only a little extra flash space. The drive tries to maintain a healthy pool of freshly erased blocks so that when you do a big write, it can complete it quickly.
The problem is, SLOG isn't like that, at all. SLOG is short-term data, which will be read a maximum of one time, and then only under duress (replay during import).
If you have a 500GB SLOG on a 1Gbps ethernet NAS, the maximum amount that could be written in a five second transaction group is around 625MB, and since you also have a previous transaction group still being committed to the pool, you really can't make effective use of more than maybe 2GB of SLOG. (Let's not argue jails or stuff; I'm trying to get the basic idea across here.) However, if your FreeNAS creates a 500GB partition on that 500GB SLOG device, what's going to happen is that the SLOG is written across the entire 500GB. It won't come around to LBA #123456 very often... about once an hour.
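The back-of-the-envelope arithmetic behind those numbers, assuming a saturated 1Gbps link and the default five-second transaction group interval:

```python
# Stock assumptions: 1 Gbps ethernet, 5-second transaction groups.
link_bps = 1_000_000_000      # link speed in bits per second
txg_seconds = 5               # transaction group interval

# Maximum data one transaction group can accumulate at line rate:
txg_bytes = link_bps / 8 * txg_seconds
print(txg_bytes / 1e6)        # 625.0 (MB)

# How often the SLOG wraps back to partition sector #1 under worst-case
# continuous writes, for a whole-drive 500 GB partition vs. a 2 GB one:
for part_bytes in (500e9, 2e9):
    wrap_minutes = part_bytes / (link_bps / 8) / 60
    print(round(wrap_minutes, 1))   # 66.7 then 0.3 (minutes per cycle)
```

So the full 500GB partition revisits a given LBA roughly once an hour, while the 2GB partition cycles every sixteen seconds or so.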
But, this is also stressing the controller a bit, because if you're constantly writing all 500GB, you still have a relatively small pool of erased blocks. Depending on how well the controller figures out what's going on, you might not be actively thrashing about doing tons of garbage collection, but this is still a stupid thing to do.
There are two general fixes.
One is to rely on TRIM. TRIM involves sending craptons of extra commands to the drive, and not all drives support it, or support it correctly. Those extra commands chew up precious bandwidth on the SATA/SAS link, increasing the latency of actual SLOG writes. On the upside, it lets the host tell the SSD exactly what isn't needed anymore. This has the advantage of being 100% correct, but only if the drive does something useful with the information, and at the performance penalty of needing to do those extra transactions.
The other is to lean on statistics. If we know that our maximum possible SLOG usage is a certain amount, let's use the 2GB as an example, then we can be clever. Resetting the drive to factory state resets all the page mappings, and the drive will work its way through all the blocks and erase them. This is a required precondition for this trick; it does not matter if it happened at the factory on a new drive, or if you do it manually on an existing drive that has had data previously written to it. The end result is a guaranteed massive pile of erased blocks. Now you create your 2GB partition on it and start writing. Because you are writing sequentially, an SSD controller will tend to pick contiguous pages on the same block, but even if it doesn't, there are so many available erased blocks that it isn't a problem.
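As a rough sketch of what that looks like on FreeBSD/FreeNAS: the device names (nvme0ns1, nvd0) and pool name (tank) here are assumptions, check yours, and be aware that the format step DESTROYS ALL DATA on the drive.

```shell
# 1. Factory-reset the flash mappings. On an NVMe drive this is a
#    low-level format (DESTROYS EVERYTHING on the drive):
nvmecontrol format nvme0ns1
#    (on a SATA SSD, an ATA secure erase does the equivalent job)

# 2. Create a GPT scheme with one small partition, leaving the rest
#    of the drive untouched:
gpart create -s gpt nvd0
gpart add -t freebsd-zfs -s 2G nvd0

# 3. Attach the small partition to the pool as a log device
#    ("tank" is a placeholder pool name):
zpool add tank log nvd0p1
```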
When you get to the end of your 2GB partition and cycle back around to partition sector #1, you still have around 480GB worth of erased blocks out there. The controller is not going to waste time trying to garbage collect and consolidate pages; it is under no pressure. This is extremely good for wear leveling, as you should never get an unnecessary rewrite of a block. When you overwrite those first few hundred sectors, the underlying old flash block no longer has any references, gets thrown onto the dirty pile, and gets erased at the drive's convenience.
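A toy flash-translation-layer model illustrates the point: with a small partition on a freshly erased drive, the pool of erased blocks never runs dry, so the controller never needs to garbage collect. All the sizes here are scaled-down stand-ins, not real drive geometry.

```python
PAGES_PER_BLOCK = 64
DRIVE_BLOCKS = 1000                    # stand-in for ~500 GB of flash
PART_PAGES = 100 * PAGES_PER_BLOCK     # stand-in for a small partition

erased = list(range(DRIVE_BLOCKS))     # every block starts erased
live = {}                              # logical page -> (block, slot)
valid = [0] * DRIVE_BLOCKS             # valid-page count per block
open_block, open_slot = erased.pop(), 0
min_pool = len(erased)                 # low-water mark of the erased pool

def write(lpage):
    """Sequentially-allocating write of one logical page."""
    global open_block, open_slot, min_pool
    # Invalidate the old copy; a fully-invalid block goes back on the
    # erased pile "at the drive's convenience" (here: immediately).
    if lpage in live:
        blk, _ = live[lpage]
        valid[blk] -= 1
        if valid[blk] == 0 and blk != open_block:
            erased.append(blk)
    live[lpage] = (open_block, open_slot)
    valid[open_block] += 1
    open_slot += 1
    if open_slot == PAGES_PER_BLOCK:   # block full, grab a fresh one
        open_block, open_slot = erased.pop(), 0
        min_pool = min(min_pool, len(erased))

# Cycle around the small partition many times, like a busy SLOG:
for i in range(20 * PART_PAGES):
    write(i % PART_PAGES)

print(min_pool)   # → 899: the erased pool never drops below ~90%
```

Even after twenty full partition cycles, the erased pool never shrinks below DRIVE_BLOCKS minus the partition's footprint: every block freed by overwriting old SLOG data comes straight back, and no page consolidation ever happens.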
The other thing is that the drive can soak up a "sprint" of continuous SLOG activity, up to 480GB's worth, even if the controller is struggling to keep pace and cannot actively erase old flash blocks.
And the thing is, unlike the TRIM case, this doesn't rely on TRIM working, doesn't involve bogging down the drive with extra TRIM commands, and simply plays to the natural design of SSD's, taking advantage of how they work to get optimized behaviour.
So from my perspective, using TRIM for this is basically an example of saying "I can't think my way through the underlying problems to reach an obvious solution."
Also, as far as I'm concerned, overprovisioning refers to the amount of extra flash a manufacturer includes in an SSD (for example a "500GB" SSD will typically have 512GB but only advertises 500). Underprovisioning refers to artificially increasing that pool by using less than the advertised amount. Unfortunately, precision is a lost cause.