SOLVED How do writes to metadata / dedup vdevs work?

Herr_Merlin

Patron
Joined
Oct 25, 2019
Messages
200
I assume a rather stupid question:
- How do writes to metadata vdevs work?

Assuming you have a pool of disks, a pair of SSDs as a metadata vdev, and some mirrored devices as SLOG.

Assumption:

Data: RAM+SLOG -> disk -> done
Metadata: RAM+SLOG -> disk -> done

Another possibility:
Data: RAM+SLOG -> disk -> done
Metadata: RAM -> disk -> done

So the rather stupid question is: do you need SSDs with power loss protection (PLP) for the metadata vdev or not? If all data goes through the SLOG, there is no need. If not, and the writes happen in parallel, PLP would be strongly needed.

Another question: how is the dedup data handled for a dedup vdev?
Will that data always go through the SLOG? If not, PLP would be needed as well.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
Metadata: RAM -> disk -> done
I don't think it would be this, as SLOG/ARC are pool-wide and the special vdevs are just vdevs after all, so they're included in the pool.

Another question: how is the dedup data handled for a dedup vdev?
Will that data always go through the SLOG? If not, PLP would be needed as well.
Same answer, meaning not required... it will be interesting to see if anyone can come up with a surprise on this and say otherwise.
 

Herr_Merlin

Patron
Joined
Oct 25, 2019
Messages
200
Well, yeah, I don't want to be the one who tests this... hence asking upfront.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Assumptions of my own for those reading: when I say "PLP" I mean "PLP for in-flight data", not just for data at rest. All SSDs used for ZFS should provide data assurance at rest, whether through hardware or firmware.

Data: RAM+SLOG -> disk -> done
Metadata: RAM+SLOG -> disk -> done

Closer to this. ZFS metadata writes are synchronous, but some metadata is generated based on the contents of the data (e.g. checksums, block pointers), so it "spontaneously appears" in the transaction group on its own. The path is more like:

Data+metadata -> (RAM+SLOG) -> generate additional metadata into RAM -> close transaction group -> drain txg data (async) -> drain txg metadata (sync)
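For anyone wanting to try this themselves, a minimal sketch of the pool layout being discussed (pool and device names are placeholders of my own, not from the thread):

```shell
# Hypothetical pool matching the scenario above: a RAIDZ2 data vdev on HDDs,
# a mirrored special (metadata) vdev on SSDs, and a mirrored SLOG.
zpool create tank \
    raidz2 da0 da1 da2 da3 da4 da5 \
    special mirror nvd0 nvd1 \
    log mirror nvd2 nvd3

# Watch per-vdev activity: during a txg flush the special vdev takes the
# metadata drain, while the SLOG only absorbs the synchronous portion.
zpool iostat -v tank 5
```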

So the rather stupid question is: do you need SSDs with power loss protection (PLP) for the metadata vdev or not? If all data goes through the SLOG, there is no need. If not, and the writes happen in parallel, PLP would be strongly needed.

PLP is strongly recommended from a performance perspective, because the "metadata drain" is a sync operation. It will work without it, and will still likely be orders of magnitude faster than metadata-on-data-vdevs purely on the strength of SSD vs. HDD, but a good metadata SSD will have consistent sustained performance, which often brings PLP to the party as well.

Another question: how is the dedup data handled for a dedup vdev?
Will that data always go through the SLOG? If not, PLP would be needed as well.
Dedup data is just metadata and is handled the same way: PLP suggested for performance, not necessary for safety. An important note, though, is that the dedup workload is extremely punishing on SSDs (almost exclusively 4K at low queue depths, with mixed reads and writes), and therefore Optane is highly recommended. Member @Stilez has written an excellent resource based on their experience, and I'll pull a few graphs from it; hopefully the inline images work.


[Attached graphs from @Stilez's resource: read latency under a mixed read/write workload, conventional NAND SSD vs. Optane]


Notice how a traditional NAND SSD has progressively higher/worse read latency under a mixed read/write workload, whereas Optane is like my avatar animal - "Optane don't care about mixed workloads, Optane don't give a [Family Friendly]" - it just delivers a flat line of consistent latency.
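If you want to see what that DDT workload looks like on your own pool, a hedged sketch (pool name "tank" and device names are placeholders; commands are standard OpenZFS tooling):

```shell
# Add a mirrored dedup vdev so the dedup table (DDT) lives on fast SSDs
# instead of the data vdevs.
zpool add tank dedup mirror nvd4 nvd5

# Summarize DDT size and dedup ratios for the pool.
zpool status -D tank

# Print the full DDT histogram -- this is the mostly-4K, mixed R/W
# metadata load that favours Optane-class latency consistency.
zdb -DD tank
```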
 

Herr_Merlin

Patron
Joined
Oct 25, 2019
Messages
200
That answers my question perfectly. Many Thanks.
 