Do Metadata vdevs store Deduplication Tables?

foreverjake

Cadet
Joined
Apr 21, 2023
Messages
5
Hello,
I'm looking to understand the function of Metadata vdevs vs Deduplication vdevs. I think I understand the purpose of each individually, but can a Metadata vdev perform the function of both? I'd like to take advantage of both a Metadata vdev and deduplication, but I only have 4 NVMe PCIe cards and would like to use two of them as a separate data vdev. So I'm wondering whether the Metadata vdev will also store the DDT, or whether I have to suck it up and dedicate two cards to each kind of special vdev. I would like to set up this system with 2 data vdevs in separate pools: one composed of 12 x 12TB drives in RAIDZ2, and the other a mirror of 2 x 3.2TB cards. I want to use deduplication and a separate Metadata vdev on the RAIDZ2 pool.

Server:
2 x Xeon 20-core CPUs
512GB RAM
2 x 200GB Intel SSDs as a boot RAID
12 x 12TB SATA drives
2 x 1.6TB HHHL
2 x 3.2TB HHHL
4 x 10Gb/s NICs, all in a trunked lagg

This is a greenfield setup, and I will test it with about 50TB of data from an existing server, but I was looking for some guidance on the function of the vdevs. I have seen several comments that leave me confused as to whether or not the DDT is considered metadata.

Thanks
Mike
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If you don't have a metadata or dedup VDEV, the metadata and dedup data are in your pool VDEV(s).

The table on page 7 of this presentation seems to make clear that the dedup table will live on your pool VDEV(s) unless you have a dedup class VDEV.

 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
can a Metadata vdev perform the function of both?

The intent of special vdevs is to offload specific kinds of pool traffic to devices more appropriate to the traffic type. Special vdevs are intended to offload high IOPS traffic onto SSD or possibly Optane. Optane is recommended for special vdevs used to store DDT.

https://www.truenas.com/docs/references/zfsdeduplication/

High quality mirrored SSDs configured as a “special vdev” for the DDT (and usually all metadata) are strongly recommended for deduplication unless the entire pool is built with high quality SSDs. Expect potentially severe issues if these are not used as described below. NVMe SSDs are recommended whenever possible. SSDs must be large enough to store all metadata.

This makes for some confusion for new users, as there are multiple ways to store metadata -- in the main pool, in persistent L2ARC, and in a special vdev. The main problem with a special vdev is that it is a root device in the pool. If the special vdev is unavailable for any reason, the pool is unusable. Also, the special vdev MUST be able to hold ALL the metadata. This is different from persistent L2ARC, where the loss of an L2ARC device is not fatal and absence merely causes the system to refer to the main pool (possibly rather slowly).

As far as I can tell, metadata vdevs are not eligible for ddt data. This makes sense because you would probably use an SSD for metadata but Optane for DDT.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
As far as I can tell, metadata vdevs are not eligible for ddt data. This makes sense because you would probably use an SSD for metadata but Optane for DDT.
Dedup data is considered metadata, so it should be put on the general metadata special vdevs if there aren't any dedicated dedup type vdevs.
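
To make that fallback order concrete, here is a minimal Python sketch of the decision as I understand it. It is a model for illustration, not the actual OpenZFS code: the function name is made up, and the `ddt_data_is_special` flag is assumed to mirror the `zfs_ddt_data_is_special` module tunable, which I believe defaults to on.

```python
def ddt_allocation_class(has_dedup_vdev: bool,
                         has_special_vdev: bool,
                         ddt_data_is_special: bool = True) -> str:
    """Return the allocation class a DDT block would land in (illustrative model)."""
    # A dedicated dedup vdev always wins for DDT blocks.
    if has_dedup_vdev:
        return "dedup"
    # Otherwise the DDT is treated like other metadata and goes to the
    # special (metadata) vdev, unless that behaviour is switched off.
    if has_special_vdev and ddt_data_is_special:
        return "special"
    # With neither, the DDT lives on the ordinary data vdevs.
    return "normal"


if __name__ == "__main__":
    for dedup, special in [(True, True), (False, True), (False, False)]:
        print(f"dedup vdev={dedup}, special vdev={special} ->",
              ddt_allocation_class(dedup, special))
```

For the layout described in the first post (a special vdev but no dedup vdev on the RAIDZ2 pool), this model puts the DDT on the special vdev.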
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Consider having your dedup working set stored in L2ARC instead, since it's not mission-critical; you can make it (sort of) permanent and stripe different drives in order to more easily expand it if you require more space (though you need to increase ARC as well).

I'm assuming you know the dangers of DDT.
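
One of those dangers is simply how big the table can get. Here is a rough back-of-envelope sketch in Python. Everything numeric in it is an assumption, not something from this thread: roughly 320 bytes per DDT entry is a commonly quoted rule of thumb, 128 KiB is just the default recordsize, and the function name is hypothetical. Treat the result as an order-of-magnitude guess only.

```python
import math


def estimate_ddt_bytes(data_bytes: int,
                       avg_block_bytes: int,
                       bytes_per_entry: int = 320,   # ASSUMPTION: rule-of-thumb entry size
                       dedup_ratio: float = 1.0) -> int:
    """Rough DDT size: one entry per unique block (illustrative estimate only)."""
    # Unique blocks = total blocks divided by how many times each repeats on average.
    unique_blocks = math.ceil(data_bytes / avg_block_bytes / dedup_ratio)
    return unique_blocks * bytes_per_entry


if __name__ == "__main__":
    # ASSUMPTION: 50 TB of data at the default 128 KiB recordsize, no duplicates yet.
    size = estimate_ddt_bytes(50 * 10**12, 128 * 1024)
    print(f"Estimated DDT size: {size / 2**30:.1f} GiB")
```

Smaller average block sizes inflate the estimate quickly, so it is worth rerunning with numbers measured from the real data set.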
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Dedup data is considered metadata, so it should be put on the general metadata special vdevs if there aren't any dedicated dedup type vdevs.

Well, I have to agree with the poster that this is clear as mud. I actually looked at this for a while without coming to an obvious answer.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Well I have to agree with the poster that this is clear as mud. I actually looked at this for awhile without coming to an obvious answer.
The only definition I was able to find is the following (source).
A special VDEV can store metadata such as file locations and allocation tables. The allocations in the special class are dedicated to specific block types. By default, this includes all metadata, the indirect blocks of user data, and any deduplication tables. The class can also be provisioned to accept small file blocks. This is a great use case for high performance but smaller sized solid-state storage. Using a special vdev drastically speeds up random I/O and cuts the average spinning-disk I/Os needed to find and access a file by up to half.
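
Read as a rule, that definition can be sketched in a few lines of Python. This is only a model of the quoted description, assuming the pool has a special vdev and no separate dedup vdev. The function name and block-kind labels are made up for illustration; `special_small_blocks` is the dataset property that enables the small-block behaviour, where 0 means off.

```python
def goes_to_special(block_kind: str,
                    block_size: int,
                    special_small_blocks: int = 0) -> bool:
    """Decide whether a block is eligible for the special class (illustrative model)."""
    # Per the quoted definition: metadata, indirect blocks of user data,
    # and dedup tables go to the special class by default.
    if block_kind in ("metadata", "user_indirect", "ddt"):
        return True
    # Ordinary file data only qualifies when the small-block threshold is
    # set and the block is at or below it.
    if block_kind == "user_data":
        return special_small_blocks > 0 and block_size <= special_small_blocks
    raise ValueError(f"unknown block kind: {block_kind}")


if __name__ == "__main__":
    print(goes_to_special("ddt", 4096))                     # dedup table block
    print(goes_to_special("user_data", 4096))               # threshold off
    print(goes_to_special("user_data", 4096, 16 * 1024))    # small file block
    print(goes_to_special("user_data", 128 * 1024, 16 * 1024))
```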
 
Joined
Oct 22, 2019
Messages
3,641
@Davvo, I'm giving you an angry face reaction because I was going to post something similar just now, and you beat me by a few seconds.

I will replace the angry face with a thumbs up after five minutes. I hope you understand.

EDIT: Five minutes are up. Angry face removed. Replaced with thumbs up. Thank you for being cooperative.
 

foreverjake

Cadet
Joined
Apr 21, 2023
Messages
5
So there is no clear answer to this question? The existence of two different special vdev types would seem to indicate that they perform different functions, but the documentation seems to indicate that a metadata vdev will store the DDT if deduplication is turned on for an associated data vdev. Is the logic that having a second type of special vdev is just a way of further separating the workload or protecting the data across more storage options?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
So there is no clear answer to this question?
There is: as we (clearly?) said before, the DDT is considered metadata and will be stored on the metadata special vdev by default.

Is the logic that having a second type of special vdev is just a way of further separating the workload or protecting the data across more storage options?
The logic of having a special vdev (composed of more than one device) is simply to gain performance.
Using a special vdev drastically speeds up random I/O and cuts the average spinning-disk I/Os needed to find and access a file by up to half.
 