ZFS Metadata Special Device: Small (40MB) Files?

oguruma

Patron
Joined
Jan 2, 2016
Messages
226
I have about 8TB of photos on my NAS. Most of them are raw files, which are 30-50MB.

I'm building a new pool, and I might add a metadata special device to speed up browsing and finding images over SMB.

For, say, 16TB of storage (12TB of that being 30-50MB files and the rest being 2ish GB movies), how much storage would I need for the metadata special device vdev? From what I've read, 3% is about right, but does that apply with lots of smaller files? (Or are 30-50MB files even considered "small"?)
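
To make my question concrete, here is the rough arithmetic I'm assuming from that 3% rule of thumb (the 3% figure is just what I've read around the forums, not something I've measured):

Code:
# Metadata-only sizing estimate from the ~3% rule of thumb (an assumption).
pool_tb = 16
metadata_fraction = 0.03

print(f"Metadata-only special vdev estimate: {pool_tb * metadata_fraction:.2f} TB")
# -> ~0.48 TB, and this says nothing about also routing 30-50MB files there.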
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
If you are serious about having 12TB of "small files", about 30MB-50MB in size, then your Metadata special vDev would need to be 12TB.

A Metadata special vDev is NOT a cache of any sort. It is the first line of storage for whatever has been allowed into it, like small files. Thus, if you had a 1TB mirrored Metadata special vDev, after it filled up, all new small files would go to the main data vDevs. Then, on read, ZFS would read those newer small files from the main data vDevs.
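
To illustrate that overflow behaviour, here is a toy sketch (not ZFS code; the capacity and file sizes are made up, and the "qualifies as small" decision is assumed to have already happened):

Code:
GB = 1024**3
MB = 1024**2

def place(block_size, special_free):
    """Qualifying blocks go to the special vdev while it has room; after that
    they are written to the main data vdevs and are not migrated back later."""
    if special_free >= block_size:
        return "special vdev", special_free - block_size
    return "main data vdevs", special_free

special_free = 2 * GB                 # pretend a tiny special vdev for the demo
for _ in range(60):
    location, special_free = place(40 * MB, special_free)   # 40MB "small" files
print("last file landed on:", location)   # -> main data vdevs once the 2GB is used up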
 
Joined
Jun 15, 2022
Messages
674
If you are serious about having 12TB of "small files", about 30MB-50MB in size, then your Metadata special vDev would need to be 12TB.

A Metadata special vDev is NOT a cache of any sort. It is the first line of storage for whatever has been allowed into it, like small files. Thus, if you had a 1TB mirrored Metadata special vDev, after it filled up, all new small files would go to the main data vDevs. Then, on read, ZFS would read those newer small files from the main data vDevs.
That seems a bit limited in scope compared to the official docs:
Fusion Pools -- What's a special VDEV?
A special VDEV can store metadata such as file locations and allocation tables. The allocations in the special class are dedicated to specific block types. By default, this includes all metadata, the indirect blocks of user data, and any deduplication tables. The class can also be provisioned to accept small file blocks. This is a great use case for high performance but smaller sized solid-state storage. Using a special vdev drastically speeds up random I/O and cuts the average spinning-disk I/Os needed to find and access a file by up to half.
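
A minimal sketch of how I read that paragraph (the class names below are descriptive labels for illustration, not actual ZFS identifiers):

Code:
# Per the quoted docs: metadata, indirect blocks and dedup tables go to the
# special vdev by default; small file blocks only if you opt in.
DEFAULT_SPECIAL = {"metadata", "indirect blocks of user data", "dedup tables"}
OPT_IN_SPECIAL = {"small file blocks"}   # provisioned per dataset if you choose

def stored_on_special(block_class, small_blocks_enabled=False):
    return (block_class in DEFAULT_SPECIAL
            or (small_blocks_enabled and block_class in OPT_IN_SPECIAL))

print(stored_on_special("metadata"))                  # True by default
print(stored_on_special("small file blocks"))         # False until opted in
print(stored_on_special("small file blocks", True))   # True once provisioned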
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
A fusion pool will store metadata - which doesn't take much space. It doesn't cache it the way L2ARC does: if an L2ARC device fails, it doesn't matter, but if a special vdev fails, you have lost the pool.

If you want to use the secondary function of a special vdev, which is to store small files as well as metadata, then you need to be a bit careful about sizing. I think the vdev will fill to about 75% and then store small files on the other, main data vdevs as usual. If you wanted to store 12TB of small files, you would need something like 16TB of special vdev (which is a bit silly).
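
Rough arithmetic behind that 16TB figure, treating the ~75% cut-over as an assumption rather than a documented constant:

Code:
small_file_data_tb = 12      # TB of 30-50MB files you intend to push to the special vdev
assumed_fill_cutoff = 0.75   # assumed point at which small blocks spill to the data vdevs

required_tb = small_file_data_tb / assumed_fill_cutoff
print(f"Special vdev capacity needed: ~{required_tb:.0f} TB")   # ~16 TB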
 
Joined
Jun 15, 2022
Messages
674
This is a great topic and really helpful.

Filesystems typically cache metadata when practical, so more RAM is better. The takeaway is: don't fix a problem you don't have. In this case, until directory listings are slow because the metadata exceeds the available RAM, don't create a special vdev: it will be orders of magnitude slower than RAM, and it adds another critical point of failure, so avoiding total data loss requires extra planning, hardware, and expense.
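
A very rough sketch of why the metadata for a photo library this size tends to fit in RAM (the per-file overhead below is an order-of-magnitude assumption, not a measured ZFS number):

Code:
data_tb = 8                  # photo library size from the original post
avg_file_mb = 40             # typical raw file
per_file_metadata_kb = 4     # assumed: dnode plus indirect blocks, rough guess

n_files = data_tb * 1024 * 1024 / avg_file_mb             # TB -> MB, then files
metadata_gb = n_files * per_file_metadata_kb / (1024 * 1024)

print(f"~{n_files:,.0f} files, ~{metadata_gb:.1f} GB of metadata")
# -> roughly 200,000 files and on the order of 1 GB of metadata,
# which ARC can usually keep hot without any special vdev.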

From personal experience (which may not apply), I think there are better ways around this. Images are stored as date-code.RAW files, so if the directory structure is reasonably organized it should be quick and easy to navigate the tree to the file you need, and a search is rarely necessary. Where the problem comes in is storing other, unrelated data alongside the images that does require a search, such as saved web pages of tips and tricks, which can also generate a lot of tiny files (hint: print to PDF instead).

What I do is keep photos under subject\narrower_subject\narrower_subject and then a standard tree within the set such as:
CompressorX
- docs
- resources
- misc
- published_article
-- icons
-- images
- processed_images
-- used
-- unused
- raw_images
-- used
-- unused

For photos many marketing firms use two different data volumes:
+ live_jobs
+ cold_storage

Perhaps the live job array would be a super-fast SSD setup, and cold storage a set of HDDs arranged for capacity rather than speed.

On the directory structure, many new users arrange by year; year doesn't matter and most people can't remember it. Arrange by customer\event\sub-event--it keeps all the things you need to know at your fingertips.
 

CookieMonster

Dabbler
Joined
May 26, 2022
Messages
34
It is the first line of storage for whatever has been allowed into it, like small files.

If I create a special VDEV from low-latency Optane drives, how do I tell the system to store small files on it? Do I decide how small is small, or does TrueNAS decide dynamically based on how full the special VDEV is?

Thanks
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
That's the complex bit.
Any dataset has a record size associated with it; the default is 128K.
You must set the Metadata (Special) Small Block Size to less than the record size, otherwise ZFS will attempt to store all files on the special device - which will work until the special device is about 80% full, and then the files will be stored on the main data vdevs.

After that it's a manual choice (it's in the advanced options under Edit Dataset). Set that value to what you think it should be, and ZFS will (from that point on) attempt to store all files smaller than that size on the special device.

It requires some tuning though.
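
A toy model of that threshold rule, to make the interaction with the record size concrete (not ZFS source; the property names are the real ones, but the placement logic is simplified):

Code:
def goes_to_special(file_size, recordsize, special_small_blocks):
    """Files below the recordsize are written as one block of roughly their own
    size; larger files are written as full recordsize blocks. Blocks no larger
    than special_small_blocks are eligible for the special vdev."""
    block_size = file_size if file_size < recordsize else recordsize
    return block_size <= special_small_blocks

recordsize = 128 * 1024                        # dataset default, 128K
for limit in (64 * 1024, 128 * 1024):
    print(f"special_small_blocks={limit // 1024}K:",
          "32K file ->", goes_to_special(32 * 1024, recordsize, limit),
          "| 40MB file ->", goes_to_special(40 * 1024**2, recordsize, limit))
# 64K limit: only the genuinely small file qualifies.
# 128K limit (equal to the recordsize): every block qualifies, i.e. all data
# targets the special device - which is the trap described above.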

Warning - there may be a bug on Cobia atm - dunno whether it's a GUI bug or an actual bug - but the Metadata Small Block Size setting is not being saved.
 