Planning ZFS Storage Configuration

superadmin29

Cadet
Joined
Jan 23, 2023
Messages
5
Hey y'all. I'm re-deploying my main storage server (giving VMware the boot and running TrueNAS on bare metal) and wanted to make sure I get my disk configuration right before transferring all my data back. It will be my main storage server, holding mostly media but also ISOs, general files, and app data (Bitwarden, Vault, LDAP, Sonarr, Radarr, etc.).

Context:
  • The main storage server will be used only for storage (no apps, jails, VMs, etc. will be running) with a 10 core (20 thread) Xeon and 70GB of RAM.
  • This storage server has (12) 3.5" drive bays and a connected SAS disk shelf with (24) 2.5" bays, both of which are approximately half filled and will continue to grow as storage needs demand.
  • I have a separate backup host (4x3.5") that can store significantly less data, but will be used for backups of some core media and pertinent files. Thus data resiliency on the primary storage host is preferred but not absolutely critical.
  • All storage and compute hosts are connected by 10GbE, and I have a 1Gbps upstream.
  • NFS will be used for connecting shares to necessary compute hosts. I looked into it and I think that iSCSI would be overkill (feel free to prove me wrong).
  • Plex media usage can be assumed to be about 4 concurrent streams max, potentially growing slightly in the future. An Nvidia Tesla P4 is powering transcodes.
Given the above, I want to design the main storage pool to be as performant as possible for my use case. I have done some research, and I think I understand a good amount of the factors at play here, but all the specifics about storage, how it works, and how ZFS interacts with it can be a lot. I would appreciate getting some sanity checks on this before I implement it and realize I screwed up.

Here is a list of the current storage devices I have on hand:
  • (7) 3.5" WD Red Plus 10TB 7200RPM CMR (to be expanded as needed to 12)
  • (4) 2.5" SanDisk Lightning Ascend Gen II 1.6TB SAS SSD (Specs)
  • (7) 2.5" HGST Ultrastar 400GB SAS SSD (Specs)
  • (1) Samsung 970 EVO Plus 500GB M.2 NVMe SSD
Proposed pool structure:
  • (3) Mirrored 10TB Storage VDevs
  • (1) 10TB Hot Spare VDev
  • (1) Mirrored 1.6TB Metadata VDev
  • (1) Mirrored 400GB Dedup VDev
  • (1) Mirrored 500GB Log VDev (would purchase adapter, and another drive)
  • Remaining SAS SSDs would be thrown in a different storage pool
My logic behind striped mirrors is that since my array will be mostly read heavy, it offers a performance gain while still providing some redundancy; I'm not too worried about the space inefficiency. I'm also including a hot spare to help remediate drive issues quickly. The metadata VDev will offload the metadata from my storage disks and give it an SSD performance gain. Similarly, the dedup VDev is there for space efficiency reasons. My understanding is that since NFS is synchronous, the log VDev will be useful for absorbing those synchronous write operations. I left out a cache VDev because I don't perceive any benefit: since media will be the majority of disk I/O and media choices will be rather random, I don't think it would prove very useful, even for general file storage.
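In case it helps to picture it, here's roughly what I have in mind expressed as a pool layout sketch. Device names are placeholders, and I'd build it through the TrueNAS UI rather than running this by hand:

Code:
# Sketch only -- placeholder device names, not a command I'd actually run as-is
zpool create tank \
    mirror da0 da1 \
    mirror da2 da3 \
    mirror da4 da5 \
    spare da6 \
    special mirror da7 da8 \
    dedup mirror da9 da10 \
    log mirror nvd0 nvd1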

Please let me know what your thoughts are on this setup and if there are any changes I should consider. Appreciate your time.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
  • (1) 10TB Hot Spare VDev
A hot-spare is not a separate vdev, but belongs to the vdev it is supposed to "support" in case of failure.
  • (1) Mirrored 1.6TB Metadata VDev
Since this will be absolutely critical, I would go for a 3-way mirror.
  • (1) Mirrored 400GB Dedup VDev
IMHO there is no such thing as a dedup vdev. In addition, I would caution against the use of deduplication. It requires a lot(!) more RAM than what you have. From what I remember the consensus seems to be that 256 GB RAM is a starting point, but you may easily need more.
SLOG drives have very specific requirements and a Samsung Evo is certainly not suitable. Please check the resource linked in my signature "Recommended readings" for more details.
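To make the 3-way mirror point concrete, something along these lines (device names are placeholders):

Code:
# Sketch: the metadata (special) vdev as a 3-way mirror alongside the data mirrors
zpool create tank \
    mirror da0 da1 \
    mirror da2 da3 \
    mirror da4 da5 \
    special mirror da6 da7 da8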
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
About layouts and performance:

You want the parity level of your metadata vdev to at least match the parity level of the pool, but consider using L2ARC (a cache vdev) instead; it is possible to use it for metadata only and make it non-volatile.
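Roughly like this, assuming a pool called tank and one spare SSD (names are just examples):

Code:
zpool add tank cache sdx                # add the SSD as an L2ARC (cache) device
zfs set secondarycache=metadata tank    # cache only metadata in L2ARC for this pool's datasets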

Why do you think you need dedup?

You want a high endurance SSD (possibly NVMe) as your SLOG device.
 
Joined
Jul 3, 2015
Messages
926
but belongs to the vdev it is supposed to "support" in case of failure
It belongs to the pool it is supposed to "support" in case of failure
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
It belongs to the pool it is supposed to "support" in case of failure
Actually can even be in many pools at the same time.

So back to the original point, maybe not really a VDEV, more of a "list of spares".
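For example (pool and device names hypothetical):

Code:
zpool add tank1 spare da20    # the same disk can sit on the spares list of...
zpool add tank2 spare da20    # ...more than one pool at the same time
zpool remove tank1 da20       # and can be dropped from a list again at any time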
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
There's probably some confusion because it appears at the same level in zpool status as all the other VDEV types. Being an exception to that pattern makes it a strange one, but it's certainly not a VDEV, as it holds no pool data at all in its home position.
 
Joined
Jul 3, 2015
Messages
926

ZFS Virtual Devices (ZFS VDEVs)


A VDEV is a meta-device that can represent one or more devices. ZFS supports 7 different types of VDEV:

  • File - a pre-allocated file
  • Physical Drive (HDD, SSD, PCIe NVMe, etc)
  • Mirror - a standard RAID1 mirror
  • ZFS software raidz1, raidz2, raidz3 'distributed' parity based RAID
  • Hot Spare - hot spare for ZFS software RAID.
  • Cache - a device for level 2 adaptive read cache (ZFS L2ARC)
  • Log - ZFS Intent Log (ZFS ZIL)
VDEVs are dynamically striped by ZFS. A device can be added to a VDEV, but cannot be removed from it.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

ZFS Virtual Devices (ZFS VDEVs)


A VDEV is a meta-device that can represent one or more devices. ZFS supports 7 different types of VDEV:

  • File - a pre-allocated file
  • Physical Drive (HDD, SSD, PCIe NVMe, etc)
  • Mirror - a standard RAID1 mirror
  • ZFS software raidz1, raidz2, raidz3 'distributed' parity based RAID
  • Hot Spare - hot spare for ZFS software RAID.
  • Cache - a device for level 2 adaptive read cache (ZFS L2ARC)
  • Log - ZFS Intent Log (ZFS ZIL)
VDEVs are dynamically striped by ZFS. A device can be added to a VDEV, but cannot be removed from it.
Cache, log and hot spares can.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Well, I guess that's somewhat official... but not from OpenZFS.

Oddly though, it looks almost identical to the definition from the site which OpenZFS points to from their own site (recommending it as excellent documentation of OpenZFS) https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/

But I would begin by immediately shooting holes straight through it...

A device can be added to a VDEV, but cannot be removed from it.
Complete crap; you can absolutely do that, and in particular with the spares "VDEV" (if we're accepting it is one now).

It may be a recent (actually still waiting for official release) addition to RAIDZ device types, but you have always been able to do it for spare, mirror, cache and log VDEV types.
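Off the top of my head (hypothetical pool/device/vdev names):

Code:
zpool remove tank da20        # take a disk off the spares list
zpool remove tank sdx         # remove a cache (L2ARC) device
zpool remove tank mirror-4    # remove a whole log mirror by its vdev name from zpool status
zpool detach tank da3         # pull one disk out of a mirror vdev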

Also noteworthy from the linked document:
It's important to note that VDEVs are always dynamically striped
Which is also completely nuts if spare is a VDEV... it's never striped in with the pool.

In fact the drives listed in spares are not even connected to each other in any way other than their presence in that list, and can be on multiple lists from different pools, which makes it even more nuts to consider it a VDEV.

Anyway, this discussion has been fun and I guess I'll consider the spare as a completely different kind of "special" VDEV.

That definition also misses the recently added VDEV types in addition to generally suffering from a dire lack of sensible logic... anyway, time to get off the soapbox and back to work.
 
Last edited:

superadmin29

Cadet
Joined
Jan 23, 2023
Messages
5
Since this will be absolutely critical, I would go for a 3-way mirror.
Noted, I’ll up that to 3 and probably use the remaining drive as a hot spare.
SLOG drives have very specific requirements and a Samsung Evo is certainly not suitable. Please check the resource linked in my signature "Recommended readings" for more details.
I read through the post under “ZIL and SLOG” in your signature; a lot of good information there that helps me understand how that all works. However, I still don’t see why Samsung Evo M.2 NVMe drives are frowned upon here. They meet the specifications of the post, and even when compared to the recommended Intel devices, the R/W speeds, IOPS, MLC flash, etc. are either the same or better. Is there something else I’m missing?
 

superadmin29

Cadet
Joined
Jan 23, 2023
Messages
5
About layouts and performance:
Good info, thanks!
You want the parity level of your metadata vdev to at least match the parity level of the pool, but consider using L2ARC (a cache vdev) instead; it is possible to use it for metadata only and make it non-volatile.
What is the benefit to having metadata in L2ARC instead of the ZFS metadata vdev? From some googling it seems like a bit more work and could result in potentially worse write performance?
Why do you think you need dedup?

You want a high endurance SSD (possibly NVMe) as your SLOG device.
Yeah, I'm going to scrap dedup. The proposed drives for SLOG are MLC NVMe.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I still don’t see why Samsung Evo M.2 NVMe drives are frowned upon here.
Simple answer, "write endurance".

A recommended SLOG, like the Optane models, has a 10-15x higher TBW rating than the EVO range.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What is the benefit to having metadata in L2ARC instead of the ZFS metadata vdev? From some googling it seems like a bit more work and could result in potentially worse write performance?
Write speeds should not be impacted by L2ARC; it's not in that path (at least not in series).

L2ARC isn't pool integral, so it can be removed/added/changed and doesn't need to be redundant as losing it won't kill your pool like a metadata VDEV will.

With a little tuning, folks seem to find that L2ARC in metadata-only mode can really do a lot for large file trees.
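The kind of tuning I mean, as a sketch (assuming SCALE, i.e. Linux OpenZFS; on CORE the equivalent knobs live under the vfs.zfs.l2arc sysctls, and on TrueNAS you'd set them as tunables rather than echoing by hand):

Code:
# Persistent L2ARC survives reboots (on by default in OpenZFS 2.0+)
cat /sys/module/zfs/parameters/l2arc_rebuild_enabled
# Example knobs people adjust for metadata-heavy workloads -- values are illustrative
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch        # also allow prefetched reads into L2ARC
echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max  # raise the fill rate to 64 MiB/s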
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
In addition to the points brought up by sretalla about pool resiliency and flexibility, the great strength of metadata vdevs (and as such fusion pools) is IMHO the ability to be provisioned to accept small file blocks: it roughly means that you can use a single pool for both small files and large files without bottlenecking your HDDs; if you want to store only metadata, L2ARC does it in an overall better way, being non-critical and more flexible.
 

superadmin29

Cadet
Joined
Jan 23, 2023
Messages
5
L2ARC isn't pool integral, so it can be removed/added/changed and doesn't need to be redundant as losing it won't kill your pool like a metadata VDEV will.
In addition to the points brought up by sretalla about pool resiliency and flexibility, the great strength of metadata vdevs (and as such fusion pools) is IMHO the ability to be provisioned to accept small file blocks: it roughly means that you can use a single pool for both small files and large files without bottlenecking your HDDs; if you want to store only metadata, L2ARC does it in an overall better way, being non-critical and more flexible.
Interesting, okay so I'm catching on now. Leave the metadata on the data vdevs so data and metadata have the same point of failure, but cache the metadata for increased performance. I assume this should follow the same space requirements as a metadata vdev? Since this is metadata only, would this also benefit from the performance of NVMe, or is a SAS SSD sufficient?

Also want to thank everyone for all the friendly and welcoming insights. I know there's a lot of these "help me plan my storage" posts, but this has really helped me learn and adapt this to my specific use cases.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Leave the metadata on the data vdevs so data and metadata have the same point of failure, but cache the metadata for increased performance.
Not really: use L2ARC as a substitute for the metadata vdev and set it so it caches metadata only.

I assume this should follow the same space requirements as a metadata vdev? Since this is metadata only, would this also benefit from the performance of NVMe, or is a SAS SSD sufficient?
Generally NVMe is suggested for L2ARC (and SLOG), but a SATA SSD could still give you a good boost compared to a plain HDD pool. If I'm not wrong, the optimal ARC:L2ARC ratio should be 1:5 or 1:6, up to a maximum of 1:8 I believe; a 500GB NVMe is likely your best shot.
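Rough math behind that: with 70GB of RAM the ARC might settle somewhere around 60GiB, and 60 × 5 ≈ 300GiB while 60 × 8 ≈ 480GiB, so a ~500GB device lands right inside that window.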
 