Planning ZFS Storage Configuration

superadmin29 · Jan 25, 2023

Hey y'all. I'm re-deploying my main storage server (giving VMware the boot and running TrueNAS on bare metal) and wanted to make sure I get my disk configuration right before transferring all my data back. This will be my main storage server, containing mostly media but also for use with storing ISOs, files, and app data (Bitwarden, Vault, LDAP, Sonarr, Radarr, etc).

Context:

The main storage server will be used only for storage (no app, jails, VMs, etc will be running) with a 10 core (20 thread) Xeon and 70GB of RAM.
This storage server has (12) 3.5" drive bays and a connected SAS disk shelf with (24) 2.5" bays, both of which are approximately half filled and will continue to grow as storage needs demand.
I have a separate backup host (4x3.5") that can store significantly less data, but will be used for backups of some core media and pertinent files. Thus data resiliency on the primary storage host is preferred but not absolutely critical.
All storage and compute hosts are connected by 10Gbe, and I have a 1Gpbs upstream.
NFS will be used for connecting shares to necessary compute hosts. I looked into it and I think that iSCSI would be overkill (feel free to prove me wrong).
Plex media usage can assume about 4 concurrent streams max, potentially growing slightly in the future. A Nvidia Tesla P4 is powering transcodes.

Given the above, I want to design the main storage pool to be performant as possible for my use case. I have done some researching, and I think I understand a good amount of the factors at play here, but all the specifics about storage, how it works, and how ZFS interacts with it can be a lot. I would appreciate getting some sanity checks on this before I implement it and realize I screwed up.

Here is a list of the current storage devices I have on hand:

(7) 3.5" WD Red Plus 10TB 7200RPM CMR (to be expanded as needed to 12)
(4) 2.5" SanDisk Lightning Ascend Gen II 1.6TB SAS SSD (Specs)
(7) 2.5" HGST Ultrastar 400GB SAS SSD (Specs)
(1) Samsung 970 EVO Plus 500GB M.2 NVMe SSD

Proposed pool structure:

(3) Mirrored 10TB Storage VDevs
(1) 10TB Hot Spare VDev
(1) Mirrored 1.6TB Metadata VDev
(1) Mirrored 400GB Dedup VDev
(1) Mirrored 500GB Log VDev (would purchase adapter, and another drive)
Remaining SAS SSDs would be thrown in a different storage pool

My logic behind striped mirrors is that since my array will be mostly read heavy, it offers a performance gain with still some redundancy. I'm not to worried about the space inefficiency. Also having a hot spare to help remediate drive issues quickly. The metadata video will offload the metadata from my storage disks and have an SSD performance gain. Similarly, adding the SSD VDev for space efficiency reasons. My understanding is that since NFS is synchronous, the log VDev will be useful for caching write operations. I left out a cache VDev because I don't perceive any benefit. Since media will be a majority of disk I/O and media choices will be rather random, I don't think it would prove very useful, even for general file storage.

Please let me know what your thoughts are on this setup and if there are any changes I should consider. Appreciate your time.

ChrisRJ · Jan 26, 2023

superadmin29 said:
(1) 10TB Hot Spare VDev

A hot-spare is not a separate vdev, but belongs to the vdev it is supposed to "support" in case of failure.

superadmin29 said:
(1) Mirrored 1.6TB Metadata VDev

Since this will be absolutely critical, I would go for a 3-way mirror.

superadmin29 said:
(1) Mirrored 400GB Dedup VDev

IMHO there is no such thing as a dedup vdev. In addition, I would caution against the use of deduplication. It requires a lot(!) more RAM than what you have. From what I remember the consensus seems to be that 256 GB RAM is a starting point, but you may easily need more.

superadmin29 said:
(1) Mirrored 500GB Log VDev (would purchase adapter, and another drive)

SLOG drives have very specific requirements and a Samsung Evo is certainly not suitable. Please check the resource linked in my signature "Recommended readings" for more details.

Davvo · Jan 26, 2023

About layouts and performance:

ZFS Storage Pool Layout

This resource was originally created by user: @Davvo on the TrueNAS Community Forums Archive. https://www.truenas.com/community/resources/zfs-storage-pool-layout.201/download [1] This amazing document, created by iXsystems in February 2022 as a “White Paper”, cleanly explains how to qualify...

www.truenas.com

Your want the parity level of your metadata vdev to at least match the parity level of the pool but consider using L2ARC (cache vdev) instead, it is possibile to use it for metadata only and make it non volatile.

Why do you think you need dedup?

You want a high endurance SSD (possibily NVMe) as your SLOG device.

Johnny Fartpants · Jan 26, 2023

ChrisRJ said:
A hot-spare is not a separate vdev

It is actually

Johnny Fartpants · Jan 26, 2023

ChrisRJ said:
but belongs to the vdev it is supposed to "support" in case of failure

It belongs to the pool it is supposed to "support" in case of failure

sretalla · Jan 26, 2023

Johnny Fartpants said:
It belongs to the pool it is supposed to "support" in case of failure

Actually can even be in many pools at the same time.

So back to the original point, maybe not really a VDEV, more of a "list of spares".

Davvo · Jan 26, 2023

It's a hotspare

sretalla · Jan 26, 2023

There's probably some confusion around it appearing at the same level in zpool status as all the other VDEV types, so being an exception to that pattern, it's a strange one, but certainly not a VDEV as it holds no pool data at all in its home position.

Johnny Fartpants · Jan 26, 2023

Kernel/Reference/ZFS - Ubuntu Wiki

ZFS Virtual Devices (ZFS VDEVs)

A VDEV is a meta-device that can represent one or more devices. ZFS supports 7 different types of VDEV:

File - a pre-allocated file
Physical Drive (HDD, SDD, PCIe NVME, etc)
Mirror - a standard RAID1 mirror
ZFS software raidz1, raidz2, raidz3 'distributed' parity based RAID
Hot Spare - hot spare for ZFS software raid.
Cache - a device for level 2 adaptive read cache (ZFS L2ARC)
Log - ZFS Intent Log (ZFS ZIL)

VDEVS are dynamically striped by ZFS. A device can be added to a VDEV, but cannot be removed from it.

Davvo · Jan 26, 2023

Johnny Fartpants said:
Kernel/Reference/ZFS - Ubuntu Wiki

ZFS Virtual Devices (ZFS VDEVs)

A VDEV is a meta-device that can represent one or more devices. ZFS supports 7 different types of VDEV:

File - a pre-allocated file

Physical Drive (HDD, SDD, PCIe NVME, etc)

Mirror - a standard RAID1 mirror

ZFS software raidz1, raidz2, raidz3 'distributed' parity based RAID

Hot Spare - hot spare for ZFS software raid.

Cache - a device for level 2 adaptive read cache (ZFS L2ARC)

Log - ZFS Intent Log (ZFS ZIL)

VDEVS are dynamically striped by ZFS. A device can be added to a VDEV, but cannot be removed from it.

Cache, log and hot spares can.

sretalla · Jan 26, 2023

Well, I guess that's somewhat official... but not from OpenZFS.

Oddly though, it looks almost identical to the definition from the site which OpenZFS points out to from their site (recommending it as an excellent documentation of OpenZFS) https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/

But I would begin by immediately shooting holes straight through it...

From Ubuntu Wiki said:
A device can be added to a VDEV, but cannot be removed from it.

Complete crap, you can absolutely do that and even in particular to the spares "VDEV" (if we're accepting it is one now).

It may be a recent (actually still waiting for official release) addition to RAIDZ device types, but you have always been able to do it for spare, mirror, cache and log VDEV types.

Also noteworthy from the linked document:

It's important to note that VDEVs are always dynamically striped

Which is also completely nuts if spare is a VDEV... it's never striped in with the pool.

In fact the drives listed in spares are not even connected to each other in any way other than presence in that list and can be on multiple lists from different pools, which is even more nuts to consider it as a VDEV.

Anyway, this discussion has been fun and I guess I'll consider the spare as a completely different kind of "special" VDEV.

That definition also misses the recently added VDEV types in addition to generally suffering from a dire lack of sensible logic... anyway, time to get off the soapbox and back to work.

superadmin29 · Jan 26, 2023

ChrisRJ said:
Since this will be absolutely critical, I would go for a 3-way mirror.

Noted, I’ll up that to 3 and probably use the remaining drive as a hot spare.

ChrisRJ said:
SLOG drives have very specific requirements and a Samsung Evo is certainly not suitable. Please check the resource linked in my signature "Recommended readings" for more details.

I read through the post under “ZIL and SLOG” in your signature, a lot of good information there that helps me understand how that all works. However, I still don’t see why Samsung Evo M.2 NVMe drives are frowned upon here. They meet the specifications of the post, and even when compared to the recommended Intel devices, the R/W, IOPS, MLC, etc is either the same or better. Is there something else I’m missing?

Davvo · Jan 26, 2023

There is also TrueNAS documentation (SCALE in this case since the opener is using that).

/scale/scaleuireference/storage/pools/

superadmin29 · Jan 26, 2023

Davvo said:
About layouts and performance:

ZFS Storage Pool Layout

This resource was originally created by user: @Davvo on the TrueNAS Community Forums Archive. https://www.truenas.com/community/resources/zfs-storage-pool-layout.201/download [1] This amazing document, created by iXsystems in February 2022 as a “White Paper”, cleanly explains how to qualify...

www.truenas.com

Good info, thanks!

Davvo said:
Your want the parity level of your metadata vdev to at least match the parity level of the pool but consider using L2ARC (cache vdev) instead, it is possibile to use it for metadata only and make it non volatile.

What is the benefit to having metadata in L2ARC instead of the ZFS metadata Vdev? Doing some googling it seems like a bit more work and will result in potentially worse write?

Davvo said:
Why do you think you need dedup?

You want a high endurance SSD (possibily NVMe) as your SLOG device.

Yeah, going to scrap dedup. The proposed drives for SLOG are MLC NVMe.

sretalla · Jan 26, 2023

superadmin29 said:
I still don’t see why Samsung Evo M.2 NVMe drives are frowned upon here.

Simple answer, "write endurance".

A recommended SLOG like the Optane models, have 10-15x more TBW rating than the EVO range.

sretalla · Jan 26, 2023

superadmin29 said:
What is the benefit to having metadata in L2ARC instead of the ZFS metadata Vdev? Doing some googling it seems like a bit more work and will result in potentially worse write?

Write speeds should not be impacted by L2ARC, it's not in that path (at least not in series).

L2ARC isn't pool integral, so it can be removed/added/changed and doesn't need to be redundant as losing it won't kill your pool like a metadata VDEV will.

With a little tuning, folks seem to find L2ARC in metadata only mode can really do a lot for large file trees.

Davvo · Jan 26, 2023

In addition to the point bringed up by srertalla about pool resiliency and flexibility, metadata vdevs (and as such fusion pools) great strenght is imho the ability to be provisioned to accept small file blocks: it roughly means that you can use a single pool for both small files and large files without bottlenecking your HDDs; if you want to store only metadata L2ARC does it in a overall better way being non-critical and more flexible.

Fusion Pools

Describes how to create a Fusion Pool on TrueNAS CORE.

www.truenas.com

Pools

Tutorials about managing storage pools in TrueNAS CORE.

www.truenas.com

superadmin29 · Jan 26, 2023

sretalla said:
L2ARC isn't pool integral, so it can be removed/added/changed and doesn't need to be redundant as losing it won't kill your pool like a metadata VDEV will.

Davvo said:
In addition to the point bringed up by srertalla about pool resiliency and flexibility, metadata vdevs (and as such fusion pools) great strenght is imho the ability to be provisioned to accept small file blocks: it roughly means that you can use a single pool for both small files and large files without bottlenecking your HDDs; if you want to store only metadata L2ARC does it in a overall better way being non-critical and more flexible.

Interesting, okay so I'm catching on now. Leave the metadata on the data vdevs so data and metadata have the same point of failure, but cache the metadata for increased performance. I assume this should follow the same space requirements as a metadata vdev? Since this is metadata only would this also benefit from performance of NVMe or is SAS SSD sufficient.

Also want to thank everyone for all the friendly and welcoming insights. I know there's a lot of these "help my plan my storage" posts but this has really helped me learn and adapt this to my specific use cases.

Davvo · Jan 26, 2023

superadmin29 said:
Leave the metadata on the data vdevs so data and metadata have the same point of failure, but cache the metadata for increased performance.

Not really: use L2ARC as substitute for the metadata vdev and set it in a way that caches metadata-only.

superadmin29 said:
I assume this should follow the same space requirements as a metadata vdev? Since this is metadata only would this also benefit from performance of NVMe or is SAS SSD sufficient.

Generally NVMe is suggested for L2ARC (and SLOG), but a SATA SSD could still give you a good boost compared to a plain HDD pool. If I'm not wrong the optimal ratio of ARC:L2ARC should be 1:5 or 1:6, up to a maximum of 1:8 I believe; a 500GB NVMe is likely your best shot.

Important Announcement for the TrueNAS Community.

Planning ZFS Storage Configuration

Cadet

Wizard

MVP

Guru

Guru

Powered by Neutrality

MVP

Powered by Neutrality

Guru

ZFS Virtual Devices (ZFS VDEVs)​

MVP

ZFS Virtual Devices (ZFS VDEVs)​

Powered by Neutrality

Cadet

MVP

Cadet

Powered by Neutrality

Powered by Neutrality

MVP

Cadet

MVP

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Planning ZFS Storage Configuration"

Similar threads

ZFS Virtual Devices (ZFS VDEVs)

ZFS Virtual Devices (ZFS VDEVs)