Path to Success for Structuring Datasets in Your Pool

Samuel Tai · Jun 10, 2020

So you've got a shiny new FreeNAS server, just begging to have you create a pool and start loading it up. Assuming you've read @jgreco's The path to success for block storage sticky, you've decided on the composition of your pool (RAIDZx vs mirrors), and built your pool accordingly. Now you have an empty pool and a pile of bits to throw in.

STOP! You'll need to think at this point about how to structure your data.

1) Understand the difference between a dataset and a zvol.

A dataset is a self-contained ZFS container for data, and is the smallest unit of control for ZFS policies like compression, deduplication, and quotas. This is also the smallest structure for setting ZFS flags. A dataset is essentially its own independent ZFS filesystem.
A zvol is a virtual disk image. These are similar to other virtual disks, like VMware's VMDK or Hyper-V's VHD. Unlike these other disk images, a zvol is NOT a file, but is a reference to a block device. (For a pool named tank, these are actually created in /dev/zvol/tank/, but appear under pool tank in Storage->Pools.)
Datasets can be nested. A dataset can contain a zvol or another dataset, but a zvol cannot contain any child datasets or zvols.
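
For readers who think in shell terms, the distinction can be sketched with the underlying zfs commands. The Python helpers below only build the command lines and never run them. The pool name tank comes from the text above; the child names media and vm-disk0, the 10G size, and the lz4 setting are made up for illustration.

```python
def dataset_create_cmd(name, compression=None, quota=None):
    """Build (but do not run) a `zfs create` command for a filesystem dataset."""
    cmd = ["zfs", "create"]
    # Per-dataset policies are passed as -o property=value pairs.
    if compression:
        cmd += ["-o", f"compression={compression}"]
    if quota:
        cmd += ["-o", f"quota={quota}"]
    cmd.append(name)
    return cmd


def zvol_create_cmd(name, size):
    """Build (but do not run) a `zfs create -V` command for a zvol."""
    # -V makes a fixed-size volume (a block device) instead of a filesystem.
    return ["zfs", "create", "-V", size, name]


def zvol_device_path(name):
    """Device node location for a zvol, as described in the post above."""
    return "/dev/zvol/" + name


# Hypothetical names, for illustration only.
print(" ".join(dataset_create_cmd("tank/media", compression="lz4")))
print(" ".join(zvol_create_cmd("tank/vm-disk0", "10G")))
print(zvol_device_path("tank/vm-disk0"))
```

On FreeNAS these objects are normally created through Storage->Pools; the sketch is only meant to show that a dataset carries policies while a zvol is defined mainly by its size.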

2) Consider all the use cases for your data

What sharing mechanisms do you intend to support? AFP, NFS, SMB, iSCSI? It's generally a bad idea to have multiple sharing protocols acting on a single dataset. File locking isn't shared across protocols, so users can step on each other in that dataset, and working out what happened afterwards can be confusing. Also, iSCSI can only use zvols, whereas the others can only use datasets.
Do you want to run plugins or jails? The jail manager (warden or iocage) will construct a standard set of datasets for its own use, so you shouldn't construct any datasets with the same names. In particular, warden installs a dataset hierarchy under tank/jails, and iocage installs a hierarchy under tank/iocage, where tank is the pool. To provide access to a dataset on the host, you'll need to use the jail manager to map a host dataset to a mount point within the jail.
Do you want to run VMs? VM disks can only be zvols, not datasets. Once running, the VM can mount a host dataset via NFS (preferred) or SMB (not recommended, due to SMB's single-threaded operation).
What permissions structure do you intend? Nested datasets, in particular, can result in hairy permissions and/or ACL inheritance settings that are hard to debug.
Lastly, datasets can have independent snapshot/replication intervals. Is some data so critical that you need to snapshot every 30 minutes? That data should be collected within its own dataset.
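
One way to work through these questions before creating anything is to write the intended layout down as data and check it. This is a minimal sketch, not a FreeNAS feature: the class, the function, and every dataset name and interval in it are hypothetical, and it only checks two of the points above (one protocol per dataset, and file-sharing protocols needing a dataset rather than a zvol).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

FILE_PROTOCOLS = {"AFP", "NFS", "SMB"}


@dataclass
class PlannedDataset:
    name: str                                  # e.g. "tank/documents" (hypothetical)
    kind: str = "dataset"                      # "dataset" or "zvol"
    protocols: Set[str] = field(default_factory=set)
    snapshot_minutes: Optional[int] = None     # None means no periodic snapshots


def check_plan(plan: List[PlannedDataset]) -> List[str]:
    """Return human-readable warnings for a planned layout."""
    warnings = []
    for item in plan:
        if len(item.protocols) > 1:
            shared = ", ".join(sorted(item.protocols))
            warnings.append(
                f"{item.name}: shared over {shared}; "
                "file locking is not shared across protocols"
            )
        if item.kind == "zvol" and item.protocols & FILE_PROTOCOLS:
            warnings.append(f"{item.name}: AFP/NFS/SMB shares need a dataset, not a zvol")
    return warnings


plan = [
    # Critical data gets its own dataset so it can be snapshotted every 30 minutes.
    PlannedDataset("tank/documents", protocols={"SMB"}, snapshot_minutes=30),
    PlannedDataset("tank/media", protocols={"SMB", "NFS"}),
    PlannedDataset("tank/vm-disk0", kind="zvol", protocols={"iSCSI"}),
]
for warning in check_plan(plan):
    print(warning)
```

With this made-up plan, only tank/media is flagged, because it is shared over both NFS and SMB.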

3) Some rules of thumb

The top-level of a pool is reserved; don't just use it as a dumping ground.
Group data of similar criticality in the same dataset.
Have separate datasets for separate sharing protocols when practical.
Create zvols on mirror pools, if possible.
Only nest datasets when necessary, and then only to a maximum depth of around 2, to minimize jankiness with permissions and ACLs
Name your datasets and zvols so their function is obvious, not just today, but 5 or more years in the future.
- Avoid creating datasets with names beginning with a period (.). These datasets will be invisible in Storage->Pools, and any operations on these datasets (snapshots, replication, etc.) can only be performed using CLI tools in the shell. (If you absolutely know what you're doing, a hidden dataset can be created in either the GUI or the shell, and then used as normal.) Snapshots of these datasets do get listed in Storage->Snapshots, however.
- Likewise, avoid using shell special characters (e.g., /) in dataset names or snapshot names.
- Avoid using spaces in dataset/zvol names. This can result in unexpected behavior during replications and syncs.
Never create or modify datasets underneath jail manager-created datasets, like tank/jails or tank/iocage. Running jail cleanup utilities, like iocage clean -a, will silently destroy these added datasets, as some forum members have learned to their chagrin. Also, do not share out datasets underneath jail manager-created datasets, as using the ACL manager to change permissions on the share will break the jail manager's access to datasets for which it expects to have full control.
Avoid creating symlinks between datasets. This degrades the independence of datasets, and confuses utilities like du. It can also disturb replications and syncs.
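
Several of these rules of thumb are mechanical enough to check before a dataset is created. The sketch below is one possible checker, not part of FreeNAS. It assumes depth is counted below the pool root, and the allowed character set is a deliberately conservative choice made for this example rather than the exact set ZFS accepts. The reserved names jails and iocage come from the text above; everything else is hypothetical.

```python
import re

# Top-level datasets owned by the jail managers mentioned above.
RESERVED_ROOTS = ("jails", "iocage")

# Conservative choice for this sketch: letters, digits, and _ . : - only.
SAFE_COMPONENT = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.:-]*$")


def check_dataset_name(path, max_depth=2):
    """Return a list of rule-of-thumb problems for a pool/dataset path."""
    problems = []
    parts = path.split("/")
    children = parts[1:]          # parts[0] is the pool name

    if not children:
        problems.append("pool root: create a child dataset instead")
    if len(children) > max_depth:
        problems.append(f"nested {len(children)} levels deep (suggested maximum {max_depth})")
    if children and children[0] in RESERVED_ROOTS:
        problems.append(f"'{children[0]}' belongs to the jail manager")

    for component in children:
        if component.startswith("."):
            problems.append(f"'{component}' begins with a period and will be hidden in the GUI")
        elif " " in component:
            problems.append(f"'{component}' contains a space")
        elif not SAFE_COMPONENT.match(component):
            problems.append(f"'{component}' contains characters outside the conservative set")
    return problems


for candidate in ("tank/media", "tank/.hidden", "tank/iocage/mine", "tank/a/b/c"):
    print(candidate, "->", check_dataset_name(candidate) or "ok")
```

The default of 2 mirrors the "around 2" suggestion above and can be relaxed through max_depth for layouts that manage permissions carefully.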

winnielinnie · Jun 12, 2020

Excellent, excellent write-up!

A few questions for clarification:

Samuel Tai said:
The top-level of a pool is reserved; don't just use it as a dumping ground.

This is to mean that if you create a pool named "bigtank", which is mounted under /mnt/bigtank/, no files or directories should be saved under /mnt/bigtank/, and /mnt/bigtank/ should never be used as the root directory for a share? Is this a hard restriction, or is it because the system will allow you to do it, but it can cause issues down the line if you try to do so?

Samuel Tai said:
Only nest datasets when necessary, and then only to a maximum depth of around 2, to minimize jankiness with permissions and ACLs

This one seems odd to me. I assumed that you could theoretically create an indefinite number of nested datasets without issues, as each dataset is essentially its own filesystem. A maximum depth of "2" seems shallow, pardon the pun. Here is an example of a home server (of multiple family members) I made up which can take advantage of the concept "each dataset is its own filesystem, and each dataset can be configured to its own snapshot and replication policies."

The "projects" dataset may be configured to take less frequent snapshots, yet the "multimedia" and "isos" datasets are less important and have no snapshots. For each member's computer dump, the "userhome-backups" would hold regular rsync/ssh tasks from client-to-server, and these datasets have very frequent snapshots (so older versions or outright "deleted" files can later be retrieved). The other datasets under each member's specific computer may be deemed less important and have few to zero snapshots, and may only be accessed rarely, such as a dedicated place to dump iPhone/iPad backups that can later be entirely deleted when there is no more need to hold on to the phone backups. In this depth of 3 nested datasets, what's a likely issue or conflict? I can't really think of any myself.

Code:

bigtank
    ---> downloads
            ---> isos
            ---> projects
            ---> multimedia
    ---> computers
            ---> office-pc
                     ---> userhome-backups
                     ---> noncritical-temp-files
            ---> family-pc
                     ---> userhome-backups
                     ---> noncritical-temp-files
            ---> eric-laptop
                     ---> userhome-backups
                     ---> usb-drive-copy
            ---> gina-laptop
                     ---> userhome-backups
                     ---> usb-drive-copy
                     ---> iphone-backups
            ---> work-laptop
                     ---> userhome-backups
                     ---> android-backups

Again, major props for your excellent guide! I wonder if some things will fundamentally change with OpenZFS 2.0? (Such as how encryption per dataset changes the dynamics of logically structuring your pool.)

Samuel Tai · Jun 12, 2020

winnielinnie said:
This is to mean that if you create a pool named "bigtank", which is mounted under /mnt/bigtank/, no files or directories should be saved under /mnt/bigtank/, and /mnt/bigtank/ should never be used as the root directory for a share? Is this a hard restriction, or is it because the system will allow you to do it, but it can cause issues down the line if you try to do so?

This is because in 11.3, it's not possible to set permissions or ACLs on the root for sharing. The system won't prevent you from creating directories and files at the pool root level, but it's not very useful to have data there that can't be shared.

winnielinnie said:
This one seems odd to me. I assumed that you could theoretically create an indefinite number of nested datasets without issues, as each dataset is essentially its own filesystem. A maximum depth of "2" seems shallow, pardon the pun. Here is an example of a home server (of multiple family members) I made up which can take advantage of the concept "each dataset is its own filesystem, and each dataset can be configured to its own snapshot and replication policies."

This runs into limitations of the GUI in applying permissions/ACLs recursively. For example, on my system, I have my user home datasets set up like this:

<root>/
    home/ (dataset)
        local/ (dataset)
            local-account-folder-1
            local-account-folder-2
            ...
        windows/ (dataset, used as SMB home share)
            SMB-account-folder-1
            SMB-account-folder-2
            ...

The local accounts are used for admins who SSH into the server and then sudo. If I apply a permissions/ACL recursively at the home/ level, I have to remember then to go into the shell to change ownership of the individual account folders back to their respective user:group settings. If I make a mistake with permissions at the home/ level, I can make the local/ and windows/ levels inaccessible to SSH logins and SMB share mounting.
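
The shell fix-up described above could look roughly like the following sketch. It assumes each account folder is named after the user who owns it and that the matching group has the same name; both are assumptions for illustration, as are the path /mnt/tank/home/local and the account names. The helper only builds the chown commands so they can be reviewed before being run.

```python
import shlex
from pathlib import PurePosixPath


def chown_commands(parent, folders):
    """Build one `chown -R user:group path` command per account folder.

    Assumes folder name == user name == group name (hypothetical convention).
    """
    commands = []
    for name in folders:
        target = PurePosixPath(parent) / name
        owner = f"{shlex.quote(name)}:{shlex.quote(name)}"
        commands.append(f"chown -R {owner} {shlex.quote(str(target))}")
    return commands


# Hypothetical pool path and account names.
for command in chown_commands("/mnt/tank/home/local", ["alice", "bob"]):
    print(command)
```

Printing the commands instead of executing them keeps the recursive ownership change a deliberate, reviewable step.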

anodos · Jun 12, 2020

Samuel Tai said:
This is because in 11.3, it's not possible to set permissions or ACLs on the root for sharing. The system won't prevent you from creating directories and files at the pool root level, but it's not very useful to have data there that can't be shared.

This runs into limitations of the GUI in applying permissions/ACLs recursively. For example, on my system, I have my user home datasets set up like this:

<root>/
    home/ (dataset)
        local/ (dataset)
            local-account-folder-1
            local-account-folder-2
            ...
        windows/ (dataset, used as SMB home share)
            SMB-account-folder-1
            SMB-account-folder-2
            ...

The local accounts are used for admins who SSH into the server and then sudo. If I apply a permissions/ACL recursively at the home/ level, I have to remember then to go into the shell to change ownership of the individual account folders back to their respective user:group settings. If I make a mistake with permissions at the home/ level, I can make the local/ and windows/ levels inaccessible to SSH logins and SMB share mounting.

The ACL manager won't change the owner / group recursively unless you have "apply owner" or "apply group" checked (as of U3.2). There was a GUI bug in U3 where the webui was always submitting a request to change owner. This is fixed in the latest release.

jenksdrummer · Sep 23, 2020

iSCSI can only use zvols

This is incorrect; you can use zvols or you can use file-based extents with iSCSI.

The two perform a bit differently in terms of throughput and IOPS, but as an observation, file-based extents can have dramatically higher compression ratios given similar conditions.

volothamp · Sep 24, 2020

Samuel Tai said:
This is because in 11.3, it's not possible to set permissions or ACLs on the root for sharing. The system won't prevent you from creating directories and files at the pool root level, but it's not very useful to have data there that can't be shared.

So what should we do instead? Create a directory in the root and start from there? Thank you

Alecmascot · Sep 24, 2020

volothamp said:
So what should we do instead? Create a directory in the root and start from there? Thank you

No, create a dataset, not a directory.

volothamp · Sep 24, 2020

Alecmascot said:
No, create a dataset not a directory.

But assuming the root is also a dataset, that means we're forced to nest datasets.

How is this related to this rule?

"Only nest datasets when necessary, and then only to a maximum depth of around 2"

Should the depth of 2 also include the root?

Thank you very much

Alecmascot · Sep 24, 2020

volothamp said:
But assuming the root is also a dataset, that means we're forced to nest datasets.

How is this related to this rule?

"Only nest datasets when necessary, and then only to a maximum depth of around 2"

Should the depth of 2 also include the root?

Thank you very much

That is not a rule; it was a suggestion to aid with management of permissions and ACLs.
You may get unexpected results if a share traverses datasets.

SweetAndLow · Sep 24, 2020

Alecmascot said:
That is not a rule; it was a suggestion to aid with management of permissions and ACLs.
You may get unexpected results if a share traverses datasets.

In what scenario would you be traversing multiple datasets and get permission issues? For example, you can't traverse datasets via SMB or NFS, or even in a jail. So really, accessing things via the CLI is the only case I can see where this gets complex for a user.

pumapanzer · Apr 26, 2021

Thank you so much for sharing your wisdom!

I am new to open systems in general, expanding my knowledge about Linux, Unix, open source software, and most recently, TrueNAS and ZFS. I am thoroughly enjoying reading through articles and community threads. It's great stuff, especially practical advice from experienced users such as yourself ;)

Samuel Tai said:
1) Understand the difference between a dataset and a zvol.

Datasets can be nested. A dataset can contain a zvol or another dataset, but a zvol cannot contain any child datasets or zvols.

2) Consider all the use cases for your data

What permissions structure do you intend? Nested datasets, in particular, can result in hairy permissions and/or ACL inheritance settings that are hard to debug.

3) Some rules of thumb

Only nest datasets when necessary, and then only to a maximum depth of around 2, to minimize jankiness with permissions and ACLs

I appreciate your guidance regarding child datasets and nesting; however, I feel like I am missing the point of creating child datasets. If you don't mind, could you share some common use-cases for child datasets, or perhaps point me in the direction for more reading on the subject? Thanks in advance and take care!

pumapanzer · Apr 28, 2021

pumapanzer said:
Thank you so much for sharing your wisdom!

I am new to open systems in general, expanding my knowledge about Linux, Unix, open source software, and most recently, TrueNAS and ZFS. I am thoroughly enjoying reading through articles and community threads. It's great stuff, especially practical advice from experienced users such as yourself ;)

I appreciate your guidance regarding child datasets and nesting; however, I feel like I am missing the point of creating child datasets. If you don't mind, could you share some common use-cases for child datasets, or perhaps point me in the direction for more reading on the subject? Thanks in advance and take care!

My question regarding child datasets could have been clearer. I meant to ask: Aside from the 2nd-level datasets (direct descendants of the pool's top level), what's the point of creating 3rd-level child datasets, and potentially 4th-level child datasets, and so on?

For my planned use-cases, I was able to answer my own question by searching a bit further in the "Resources" using this wonderful guide: https://www.truenas.com/community/resources/introduction-to-zfs.111/

Take care!

PS I am unable to edit my previous reply, or I would have edited the question directly, in my previous reply. Perhaps it is because I am a new community member who is still on "probation"? If yes, I completely understand. :)

Davvo · Nov 16, 2022

Why isn't this a resource? It's so insightful.
