
Path to Success for Structuring Datasets in Your Pool

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
2,793

So you've got a shiny new FreeNAS server, just begging to have you create a pool and start loading it up. Assuming you've read @jgreco's The path to success for block storage sticky, you've decided on the composition of your pool (RAIDZx vs mirrors), and built your pool accordingly. Now you have an empty pool and a pile of bits to throw in.

STOP! You'll need to think at this point about how to structure your data.

1) Understand the difference between a dataset and a zvol.
  • A dataset is a self-contained ZFS container for data, and is the smallest unit of control for ZFS policies like compression, deduplication, and quotas. This is also the smallest structure for setting ZFS flags. A dataset is essentially its own independent ZFS filesystem.
  • A zvol is a virtual disk image. These are similar to other virtual disks, like VMware's VMDK or Hyper-V's VHD. Unlike these other disk images, a zvol is NOT a file, but is a reference to a block device. (These are actually created in /dev/zvol/tank/, but appear under pool tank in Storage->Pools.)
  • Datasets can be nested. A dataset can contain a zvol or another dataset, but a zvol cannot contain any child datasets or zvols.
2) Consider all the use cases for your data
  • What sharing mechanisms do you intend to support? AFP, NFS, SMB, iSCSI? It's generally a bad idea to have multiple sharing protocols acting on a single dataset. File locking isn't shared across protocols, so multiple users can step on each other in that dataset, and working out what happened afterwards can be confusing. Also, iSCSI can only use zvols, whereas the others can only use datasets.
  • Do you want to run plugins or jails? The jail manager (warden or iocage) will construct a standard set of datasets for its own use, so you shouldn't construct any datasets with the same names. In particular, warden installs a dataset hierarchy under tank/jails, and iocage installs a hierarchy under tank/iocage, where tank is the pool. To provide access to a dataset on the host, you'll need to use the jail manager to map a host dataset to a mount point within the jail.
  • Do you want to run VMs? VM disks can only be zvols, not datasets. Once running, the VM can mount a host dataset via NFS (preferred) or SMB (not recommended, due to SMB's single-threaded operation).
  • What permissions structure do you intend? Nested datasets, in particular, can result in hairy permissions and/or ACL inheritance settings that are hard to debug.
  • Lastly, datasets can have independent snapshot/replication intervals. Is some data so critical that you need to snapshot every 30 minutes? That data should be collected within its own dataset.
3) Some rules of thumb
  • The top-level of a pool is reserved; don't just use it as a dumping ground.
  • Group data of similar criticality in the same dataset.
  • Have separate datasets for separate sharing protocols when practical.
  • Create zvols on mirror pools, if possible.
  • Only nest datasets when necessary, and then only to a maximum depth of around 2, to minimize jankiness with permissions and ACLs
  • Name your datasets and zvols so their function is obvious, not just today, but 5 or more years in the future.
    • Avoid creating datasets with names beginning with a period (.). These datasets will be invisible in Storage->Pools, and any operations on these datasets (snapshots, replication, etc.) can only be performed using CLI tools in the shell. (If you absolutely know what you're doing, a hidden dataset can be created in either the GUI or the shell, and then used as normal.) Snapshots of these datasets do get listed in Storage->Snapshots, however.
    • Likewise, avoid using shell special characters (e.g., /) in dataset names or snapshot names.
    • Avoid using spaces in dataset/zvol names. This can result in unexpected behavior during replications and syncs.
  • Never create or modify datasets underneath jail manager-created datasets, like tank/jails or tank/iocage. Running jail cleanup utilities, like iocage clean -a, will silently destroy these added datasets, as some forum members have learned to their chagrin.
  • Avoid creating symlinks between datasets. This degrades the independence of datasets, and confuses utilities like du. It can also disturb replications and syncs.
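
As a rough illustration of the naming and nesting rules of thumb above, here is a minimal Python sketch that flags a proposed dataset path before you create it. Everything in it is an assumption made for the example: the function name check_dataset_path, the conservative character allow-list, and the choice to count depth below the pool root are not part of FreeNAS or ZFS.

```python
import re

# Hypothetical settings for this sketch; none of these come from FreeNAS itself.
JAIL_MANAGED = ("jails", "iocage")  # pool children owned by warden / iocage
MAX_DEPTH = 2                       # "around 2", counted below the pool root (assumption)
SAFE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.:-]*$")  # conservative allow-list


def check_dataset_path(path):
    """Return a list of warnings for a proposed pool/dataset/... path."""
    warnings = []
    pool, *children = path.split("/")
    if not children:
        warnings.append("pool root: create a dataset instead of using it directly")
    for name in children:
        if name.startswith("."):
            warnings.append(f"{name!r}: leading period hides the dataset in the GUI")
        elif " " in name:
            warnings.append(f"{name!r}: contains a space")
        elif not SAFE_NAME.match(name):
            warnings.append(f"{name!r}: has a character outside the allow-list")
    if len(children) > 1 and children[0] in JAIL_MANAGED:
        warnings.append("path is under a jail manager-created dataset")
    if len(children) > MAX_DEPTH:
        warnings.append(f"nested {len(children)} levels deep (rule of thumb: {MAX_DEPTH})")
    return warnings
```

For example, check_dataset_path("tank/media") returns an empty list, while check_dataset_path("tank/iocage/extra") returns a single warning about the jail manager hierarchy. The checks only inspect the path string; they do not query the pool.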
 

winnielinnie

Senior Member
Joined
Oct 22, 2019
Messages
542
Excellent, excellent write-up!

A few questions for clarification:
The top-level of a pool is reserved; don't just use it as a dumping ground.
Does this mean that if you create a pool named "bigtank", which is mounted under /mnt/bigtank/, no files or directories should be saved directly under /mnt/bigtank/, and /mnt/bigtank/ should never be used as the root directory for a share? Is this a hard restriction, or will the system allow it but cause issues down the line?


Only nest datasets when necessary, and then only to a maximum depth of around 2, to minimize jankiness with permissions and ACLs
This one seems odd to me. I assumed that you could theoretically create an indefinite number of nested datasets without issues, as each dataset is essentially its own filesystem. A maximum depth of "2" seems shallow, pardon the pun. Here is an example of a home server (of multiple family members) I made up which can take advantage of the concept "each dataset is its own filesystem, and each dataset can be configured to its own snapshot and replication policies."

The "projects" dataset may be configured to take less frequent snapshots, yet the "multimedia" and "isos" datasets are less important and have no snapshots. For each member's computer dump, the "userhome-backups" would hold regular rsync/ssh tasks from client-to-server, and these datasets have very frequent snapshots (so older versions or outright "deleted" files can later be retrieved). The other datasets under each member's specific computer may be deemed less important and have few to zero snapshots, and may only be accessed rarely, such as a dedicated place to dump iPhone/iPad backups that can later be entirely deleted when there is no more need to hold on to the phone backups. In this depth of 3 nested datasets, what's a likely issue or conflict? I can't really think of any myself.

Code:
bigtank
    ---> downloads
            ---> isos
            ---> projects
            ---> multimedia
    ---> computers
            ---> office-pc
                     ---> userhome-backups
                     ---> noncritical-temp-files
            ---> family-pc
                     ---> userhome-backups
                     ---> noncritical-temp-files
            ---> eric-laptop
                     ---> userhome-backups
                     ---> usb-drive-copy
            ---> gina-laptop
                     ---> userhome-backups
                     ---> usb-drive-copy
                     ---> iphone-backups
            ---> work-laptop
                     ---> userhome-backups
                     ---> android-backups
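
To make the depth being discussed concrete, here is a small Python sketch that counts nesting levels for a few paths transcribed from the layout above. The helper names are made up for the illustration, and it assumes the pool root itself is not counted as a level.

```python
from collections import defaultdict

# A few dataset paths transcribed from the layout above (pool "bigtank").
LAYOUT = [
    "bigtank/downloads/isos",
    "bigtank/downloads/projects",
    "bigtank/computers/office-pc/userhome-backups",
    "bigtank/computers/gina-laptop/iphone-backups",
]


def depth_below_pool(path):
    """Number of dataset levels below the pool root (the pool itself is depth 0)."""
    return path.count("/")


def group_by_depth(paths):
    """Map each depth to the sorted list of paths found at that depth."""
    groups = defaultdict(list)
    for p in paths:
        groups[depth_below_pool(p)].append(p)
    return {d: sorted(ps) for d, ps in sorted(groups.items())}
```

Under that counting, the per-computer backup datasets sit at depth 3 and the downloads children at depth 2.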


Again, major props for your excellent guide! I wonder if some things will fundamentally change with OpenZFS 2.0? (Such as how encryption per dataset changes the dynamics of logically structuring your pool.)
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
2,793
Does this mean that if you create a pool named "bigtank", which is mounted under /mnt/bigtank/, no files or directories should be saved directly under /mnt/bigtank/, and /mnt/bigtank/ should never be used as the root directory for a share? Is this a hard restriction, or will the system allow it but cause issues down the line?
This is because in 11.3, it's not possible to set permissions or ACLs on the root for sharing. The system won't prevent you from creating directories and files at the pool root level, but it's not very useful to have data there that can't be shared.

This one seems odd to me. I assumed that you could theoretically create an indefinite number of nested datasets without issues, as each dataset is essentially its own filesystem. A maximum depth of "2" seems shallow, pardon the pun. Here is an example of a home server (of multiple family members) I made up which can take advantage of the concept "each dataset is its own filesystem, and each dataset can be configured to its own snapshot and replication policies."
This runs into limitations of the GUI in applying permissions/ACLs recursively. For example, on my system, I have my user home datasets set up like this:

<root>/
    home/ (dataset)
        local/ (dataset)
            local-account-folder-1
            local-account-folder-2
            ...
        windows/ (dataset, used as SMB home share)
            SMB-account-folder-1
            SMB-account-folder-2
            ...
The local accounts are used for admins who SSH into the server and then sudo. If I apply a permissions/ACL recursively at the home/ level, I have to remember then to go into the shell to change ownership of the individual account folders back to their respective user:group settings. If I make a mistake with permissions at the home/ level, I can make the local/ and windows/ levels inaccessible to SSH logins and SMB share mounting.
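
The shell cleanup described above could be scripted along these lines. This is only a sketch under stated assumptions: it assumes each account folder is named after its user and that the user's group has the same name, and the function names are invented for the example. It resets only the top level of each folder, not its contents.

```python
import os
import shutil


def ownership_plan(dataset_mountpoint):
    """List (path, user, group) for each account folder directly under the mount point.

    Assumption for this sketch: each folder is named after its account, and
    the account's group shares that name.
    """
    plan = []
    for entry in sorted(os.scandir(dataset_mountpoint), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            plan.append((entry.path, entry.name, entry.name))
    return plan


def apply_plan(plan):
    """Reset ownership of each listed folder; needs root and existing accounts."""
    for path, user, group in plan:
        shutil.chown(path, user=user, group=group)
```

Reviewing the output of ownership_plan() before calling apply_plan() keeps a mistake from compounding the original permissions problem. A call might look like apply_plan(ownership_plan("/mnt/tank/home/local")), where the pool name tank is a placeholder.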
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
7,503
This is because in 11.3, it's not possible to set permissions or ACLs on the root for sharing. The system won't prevent you from creating directories and files at the pool root level, but it's not very useful to have data there that can't be shared.



This runs into limitations of the GUI in applying permissions/ACLs recursively. For example, on my system, I have my user home datasets set up like this:

<root>/
    home/ (dataset)
        local/ (dataset)
            local-account-folder-1
            local-account-folder-2
            ...
        windows/ (dataset, used as SMB home share)
            SMB-account-folder-1
            SMB-account-folder-2
            ...
The local accounts are used for admins who SSH into the server and then sudo. If I apply a permissions/ACL recursively at the home/ level, I have to remember then to go into the shell to change ownership of the individual account folders back to their respective user:group settings. If I make a mistake with permissions at the home/ level, I can make the local/ and windows/ levels inaccessible to SSH logins and SMB share mounting.
The ACL manager won't change the owner / group recursively unless you have "apply owner" or "apply group" checked (as of U3.2). There was a GUI bug in U3 where the webui was always submitting a request to change owner. This is fixed in the latest release.
 

jenksdrummer

Member
Joined
Jun 7, 2011
Messages
199
iSCSI can only use zvols
This is incorrect: you can use either zvols or file-based extents with iSCSI.

Both perform a bit differently in terms of throughput and IOPS, but as an observation, file-based extents can have dramatically higher compression ratios given similar conditions.
 

volothamp

Member
Joined
Jul 28, 2019
Messages
55
This is because in 11.3, it's not possible to set permissions or ACLs on the root for sharing. The system won't prevent you from creating directories and files at the pool root level, but it's not very useful to have data there that can't be shared.
So what should we do instead? Create a directory in the root and start from that? Thank you
 

Alecmascot

Neophyte Sage
Joined
Mar 18, 2014
Messages
722
So what should we do instead? Create a directory in the root and start from that? Thank you
No, create a dataset, not a directory.
 

volothamp

Member
Joined
Jul 28, 2019
Messages
55
No, create a dataset not a directory.
But assuming the root is also a dataset, that means we're forced to nest datasets.

How is this related to this rule?

"Only nest datasets when necessary, and then only to a maximum depth of around 2"

Should the depth of 2 also include the root?

Thank you very much
 

Alecmascot

Neophyte Sage
Joined
Mar 18, 2014
Messages
722
But assuming the root is also a dataset that means we're forced to nest datasets.

How is this related to this rule?

"Only nest datasets when necessary, and then only to a maximum depth of around 2"

Should the depth of 2 also include the root?

Thank you very much
That is not a rule; it was a suggestion to aid with management of permissions and ACLs.
You may get unexpected results if a share traverses datasets.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,390
That is not a rule; it was a suggestion to aid with management of permissions and ACLs.
You may get unexpected results if a share traverses datasets.
In what scenario would you be traversing multiple datasets and get permission issues? For example, you can't traverse datasets via SMB or NFS, or even in a jail. So really, I only see accessing things via the CLI as the one complex case for a user.
 

pumapanzer

Newbie
Joined
Apr 23, 2021
Messages
3
Thank you so much for sharing your wisdom!

I am new to open systems in general, expanding my knowledge about Linux, Unix, open source software, and most recently, TrueNAS and ZFS. I am thoroughly enjoying reading through articles and community threads. It's great stuff, especially practical advice from experienced users such as yourself ;)

1) Understand the difference between a dataset and a zvol.
  • Datasets can be nested. A dataset can contain a zvol or another dataset, but a zvol cannot contain any child datasets or zvols.
2) Consider all the use cases for your data
  • What permissions structure do you intend? Nested datasets, in particular, can result in hairy permissions and/or ACL inheritance settings that are hard to debug.
3) Some rules of thumb
  • Only nest datasets when necessary, and then only to a maximum depth of around 2, to minimize jankiness with permissions and ACLs
I appreciate your guidance regarding child datasets and nesting; however, I feel like I am missing the point of creating child datasets. If you don't mind, could you share some common use-cases for child datasets, or perhaps point me in the direction for more reading on the subject? Thanks in advance and take care!
 

pumapanzer

Newbie
Joined
Apr 23, 2021
Messages
3

My question regarding child datasets could have been clearer. I meant to ask: aside from the 2nd-level datasets (direct descendants of the pool's top level), what's the point of creating 3rd-level child datasets, and potentially 4th-level child datasets, and so on?

For my planned use-cases, I was able to answer my own question by searching a bit further in the "Resources" using this wonderful guide: https://www.truenas.com/community/resources/introduction-to-zfs.111/

Take care!

PS I am unable to edit my previous reply, or I would have edited the question directly, in my previous reply. Perhaps it is because I am a new community member who is still on "probation"? If yes, I completely understand. :)
 