Path to Success for Structuring Datasets in Your Pool
So you've got a shiny new FreeNAS server, just begging to have you create a pool and start loading it up. Assuming you've read @jgreco's The path to success for block storage sticky, you've decided on the composition of your pool (RAIDZx vs mirrors), and built your pool accordingly. Now you have an empty pool and a pile of bits to throw in.
STOP! You'll need to think at this point about how to structure your data.
1) Understand the difference between a dataset and a zvol.
- A dataset is a self-contained ZFS container for data, and is the smallest unit of control for ZFS policies like compression, deduplication, and quotas. This is also the smallest structure for setting ZFS flags. A dataset is essentially its own independent ZFS filesystem.
- A zvol is a virtual disk image. These are similar to other virtual disks, like VMware's VMDK or Hyper-V's VHD. Unlike these other disk images, a zvol is NOT a file, but is a reference to a block device. (These are actually created in /dev/zvol/tank/, but appear under pool tank in Storage->Pools.)
- Datasets can be nested. A dataset can contain a zvol or another dataset, but a zvol cannot contain any child datasets or zvols.
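If the nesting rule is easier to see in code, here is a minimal sketch of it as a toy tree. This is not a ZFS API: the Node class and the names vms and win10-disk0 are made up for illustration, and tank is just the example pool used throughout this post.

```python
class Node:
    """A dataset or zvol in a toy model of a pool hierarchy (not a ZFS API)."""

    def __init__(self, name, kind):
        if kind not in ("dataset", "zvol"):
            raise ValueError(f"unknown kind: {kind}")
        self.name = name
        self.kind = kind
        self.children = []

    def add(self, child):
        # Datasets may hold datasets or zvols; a zvol is always a leaf.
        if self.kind == "zvol":
            raise ValueError(f"zvol {self.name!r} cannot contain {child.name!r}")
        self.children.append(child)
        return child

    def paths(self, prefix=""):
        """Yield (path, kind) for this node and everything below it."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        yield path, self.kind
        for child in self.children:
            yield from child.paths(path)


# Hypothetical layout: a dataset holding one VM disk.
tank = Node("tank", "dataset")  # the pool's top level
vms = tank.add(Node("vms", "dataset"))
disk = vms.add(Node("win10-disk0", "zvol"))

for path, kind in tank.paths():
    # zvols appear as block devices under /dev/zvol/<pool>/...
    device = f"/dev/zvol/{path}" if kind == "zvol" else "-"
    print(f"{path:24} {kind:8} {device}")
```

Trying disk.add(...) raises ValueError, which is the "a zvol cannot contain any children" rule in action.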
- What sharing mechanisms do you intend to support? AFP, NFS, SMB, iSCSI? It's generally a bad idea to have multiple sharing protocols acting on a single dataset: file locking isn't shared across protocols, so users can step on each other in that dataset, and working out what happened afterwards can be confusing. Also, iSCSI can only use zvols, whereas the others can only use datasets.
- Do you want to run plugins or jails? The jail manager (warden or iocage) will construct a standard set of datasets for its own use, so you shouldn't construct any datasets with the same names. In particular, warden installs a dataset hierarchy under tank/jails, and iocage installs a hierarchy under tank/iocage, where tank is the pool. To provide access to a dataset on the host, you'll need to use the jail manager to map a host dataset to a mount point within the jail.
- Do you want to run VMs? VM disks can only be zvols, not datasets. Once running, the VM can mount a host dataset via NFS (preferred) or SMB (not recommended, due to SMB's single-threaded operation).
- What permissions structure do you intend? Nested datasets, in particular, can result in hairy permissions and/or ACL inheritance settings that are hard to debug.
- Lastly, datasets can have independent snapshot/replication intervals. Is some data so critical that you need to snapshot every 30 minutes? That data should be collected within its own dataset.
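As a rough way to sanity-check a sharing plan against the protocol questions above, something like the following could work. It is only a sketch: review_shares and the example paths are hypothetical, and the protocol-to-backing table just restates what this post says (iSCSI on zvols, the file protocols on datasets).

```python
from collections import defaultdict

# Backing store each protocol needs, as described in this post.
BACKING = {"AFP": "dataset", "NFS": "dataset", "SMB": "dataset", "iSCSI": "zvol"}


def review_shares(shares):
    """Check (protocol, path, kind) tuples and return a list of problems."""
    problems = []
    protocols_by_path = defaultdict(set)
    for protocol, path, kind in shares:
        expected = BACKING[protocol]
        if kind != expected:
            problems.append(f"{protocol} share on {path} needs a {expected}, not a {kind}")
        protocols_by_path[path].add(protocol)
    # Flag any path exported over more than one protocol.
    for path, protocols in sorted(protocols_by_path.items()):
        if len(protocols) > 1:
            problems.append(f"{path} is shared over {', '.join(sorted(protocols))}")
    return problems


# Hypothetical plan: one dataset shared two ways, and iSCSI pointed at a dataset.
plan = [
    ("SMB", "tank/media", "dataset"),
    ("NFS", "tank/media", "dataset"),
    ("iSCSI", "tank/backups", "dataset"),
    ("iSCSI", "tank/esxi-lun0", "zvol"),
]
for problem in review_shares(plan):
    print(problem)
```

An empty result doesn't mean the layout is good, only that it avoids these two specific mistakes.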
- The top-level of a pool is reserved; don't just use it as a dumping ground.
- Group data of similar criticality in the same dataset.
- Have separate datasets for separate sharing protocols when practical.
- Create zvols on mirror pools, if possible.
- Only nest datasets when necessary, and then only to a maximum depth of around 2, to minimize jankiness with permissions and ACLs.
- Name your datasets and zvols so their function is obvious, not just today, but 5 or more years in the future.
- Avoid creating datasets with names beginning with a period (.). These datasets will be invisible in Storage->Pools, and any operations on these datasets (snapshots, replication, etc.) can only be performed using CLI tools in the shell. (If you absolutely know what you're doing, a hidden dataset can be created in either the GUI or the shell, and then used as normal.) Snapshots of these datasets do get listed in Storage->Snapshots, however.
- Likewise, avoid using shell special characters (e.g., /) in dataset names or snapshot names.
- Avoid using spaces in dataset/zvol names. This can result in unexpected behavior during replications and syncs.
- Never create or modify datasets underneath jail manager-created datasets, like tank/jails or tank/iocage. Running jail cleanup utilities, like iocage clean -a, will silently destroy these added datasets, as some forum members have learned to their chagrin. Also, do not share out datasets underneath jail manager-created datasets, as using the ACL manager to change permissions on the share will break the jail manager's access to datasets for which it expects to have full control.
- Avoid creating symlinks between datasets. This degrades the independence of datasets, and confuses utilities like du. It can also disturb replications and syncs.
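Several of the naming and nesting guidelines above are mechanical enough to check before you create anything. Here is a minimal sketch of such a check. The function name, the "safe" character set, and reading "depth of around 2" as two levels below the pool are all my assumptions, and the character set is deliberately narrower than what ZFS itself accepts.

```python
import re

# Hierarchies owned by the jail managers (warden and iocage) under the pool.
JAIL_MANAGER_ROOTS = ("jails", "iocage")
MAX_DEPTH = 2  # assumption: "around 2" read as two levels below the pool

# Assumed safe set: letters, digits, underscore, period, colon, hyphen.
SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_.:-]+")


def check_dataset_name(path):
    """Return warnings for a pool-qualified name such as 'tank/media/photos'."""
    warnings = []
    pool, *components = path.split("/")
    if not components:
        warnings.append("top level of the pool is reserved; create a dataset under it")
        return warnings
    if components[0] in JAIL_MANAGER_ROOTS:
        warnings.append(f"{pool}/{components[0]} belongs to the jail manager")
    if len(components) > MAX_DEPTH:
        warnings.append(f"nested {len(components)} levels deep (suggested max {MAX_DEPTH})")
    for part in components:
        if part.startswith("."):
            warnings.append(f"'{part}' begins with a period and will be hidden in the GUI")
        if " " in part:
            warnings.append(f"'{part}' contains a space")
        elif not SAFE_COMPONENT.fullmatch(part):
            warnings.append(f"'{part}' contains characters outside the assumed safe set")
    return warnings


# Hypothetical names to try it on.
for name in ("tank/media/photos", "tank/.scratch", "tank/my data", "tank/iocage/extra"):
    print(name, "->", check_dataset_name(name) or "ok")
```

It can't judge whether a name will still make sense in 5 years, so that part remains up to you.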