Organization of your Data / Dataset Hierarchy

thomas-hn

Explorer
Joined
Aug 2, 2020
Messages
82
Hello,

at the moment I am defining the structure of my datasets and working out how to organize the data on our private family NAS. I am trying to follow the approach of having top-level datasets based on data priority. However, it is hard to find a good concept, and right now I am struggling with how to handle different users in the folder/dataset structure.
For example, should I place photos inside /tank/ds_prio1/photos/user1 and /tank/ds_prio1/photos/user2 or would it be better to have something like /tank/ds_prio1/user1/photos and /tank/ds_prio1/user2/photos?

Could you, maybe, give some examples of your dataset structure and how you handle different family members in your dataset/folder structure?

Thanks a lot,

Thomas
 

Saoshen

Dabbler
Joined
Oct 13, 2023
Messages
47
if you have multiple users, I would go with something similar to '/pool/home/username' then let the user organize their own private data (including photos).

For global/shared media, something like '/pool/media/pictures', '/pool/media/audio', and likewise for movies, tv, etc.

under media/pictures you could of course create a folder hierarchy that best matches your preferences, i.e. by year folders, and/or other users or events or categories.
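To make the layout above concrete, here is a minimal sketch that only prints the `zfs create` commands for such a structure; it does not run them. The pool name `tank` and the user names are placeholders, and the helper `dataset_layout` is hypothetical, not something from TrueNAS. Note that ZFS dataset names have no leading slash, unlike the mount paths quoted above.

```python
def dataset_layout(pool, users, media_kinds):
    """Return ZFS dataset names for per-user homes plus shared media."""
    datasets = [f"{pool}/home"]
    # One dataset per user, so snapshots and permissions can differ per person.
    datasets += [f"{pool}/home/{user}" for user in users]
    datasets.append(f"{pool}/media")
    # Shared media lives outside any single user's home.
    datasets += [f"{pool}/media/{kind}" for kind in media_kinds]
    return datasets


if __name__ == "__main__":
    # "tank", "user1" and "user2" are placeholder names (assumptions).
    for name in dataset_layout("tank", ["user1", "user2"], ["pictures", "audio"]):
        # Print the command instead of executing it.
        print(f"zfs create {name}")
```

Parents are listed before children, so the commands can be reviewed and run in order. Anything below a dataset (year folders, events) can stay a plain directory.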

for even more safety, you can have separate pools with similar structure but different priorities, i.e. '/pool1/home/users' and '/pool2/media/etc' and '/pool3/backups/computername' etc.

the reason for separate pools is avoiding putting everything into one basket, so a catastrophic failure of one pool doesn't affect your other pools.

of course, always remember (or google) the 3-2-1 backup rule.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
the reason for separate pools is avoiding putting everything into one basket, so a catastrophic failure of one pool doesn't affect your other pools.
For a SOHO user I would suggest using different pools inside the same system only for performance reasons/different roles (ie zvol and media storage).
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
For a SOHO user I would suggest using different pools inside the same system only for performance reasons/different roles (ie zvol and media storage).
Agreed. The answer to "don't put all your eggs in one basket" is to build a better basket.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
If your RAIDZ3 pool A suffers a catastrophic loss why wouldn't your pool B be the same?

3-2-1 is valid: have backups and offsite backups!
 

Saoshen

Dabbler
Joined
Oct 13, 2023
Messages
47
If your RAIDZ3 pool A suffers a catastrophic loss why wouldn't your pool B be the same?

No, POOLS are completely independent.

If instead you meant VDEV, as in 2x vdevs of raidz3, then yes, that is my point, if giganticpool0 vdev2 fails, then you lose the entire giganticpool0.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
No, POOLS are completely independent.
I do know how ZFS works, but if your RAIDZ3 pool fails it means that either: your system experienced a catastrophic event, in which case any other pool is likely to have met the same fate; or that you didn't replace faulted drives and lost redundancy upon redundancy until you lost the whole pool, in which case why would your other pools have been treated differently? If you skip on maintenance you usually do so for the whole system.

It's better to have an offsite backup than a second pool in the same system.
 

Saoshen

Dabbler
Joined
Oct 13, 2023
Messages
47
if your RAIDZ3 pool fails it means that either: your system experienced a catastrophic event, in which case any other pool is likely to have met the same fate;
that is a pretty broad assumption. one pool failure does not presume another. they are both independent events.

as opposed to raidz or even mirror pairs involving multiple vdevs, where losing a vdev = catastrophic loss for the entire pool.

or that you didn't replace faulted drives and lost redundancy upon redundancy until you lost the whole pool,
ergo, you lose only that pool, your others are still intact.

in which case why would your other pools have been treated differently?
again another assumption, if you had multiple pools and one went down, it doesn't affect the others.

of course, if multiple pools do go down because of lack of maintenance or any other reason, that still isn't a reason to choose megapool200tbs vs multiple smaller pools.

It's better to have an offsite backup than a second pool in the same system.

Of course, as was already mentioned more than once ie 3-2-1 backups

the entire point is that multiple pools help isolate your data from being affected by other pools' problems; they are also faster to resilver, putting your data at risk for shorter periods of time.

it is up to every user to determine their level of risk, but very few circumstances *require* single large pools of hundreds of terabytes. Certainly, in some professional/production environments it may be more likely. But in the growing influx of 'homelab' and media storage interests, not many 'need' a single giant pool, even if they may 'want' one.

edit: and to be even more explicit, I am primarily talking about arrays with larger numbers of disks, disk shelves of, say, more than 8, that involve multiple vdevs.

8 and under, raidz2 and be done, or mirror pairs if you want easier expansion.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
again another assumption, if you had multiple pools and one went down, it doesn't affect the others.
Yup. But if one RAIDZ3 pool fails it means there is a serious hazard going on in the system that's at least likely to hit the others.

it is up to every user to determine their level of risk, but very few circumstances *require* single large pools of hundreds of terabytes. Certainly, in some professional/production environments it may be more likely. But in the growing influx of 'homelab' and media storage interests, not many 'need' a single giant pool, even if they may 'want' one.
"Being more safe" is not the reason you go multiple pools; performance is.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
if you had multiple pools and one went down, it doesn't affect the others.
Obviously--we all know that. @Davvo is saying that the only realistic way that a RAIDZ3 vdev is going to fail is either (1) catastrophic hardware failure, or (2) near-criminal neglect of the system. That is indeed an assumption, but it happens to be correct. No, loss of pool1 won't cause the loss of pool2--but the conditions that caused that loss are highly likely to cause the other as well.
 

Saoshen

Dabbler
Joined
Oct 13, 2023
Messages
47
"Being more safe" is not the reason you go multiple pools; performance is.

Safety is relative.

Multiple pools allow for isolation and distribution of the chosen fault tolerance level.

Performance from multiple pools, really? when performance scales from multiple vdevs? Maybe performance isolation, which goes hand in hand with the above isolation of fault tolerance boundaries via multiple pools.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Performance from multiple pools, really? when performance scales from multiple vdevs?
It's sounding increasingly like you're determined to misunderstand Davvo. You do you, I guess, but it isn't very productive. He's already explained what he means by this:
only for performance reasons/different roles (ie zvol and media storage).
So, e.g., a large RAIDZn pool for bulk storage, and another pool consisting of (ideally several) mirrored vdevs for block storage. And/or a pool of SSDs for where higher performance is needed. The concept here is "different pools (with appropriately different configurations) for different performance requirements", not "separate pools result in higher performance."

Yes, if a vdev fails, the whole pool fails. We all know this. The remedy is to design your vdevs to have an acceptably low chance of failure. Whether that means greater redundancy, smaller vdevs, better hardware, or some combination of those things, that's where the answer is: "build a better basket."
 

Saoshen

Dabbler
Joined
Oct 13, 2023
Messages
47
Ok yes I get your expanded explanation, and even agree there.

Build a better basket is great, but still fallible. Having multiple better baskets is less risky (while still fallible) than a single better basket.

I don't understand why this is met with such resistance.

Your signature, which I assume is a single pool of 4 raidz2 vdevs, is far more risk than I want. Mine would be 4 separate pools, if using the same disk setup (or in my actual setup, 4 pools of mirrors).

In any case, and as always, backups, backups, backups.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I don't understand why this is met with such resistance.
Because it defeats a major purpose/benefit of ZFS, which is pooled storage.
single pool of 4 raidz2 vdevs, is far more risk than I want.
That's your call, of course, but is that based on any objective standard, or is it just, "I don't like it"?

To result in data loss, I'd need to have three hard failures, within the same vdev (i.e., half of the disks in that vdev), within about a week--and that's the worst-likely-case scenario, where I don't have a burned-in spare handy. And you're right that such a situation would result in loss of all my data, and yes, that would suck--even restoring from backup is a PITA. But what's the likelihood of that? Well, it isn't zero, much as I'd like it to be--but it's very low. Davvo's done some work to quantify that risk here:

To me, the benefit of pooled storage outweighs the risk. If I found the risk unacceptable, I'd look at different configurations (maybe 3x 8-disk RAIDZ3 vdevs), but splitting bulk storage into multiple pools would be a really hard sell for me. It greatly simplifies administration of your storage, which is why ZFS is designed that way.
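As a very rough way to put a number on "three hard failures within the same vdev within about a week", the sketch below uses a plain binomial model. It assumes disks fail independently at a constant rate, and the 2% annual failure rate is a made-up placeholder, not a measured figure. It is not Davvo's calculation, and correlated failures (bad batch, PSU, heat) would make the real number worse.

```python
from math import comb


def p_fail_within(days, annual_failure_rate):
    """Chance a single disk fails within `days`, assuming a constant failure rate."""
    return 1 - (1 - annual_failure_rate) ** (days / 365)


def p_at_least(k, n, p):
    """Binomial chance that at least k of n independent disks fail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_vdev_lost(disks=6, parity=2, days=7, annual_failure_rate=0.02):
    """Chance that more disks than the parity level fail in one window."""
    p = p_fail_within(days, annual_failure_rate)
    return p_at_least(parity + 1, disks, p)


if __name__ == "__main__":
    # 6-disk RAIDZ2 vdev, one-week window, assumed 2% annual failure rate.
    print(f"RAIDZ2, 6 disks: {p_vdev_lost():.2e}")
    # Same window for an 8-disk RAIDZ3 vdev, for comparison.
    print(f"RAIDZ3, 8 disks: {p_vdev_lost(disks=8, parity=3):.2e}")
```

The defaults mirror the 6-disk RAIDZ2 vdevs described in this thread; change `annual_failure_rate` and `days` to match your own drives and how quickly you can get a replacement in.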
 

Saoshen

Dabbler
Joined
Oct 13, 2023
Messages
47
I don't know if it is objective or not, but I do not like the risk to ALL of my data that would take a week or more to resilver, nor the time it would take to restore ALL of the data if lost. Versus the risk to a SUBSET of my data, for less time for either resilver or restoring that subset from backup.

To you, the large mega pool outweighs your risk. To me, it does not.

There is minimal administrative overhead once the pools are set up however desired.

I do not need a single giant pool, and my posit is, that most people do not either, even if they want it and are willing to subject their data to that additional risk.

Media does not need large megapools. Plex and every other media system supports having multiple paths for media libraries.

Is a single pool simpler? absolutely. Is it worth the risk, however small or large, whether objective or not, not for me.

Every time I see a post or thread about someone who lost their pool, I think well if they had just split that pool into smaller subsets, then their pain would be less. Regardless of backup status.
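The trade-off being argued here can be sketched numerically. The function below compares one pool of several vdevs with the same vdevs as separate pools, under two loud assumptions: vdevs fail independently, and each holds an equal share of the data. The independence assumption is exactly what is disputed in this thread, so treat the output as one side of the argument, not a verdict. `loss_profile` and the 1% figure are hypothetical.

```python
def loss_profile(vdevs, p_vdev_lost, separate_pools):
    """Return (chance of losing any data, chance of losing everything,
    expected fraction of data lost).

    Assumes vdevs fail independently and hold equal shares of the data.
    """
    p_any = 1 - (1 - p_vdev_lost) ** vdevs
    if separate_pools:
        # Each vdev is its own pool: everything is gone only if all of them fail.
        p_all = p_vdev_lost ** vdevs
        # Each pool holds 1/vdevs of the data and fails with p_vdev_lost.
        expected = p_vdev_lost
    else:
        # One pool: losing any single vdev loses the whole pool.
        p_all = p_any
        expected = p_any
    return p_any, p_all, expected


if __name__ == "__main__":
    # 1% per-vdev loss chance is an arbitrary illustration value.
    for separate in (False, True):
        label = "4 pools" if separate else "1 pool "
        p_any, p_all, expected = loss_profile(4, 0.01, separate)
        print(f"{label}: any={p_any:.4f} all={p_all:.2e} expected={expected:.4f}")
```

The chance of losing something is the same either way; what changes is how much is lost at once. If the cause is shared by the whole system (the broken cart in the analogy later in the thread), the independence assumption fails and both layouts lose everything together.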
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Build a better basket is great, but still fallible. Having multiple better baskets is less risky (while still fallible) than a single better basket.
When you have a single hose to fill up a bucket on a cart, why would you use four strong smaller water buckets instead of a bigger strong one? So that you lose only 1/4 of the water instead of all, while maybe spending more? And if the cart the buckets are on breaks, won't all four buckets fall anyway?

Hope the analogy is clear enough.

Your signature, which I assume is a single pool of 4 raidz2 vdevs, is far more risk than I want. Mine would be 4 separate pools, if using the same disk setup (or in my actual setup, 4 pools of mirrors).
That would mean four times the shares, four times the snapshots, four times the permissions, four times the replications to back things up, etc... Doesn't look very practical to me, but everyone does whatever they prefer.

And actually, I'm reading Pool: 6 x 6 TB RAIDZ2, 6 x 4 TB RAIDZ2, 6 x 8 TB RAIDZ2, 6 x 12 TB RAIDZ2 from his signature: this means he has a single pool composed of four VDEVs in RAIDZ2 made of six drives each. The different HDDs' size used for each VDEV points to a pool that has been expanded over a period of time, I would be surprised to see a pool designed in such a way from scratch.



Later addition:

I do not need a single giant pool, and my posit is, that most people do not either, even if they want it and are willing to subject their data to that additional risk.
The issue I have with this statement is that I think it's misleading (or not true at all).

Every time I see a post or thread about someone who lost their pool, I think well if they had just split that pool into smaller subsets, then their pain would be less. Regardless of backup status.
When someone loses a pool, it is because there was not enough redundancy (RAIDZ1, or RAIDZ2 with issues). You hardly ever see properly designed and maintained RAIDZ3 pools failing.

Edits for spelling corrections.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
would take a week or more to resilver,
Where on earth does that come from? Writing a full 20 TB drive (about the largest currently available) assuming a (fairly low) average throughput of 150 MB/sec takes 37 hours, and resilvers write at pretty nearly full speed. Even if I double that, it's three days. And there's still redundancy while that's going on.
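The arithmetic above is easy to check. This small sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant average write speed; the function name is just for illustration.

```python
def hours_to_write(capacity_tb, throughput_mb_s):
    """Hours needed to write a whole drive at a constant average throughput."""
    capacity_bytes = capacity_tb * 10**12      # decimal terabytes, as drives are sold
    bytes_per_second = throughput_mb_s * 10**6
    return capacity_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    base = hours_to_write(20, 150)
    print(f"{base:.0f} hours at 150 MB/s")                      # 37 hours
    print(f"{2 * base / 24:.1f} days if it takes twice as long")  # 3.1 days
```

A resilver only rewrites the data actually allocated on the pool, so a partly full pool should finish sooner than this full-drive figure.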
The different HDDs' size used for each VDEV points to a pool that has been expanded over a period of time
Correct.
I do not need a single giant pool, and my posit is, that most people do not either
Need? Well, I guess not. Most people don't need a NAS at all. Nor do they need Plex. Nor do they need, well, a whole lot of things. But a single storage pool greatly simplifies a number of tasks, both administrative and operational.
 