SMB Size Reporting Question

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
How does TrueNAS Scale report pool or dataset sizes over SMB? When I map my SMB shares on my Windows PC the shares do not report the total storage capacity of the pool correctly. My main data pool (Atlas) is made up of 5x4TB WD RED CMR drives in a RAIDz2 which is about 10.63TB of usable space. My second pool (Hercules) is made up of 2x1TB SSDs which are Mirrored. See the attached images for drive size info and Windows mounts.

Before I started using TrueNAS, I was running Ubuntu Server with OpenZFS and Samba. It also reported odd total pool and dataset sizes until I applied this Samba ZFS Fix. After applying the fix to my smb.conf file on Ubuntu Server, my Windows PC showed the correct total pool space. Is there a way to do this on TrueNAS Scale as well?
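For context, the fix linked above is essentially Samba's "dfree command" hook: you point smb.conf at a small script, and Samba reports whatever that script prints as the disk size and free space instead of the statvfs numbers. From memory, the gist of what I used was something like the following sketch (pool name and script path are just placeholders; the script reports the whole pool's totals in 1K blocks, so every share backed by the pool shows the same size):

    #!/bin/sh
    # /usr/local/bin/zfs-dfree.sh -- print "<total> <free>" in 1K blocks for the
    # whole pool, so every share backed by the pool reports the same totals
    POOL="Atlas"                                    # adjust to your pool name
    USED=$(zfs get -Hp -o value used "$POOL")       # bytes
    AVAIL=$(zfs get -Hp -o value available "$POOL") # bytes
    echo $(( (USED + AVAIL) / 1024 )) $(( AVAIL / 1024 ))

    # smb.conf, [global] section:
    #   dfree command = /usr/local/bin/zfs-dfree.sh

On TrueNAS the equivalent would presumably have to go into the SMB service's auxiliary parameters, but whether that's supported there is exactly what I'm asking.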
 

Attachments

  • TrueNAS Storage.png (109.2 KB)
  • SMB Shares.png (65.6 KB)
Joined
Oct 22, 2019
Messages
3,580
It's always been like that, I thought?

Especially if what is being shared is a dataset nested down the pool. Quotas also affect this display.

USED + FREE = TOTAL

So from the client's perspective, it knows how much space is being used up (within the share). It also knows how much space is reported as "free" or "available". And hence, it assumes the "total capacity" from these two values. (Rudimentary math.)

You'll notice that all your shares (from the same pool) report the same "free" value (regardless of how much data is stored on each share.)
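You can see the same arithmetic server-side with zfs list. The numbers below are made up and the dataset names are just examples, but the pattern is the point: every dataset in a pool reports the same AVAIL, so each share's "total" is simply its own USED plus that shared AVAIL:

    # zfs list -o name,used,avail -r Atlas        (illustrative output only)
    NAME              USED  AVAIL
    Atlas            4.00T  6.60T
    Atlas/Media      3.00T  6.60T   -> client computes ~9.6T "capacity"
    Atlas/Documents  0.50T  6.60T   -> client computes ~7.1T "capacity"
    Atlas/Other      0.50T  6.60T   -> client computes ~7.1T "capacity"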
 

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
It's always been like that, I thought?

Especially if what is being shared is a dataset nested down the pool. Quotas also affect this display.

USED + FREE = TOTAL

So from the client's perspective, it knows how much space is being used up (within the share). It also knows how much space is reported as "free" or "available". And hence, it assumes the "total capacity" from these two values. (Rudimentary math.)

You'll notice that all your shares (from the same pool) report the same "free" value (regardless of how much data is stored on each share.)
The USED + FREE = TOTAL explanation makes sense for why the reported values on Windows are the way they are, especially since each dataset has a different USED value but they all share the same FREE value from the same pool. This is the way I've always seen it on FreeNAS/TrueNAS from what I remember; I just haven't been using TrueNAS for all that long. It still seems weird that datasets and shares from the same storage pool don't all report the same TOTAL space of ~10.63TB. Before Scale was released I ran a custom-built NAS on Ubuntu Server for the sole reason that I needed Nvidia HW transcoding for my Plex server.

On my Ubuntu server I had the same 3 datasets set up, and with this Samba ZFS Fix the 3 shares from my Atlas pool would all report the same USED and TOTAL values on Windows, since there were no quotas set and they are from the same pool.
 
Joined
Oct 22, 2019
Messages
3,580
On my Ubuntu server I had the same 3 datasets set up, and with this Samba ZFS Fix the 3 shares from my Atlas pool would all report the same USED and TOTAL values on Windows, since there were no quotas set and they are from the same pool.
That's the thing: you're using a third-party fix. For my Linux clients, they also report the same way that you're witnessing with Windows 10. (I'm not using a "fix" or third-party workaround on my Linux PCs.)



It's better to think about it in the reverse:

Imagine a pool with a total capacity of 10 TB. You configured multiple shares (independent of each other) to access via client PCs.

On one of these shares ("media"), there exists 2 TB worth of data on it. (Accessible to anyone and everyone.)

On another share ("archives"), you dump 5 TB worth of data onto it. (Accessible and known only to admins.)

It would still be just as strange to look at your "media" share's properties to see:
  • Used space: 2 TB
  • Total capacity: 10 TB
  • Free space: 3 TB

Imagine a non-admin user sitting at her desk, scratching her head looking at the "media" share: "Wait, I only saved 2 TB of stuff on here. But it's saying I only have 3 TB of space remaining? It's 10 TB total capacity! Where'd the other missing 5 TB go?!" :oops:



However, on the flip-side (as it is currently employed), she would actually see:
  • Used space: 2 TB
  • Total capacity: 5 TB
  • Free space: 3 TB

"Oh wow. I only have 3 TB left that I can dump multimedia in this share. Looks like I can't exceed more than 5 TB in total."
 

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
That's the thing: you're using a third-party fix. For my Linux clients, they also report the same way that you're witnessing with Windows 10. (I'm not using a "fix" or third-party workaround on my Linux PCs.)



It's better to think about it in the reverse:

Imagine a pool with a total capacity of 10 TB. You configured multiple shares (independent of each other) to access via client PCs.

On one of these shares ("media"), there exists 2 TB worth of data on it. (Accessible to anyone and everyone.)

On another share ("archives"), you dump 5 TB worth of data onto it. (Accessible and known only to admins.)

It would still be just as strange to look at your "media" share's properties to see:
  • Used space: 2 TB
  • Total capacity: 10 TB
  • Free space: 3 TB

Imagine a non-admin user sitting at her desk, scratching her head looking at the "media" share: "Wait, I only saved 2 TB of stuff on here. But it's saying I only have 3 TB of space remaining? It's 10 TB total capacity! Where'd the other missing 5 TB go?!" :oops:



However, on the flip-side (as it is currently employed), she would actually see:
  • Used space: 2 TB
  • Total capacity: 5 TB
  • Free space: 3 TB

"Oh wow. I only have 3 TB left that I can dump multimedia in this share. Looks like I can't exceed more than 5 TB in total."
Thank you for the amazing response. This make so much sense now as to why it's done this way now. It just seemed arbitrary to me at first since I am the only person using my NAS at home and not multiple users with limited share access.
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
@awil95
This is a bit of a can of worms I'm getting into, as questions like these are generally met with responses somewhere along the lines of "The way it is now is the most proper, get used to it." The tl;dr is that, in my opinion, this is insane.

I'll start off by acknowledging that this whole thing is a tricky situation with no real proper way to do it, due to the complications of pooled storage; however, I do think there's a pretty good compromise that I'll touch on at the end.

Largely, I've seen Samba configured two ways by default in this regard:

1) Report the entire pool's used and free space for all shares.

This is what UNRAID does.

Let's say you had a pool of 50TB and, regardless of how your data was laid out, 35TB was in use. Then let's say you have 3 shares: "Media", "Documents", and "Games".

They would all appear like this:
  • Media: 15TB free of 50TB
  • Documents: 15TB free of 50TB
  • Games: 15TB free of 50TB

As space gets consumed, the loss of free space is reflected across all shares.

This certainly keeps things simplest in terms of being aware of space consumption, but it leaves a lot to be desired in terms of each share having a real sense of owning its own space. It also starts to break down pretty badly if you want to restrict certain shares to certain users, since those users might not be aware of all the other shares that can consume space.

This approach prioritizes making capacity and free space intuitive, but makes used space confusing, since it's overall and not per-share.

2) Determine the capacity of a share via USED + FREE (*shudders*)

Obviously this is how TrueNAS configures Samba, and while I do think it's better than (1), it's not without its fair share of issues.

First off, right out of the gate there is a problem with how this looks when mounted in Windows.

[Screenshot: 1674108897034.png (the shares as mounted in Windows)]


In this example, none of my shares have access limitations, and none have reservations or quotas. Despite this, the capacities obviously vary. We KNOW why this is the case: the capacity here is basically an afterthought, with used and free space being the primary data used to build these views of the shares. Although it makes technical sense and is ideal in terms of used and free space, to me it's not very intuitive and goes completely against how capacity is typically represented for local storage.

If you don't mind this setup because you primarily care that used and free space are handled "correctly", then more power to you. But for me at least, this drives me nuts.

Why does Games magically have more capacity because I put more in it? That doesn't make any sense at all. You can keep pointing back to USED + FREE but that's only so golden in a vacuum (where capacity is an afterthought), and in my opinion the shares should ideally make sense from all directions.

This approach prioritizes making used and free space intuitive, but makes capacity confusing since it ends up varying depending upon what's going on in other shares.

It's better to think about it in the reverse:

Imagine a pool with a total capacity of 10 TB. You configured multiple shares (independent of each other) to access via client PCs.

On one of these shares ("media"), there exists 2 TB worth of data on it. (Accessible to anyone and everyone.)

On another share ("archives"), you dump 5 TB worth of data onto it. (Accessible and known only to admins.)

It would still be just as strange to look at your "media" share's properties to see:
  • Used space: 2 TB
  • Total capacity: 10 TB
  • Free space: 3 TB

Imagine a non-admin user sitting at her desk, scratching her head looking at the "media" share: "Wait, I only saved 2 TB of stuff on here. But it's saying I only have 3 TB of space remaining? It's 10 TB total capacity! Where'd the other missing 5 TB go?!" :oops:



However, on the flip-side (as it is currently employed), she would actually see:
  • Used space: 2 TB
  • Total capacity: 5 TB
  • Free space: 3 TB

"Oh wow. I only have 3 TB left that I can dump multimedia in this share. Looks like I can't exceed more than 5 TB in total."

While this example highlights the strength of the current approach, it also demonstrates its weakness.

So at this point the user finishes with the thought "OK, there is 5 TB total, I can fit 3 TB more of stuff on here." Now let's say they think, "Cool, tomorrow I'll bring in my 2TB movies drive from home to share with everyone else and dump them on there." But what they don't know is that overnight the admins decided to archive away 1.5 TB more data into that share they can't even see. Since that free space figure is for the entire pool, they get punked. They come back the next day, connect their drive, and open up Explorer to now see:
  • Used space: 2TB
  • Total capacity: 3.5 TB
  • Free space: 1.5 TB
And now, in an oh-so-familiar situation, they go, "Wait, the amount of used space didn't change, no one put any more files on this drive, and yet the free space is suddenly much less. Where did the other 1.5 TB go? I can't fit my movies anymore." Then it just looks like the admins were mean and decided to take media space away from everybody for no reason. This would look even worse if the share was just for that user specifically, as at least with a common share it's vaguely understandable that all of the free space isn't yours.

Obviously, we understand why this happened, but I don't think it's reasonable to expect users, who shouldn't have to care how their storage is physically implemented, to understand this. Not to mention it basically means their free space is a lie and that at any time it could be reduced by the actions of others, even on personal shares.

To me this concept of just accepting "dynamic capacity" is too awkward.

My Approach:

The way I handle it, while not perfect, is in my opinion better than either of these defaults. It institutes the paradigm that is more common with non-pooled storage, FREE = TOTAL - USED. Same math, but the difference here is that capacity isn't dynamic; it's free/used space that are, as is traditional.

One word: quotas. Yes, they were technically mentioned in this very thread (barely), and yes, no one said not to use them, but I'm really shocked at how rarely they're recommended when people complain about the way SMB shares on TrueNAS show their capacity, as they're basically the perfect middle ground IMO. I don't see reservations mentioned very often either. In the many other threads like these where people ask how to make their shares more intuitive, it would have gone a long way if, instead of basically saying "the way it is by default is fine", people at least told them to look into quotas and reservations.

Anyway, this is the approach:

Think of your whole pool (or the portion of it dedicated to SMB shares) as what it literally is: a giant pool of abstracted/virtual storage available to divvy up however you want. You can think of your network like a single workstation that may or may not be shared by multiple users, where some of those users may or may not be able to see a given share. By always setting quotas on your shares, you treat them as if they were physical drives installed on a real machine, and as such they appear and behave exactly like one. When you mount shares with quotas, Windows (and the Linux distros I've used) will show the quota as the capacity, with the free and used space also being specific to that share/dataset.

[Screenshot: 1674110973986.png]

Note these are the same shares as before, but now with quotas on them.

To keep it simple, here is an example with one user that isn't you, we'll call them Dylan.

You have your pool of 50TB and you decide to give Dylan 2TB of that for their own share, so you make a dataset for them, set a 2TiB quota on it, and then make it a share they can access. This is just as if you had installed a 2TB disk in their machine. It looks like one, it acts like one, and it would be the most familiar thing to them as a regular user.

Then, eventually, their "2TB drive" starts to get close to full and they come to you complaining. The solution? Easy! Just increase the quota (assuming you have the extra space and are willing to give it up). Again, this is just as if you went out and bought a new drive for them (let's say 4TB) and upgraded their machine physically, with the convenience of not having to do a data migration. Now they have their "4TB drive" and all is well. Rinse and repeat whenever needed.
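In ZFS terms, that whole lifecycle is just a couple of commands. Dataset names here are placeholders; whether you want quota or refquota depends on whether snapshots and child datasets should count against the limit, and on TrueNAS you'd normally do this from the dataset's edit screen rather than the shell:

    # Carve out Dylan's "2TB drive" (hypothetical dataset under the Atlas pool)
    zfs create Atlas/dylan
    zfs set quota=2T Atlas/dylan

    # Later, "swap in a bigger drive" by simply raising the limit
    zfs set quota=4T Atlas/dylan

    # Optionally also guarantee the space so other datasets can't eat it
    zfs set reservation=2T Atlas/dylan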

Even if you originally never intended to limit a certain share and essentially have to pick a quota arbitrarily, it's fine. Just start with one that roughly makes sense and you can enlarge it later if need be.

Overall:
  • The behavior and appearance of regular local storage is most closely replicated
  • Used, free, and total space all make intuitive sense
  • No space can go "missing"
  • Free space is never unexpectedly robbed
  • There is motivation to better regulate space consumption, as you have to stop and think every time before increasing a quota. "Do I really need 5 more TB of cat pictures in this cat-pictures dataset/share? Maybe instead I should delete some old ones." It's harder to accidentally run up against your pool capacity because you have a much clearer sense of what's happening to space as it's consumed.
Is it perfect and suitable for all ZFS/SMB use cases? Certainly not. But for traditional "access/share remote data" SMB shares like these, I think it's the most ideal by far. It leaves zero room for confusion while using the shares and adds a minimal amount of forethought and maintenance to your pool.

Another way to think about it, if you happen to be familiar with virtual machines, is that it's like having a hypervisor and using your physical drive's space to segment out room for each virtual machine's virtual disks, with the disks using thin provisioning. It just requires thinking ahead a little, and not by much, as you can always change the quotas later.

The only "caveat" is that you have to make sure that the total size of all your quotas doesn't exceed your pool size. Nothing bad will happen if they do, its just that you basically then lose the benefits of this approach.

Maybe it's a bit OCD, but unless I specifically need a massive dataset with an unfixed capacity, I like having a more well-defined structure and extents for my data (and pretty much everything else I touch, software-wise).
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,543
The best approach is to use ZFS-aware tools to keep track of ZFS space accounting. At the end of the day File Explorer is just the wrong tool for the job.
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
The best approach is to use ZFS-aware tools to keep track of ZFS space accounting. At the end of the day File Explorer is just the wrong tool for the job.
Of course there are many people that wouldn't be satisfied with this, but regardless, care to give an example?

I'd imagine it would need to be some kind of tool that has to be aware of all SMB shares or something like that. What I'm getting at is that I figure this is more of a Samba limitation, and so I'm not sure how to picture exactly what using such a tool would be like (i.e. how it would be different). I'm genuinely curious to see a better way to handle this in action.

Are there special SMB clients and Samba configurations that are ideal for ZFS? Or do said tools use other protocols entirely? Because if your statement actually fully resolves to "SMB isn't the right tool for the job", obviously that's not realistic for many systems.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,543
Of course there are many people that wouldn't be satisfied with this, but regardless, care to give an example?
We have an API to get stats, a reporting framework, SNMP, etc. These are all fairly standard ways of managing servers.
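For example, something along these lines against the v2.0 REST API (endpoint and field names as I recall them, so verify against /api/docs on your own system; the hostname and API key are placeholders):

    # Pull per-dataset space accounting from the middleware instead of File Explorer
    curl -sk -H "Authorization: Bearer $API_KEY" \
        "https://truenas.local/api/v2.0/pool/dataset" \
        | jq '.[] | {name: .name, used: .used.value, available: .available.value}'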

What I'm getting at is that I figure this is more of a Samba limitation, and so I'm not sure how to picture exactly what using such a tool would be like (i.e. how it would be different).
There are lots of ways of gathering metrics about servers. Glancing at a pic in File Explorer isn't usually the best way to go about it.

Are there special SMB clients and Samba configurations that are ideal for ZFS?
ZFS / Samba are returning correct answers to the questions asked (which is basically the same as running df <path> on the local console). It's path-based (technically an SMB handle), and so File Explorer only presents metrics for the dataset you happen to have mounted. TL;DR, it's just a matter of how the Windows GUI puts together the answer. This GUI was basically designed for NTFS / FAT32, where you do not have multiple filesystems sharing a common storage pool.
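i.e. what the client sees is essentially this, per mounted dataset (the path is just an example):

    # On the TrueNAS console: Size here is just Used + Avail for this one dataset,
    # and Avail is the pool-wide free space
    df -h /mnt/Atlas/Media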

Because if your statement actually fully resolves to "SMB isn't the right tool for the job", obviously that's not realistic for many systems.
Generally, systems only care about `available` space. Do you have an example of an application that experiences a bug / aberrant behavior because of the way we report space?

Because if your statement actually fully resolves to "SMB isn't the right tool for the job", obviously that's not realistic for many systems.
My statement is that if you are relying exclusively on File Explorer to track detailed space accounting (apart from available space on the exact filesystem (not nested ones) that you have mounted) then you should probably re-think your strategy. Configure alerts and properly monitor things with tools that are aware of ZFS.
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
Ok, I see the issue.

This is an administration vs. user problem.

There are lots of ways of gathering metrics about servers. Glancing at a pic in File Explorer isn't usually the best way to go about it.

..

My statement is that if you are relying exclusively on File Explorer to track detailed space accounting (apart from available space on the exact filesystem (not nested ones) that you have mounted) then you should probably re-think your strategy. Configure alerts and properly monitor things with tools that are aware of ZFS.

Of course I wouldn't do that. As a system administrator, I'm going to track my pool's usage properly using the various tools provided by ZFS natively, those built into TrueNAS, and any external tools that might be helpful for gleaning relevant statistics (e.g. QDirStat, which I actually just happened to make more useful for ZFS space consumption analysis).

Of course what the Windows GUI shows by default is suboptimal for storage backed by a common pool, of course the actual data it is using is correct and similar to df, and of course the free space of any one SMB share is common to the entire server since that's just the nature of pooled storage. I understand all of this and administer my system accordingly.

However... good implementations abstract away as much of their actual composition and technical restrictions as possible, no?

To be clear, I'm not referring to the implementation of TrueNAS/ZFS, but rather the implementation of "I host a pooled storage file server and want to provide some of that storage to non-technical clients".

Many people (at least in my case) are going to be using Windows Explorer, and the concept of shares that affect each other, with seemingly dynamic capacity or free space that can suddenly go missing, is often too alien and confusing.

You said it yourself:
This GUI was basically designed for NTFS / FAT32, where you do not have multiple filesystems sharing a common storage pool.

and this is what most general users are used to.

The tl;dr of my above approach is that a system administrator can use quotas to provide a better, more familiar user experience to their clients, assuming this doesn't clash with the rest of their requirements. The quotas essentially act as a translation layer between the real pool and its space consumption, and the space that users see available to them. In a way it's almost like kernel space vs. user space: the user's storage isn't directly "real", it's just a nicely mapped abstraction of the real thing.

It's a suggestion to improve the look and feel of the shares, not a better way to manage them.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,543
It's a suggestion to improve the look and feel of the shares, not a better way to manage them.
There's a danger associated with setting aggressive quotas, though. As available space approaches zero, ZFS performance characteristics will change.

In general, the suggestion above to write a script that shells out to the zfs command is not a good idea. This can be a hot code path, and you don't want to do that on a busy server. You can achieve the same thing by setting the auxiliary parameter zfs_core:zfs_space_enabled = true. This will use libzfs dataset handles to calculate space based on the used counters for child datasets. It's generally less efficient than using statvfs output (the default behavior), but gives different numbers. IIRC, going through libzfs for this caused a significant performance hit for df ops over SMB.
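In a share's Auxiliary Parameters (or the equivalent [share] stanza in smb.conf) that looks something like the following; the share name is just an example:

    [media]
        # auxiliary parameter: compute space via libzfs dataset handles rather
        # than statvfs -- slower, but accounts for child datasets
        zfs_core:zfs_space_enabled = true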
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
In general, the suggestion above to write a script that shells out to the zfs command is not a good idea. This can be a hot code path, and you don't want to do that on a busy server. You can achieve the same thing by setting the auxiliary parameter zfs_core:zfs_space_enabled = true. This will use libzfs dataset handles to calculate space based on the used counters for child datasets. It's generally less efficient than using statvfs output (the default behavior), but gives different numbers. IIRC, going through libzfs for this caused a significant performance hit for df ops over SMB.
I did not suggest this and agree it's problematic.

There's a danger associated with setting aggressive quotas, though. As available space approaches zero, ZFS performance characteristics will change.
Do you mean on the entire pool? Because then obviously yes, that is always the case with ZFS, or most file systems really. Or do you mean the free space of the individual dataset/share? If that's the case, in a way it's still somewhat a "good" emulation of a single disk, as nearly full drives (especially SSDs) are problematic in general, and even basic users understand this to a degree. However, the costs of a lack of free space with ZFS tend to be higher than with other file systems, so I want to make sure I understand this.

Are you saying that the degraded performance due to ZFS's copy-on-write limitations will come into play within a single dataset if that dataset has a quota and it's nearly met? In other words, will the performance within a dataset that is 90% full (due to a quota only; the pool has plenty of free space) be similarly degraded to an entire pool that is 90% full?
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
Well, the short answer seems to be that yes, this is true. Approaching quota limits does impact performance.

This is the side effect of a conscious design decision made a while back: https://github.com/openzfs/zfs/commit/3ec3bc2167352df525c10c99cf24cb24952c2786

This is unfortunate. Degraded performance due to a nearly full pool is unavoidable and makes sense; however, I did not expect this to apply to quotas, as I imagined that ZFS would use the free space elsewhere to prevent write slowdowns.

In a way, it could, but as summarized here, the reason for the slowdown is that OpenZFS' current algorithm for setting up transaction groups could theoretically queue transactions such that the size of a quota would be exceeded, and so the reduced performance comes from shrinking the transaction size as the quota limit is approached to prevent this from occurring.

I understand why they did this, as exceeding a quota would of course break its entire design philosophy, but the fact that it has to come at such a hefty performance cost is disappointing. Hopefully an alternative solution is devised in the future (though given the complexity of ZFS, it would be understandable if this is just the way it has to be).

I'd also be curious to know what the theoretical maximum space beyond the quota that could be consumed is, as if it's only by a small amount then I'd much rather have that as an option than degraded performance. From a user (of ZFS) point of view, I think most would like to think of quotas as a simple limiting tool, not as something that fundamentally changes how the filesystem works and has significant performance considerations.

This behavior can be indirectly influenced through the spa_asize_inflation tunable, but modifying that would obviously need to be done with extreme care.
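On SCALE (Linux), that tunable is exposed as a ZFS module parameter; checking and changing it looks roughly like this (the change below is temporary, lost on reboot, and purely illustrative):

    # Current worst-case write inflation factor (OpenZFS default is 24)
    cat /sys/module/zfs/parameters/spa_asize_inflation

    # Illustrative temporary change -- only with a full understanding of the tradeoffs
    echo 12 > /sys/module/zfs/parameters/spa_asize_inflation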

My suggestion is still valid as long as you're generous with quota limits, but overall this means you would need more space available on average, and you'd have to be more mindful about users filling their space up sooner than if they had access to the entire amount of free space on the pool, so it's definitely a bit less ideal now...
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,543
My suggestion is still valid as long as you're generous with quota limits, but overall this means you would need more space available on average, and you'd have to be more mindful about users filling their space up sooner than if they had access to the entire amount of free space on the pool, so it's definitely a bit less ideal now...
You're creating an administrative burden with potential performance / production impact to make File Explorer's GUI look better to end-users (in a multi-user environment). In that context it's easy to see how preference could go the other way, right?
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
You're creating an administrative burden with potential performance / production impact to make File Explorer's GUI look better to end-users (in a multi-user environment). In that context it's easy to see how preference could go the other way, right?
Sure, but at least I dug deep enough to discover this potential problem, and I plan to do something about it by adjusting the value I noted above to something reasonable for my pool configuration, so that I won't be too concerned about mild quotas set on some shares. Yes, it's extra work, but it's worth it to me. It certainly won't be for everyone.

With how almost every other response to these kinds of questions is "working as intended", I'd bet money that many, many users weren't even aware of any of these implications at all and simply took the way SMB shares present themselves as gospel, and then told everyone else to deal with it that way as well. This stifles discourse on what your actual options are, even if they are involved.

What you described as a harmless preference is often communicated as doctrine. I think it's fine to make people aware of other options and simply notify them of the caveats.

Additionally, this problem is ultimately an issue with ZFS itself that will hopefully be optimized over time. There are many real-world, non-home-lab contexts in which administrators want and do use quotas in a way that isn't just cosmetic. I'm also doing this because I actually do intend to limit the size of some of my shares to deal with some users that are, let's just say, "irresponsible" with their space consumption. This is a problem in those circumstances too and can't simply be avoided; the solution can't just be "don't use quotas". The current quota implementation is problematic, but it's the best we have at this time. Given that I want to use it in whichever way I please, I'm more than happy to deal with its quirks, and I'm sure others are as well.

Perhaps I'm over-analyzing a bit, but if you look through the many other threads that are more or less about the same thing, 90% of the conversation in them is existing ZFS users basically trying to explain why the default presentation is fine and getting the OP to understand that. While there is nothing wrong with this in itself, I think it would be much simpler to put a little less emphasis on that and simply state:

"Dataset quotas can achieve what you want, but be aware that they can cause degraded performance if their usage gets too close to the limit. Be careful if you decide to use them". Instead, we end up with threads like these were you have to try and justify yourself 100 times and pull teeth to get to the crux of the issue.
 
