Greater than 80% volume capacity: OK for large static storage?

Status: Not open for further replies.

JTheNASBuilder

Dabbler
Joined
Feb 4, 2014
Messages
28
I searched the forums and the FreeNAS manual for an answer to this particular question, but everything I found focused on dealing with the alert itself rather than discussing the practice behind it.

My question is this: if you are using a FreeNAS storage pool to store large, static files (HD movie and television files), is it really necessary to leave 20% of the pool free? On my media volume that would mean leaving 2.84 TB unfilled to keep the volume at 80% capacity. That seems like an enormous amount of administrative overhead.

If it's necessary, then it's necessary; I'll deal with it and eventually get around to adding another drive pool. But practically speaking, do I really need to leave nearly 3 TB empty on a 14.2 TB volume?
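For concreteness, here is the arithmetic behind those numbers as a quick sketch (the 14.2 TB figure is from the question above; the thresholds are the ones discussed later in this thread):

```python
# Quick arithmetic for the pool in question: how much space each
# fill threshold leaves unused on a 14.2 TB volume.
pool_tb = 14.2

for threshold in (0.80, 0.95, 0.98):
    free_tb = pool_tb * (1 - threshold)
    print(f"at {threshold:.0%} full, {free_tb:.2f} TB must stay empty")
```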
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
This is a recommendation based on performance; I'm going to point to @jgreco's excellent post in another thread.

While it deals more with the peculiarities and pitfalls of VMs on ZFS, the idea remains the same: as fragmentation increases, performance drops. For a home NAS focused on media streaming, where the playback client buffers, you should be safe to exceed the 80% threshold, but be aware that performance will suffer.

There's another, much more punishing wall that you'll hit around 95%, I believe, and that will probably be enough of a performance hit that you'll need to expand at that point.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's about it. It'll really hurt when you hit it, too. But for an archival pool, one where you only write files once and never remove or update them, you can probably get up into the 97-98% range. This is mostly because such a pool develops essentially no fragmentation.

That is a very specific and carefully worded exception to the general rule.
 

JTheNASBuilder

Dabbler
Joined
Feb 4, 2014
Messages
28
So if I'm understanding this correctly, the issue is purely performance, not any real risk to the integrity of the pool/volume? The difference between a volume at 78%, 88%, and 98% capacity is a matter of performance, not an increased risk of disk failure or corruption?

(Also, great tip on the atime bit, as a FreeNAS neophyte I'd not have thought of that.)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No. The issue is that the moment you cause fragmentation, ZFS will start to flail when allocating new blocks, because it has almost no free space left to allocate from. And you can probably get into a situation where Really Bad Things can (and do) happen if you're insistent enough. It's well known that if you fill a ZFS pool completely, you may not actually be able to remove contents from the filesystem: because ZFS is copy-on-write, even a deletion has to allocate new blocks to record the update, and those blocks cannot be allocated. That is a bit of a bad situation to be in.

So, re-read what I said before. In the one case where you are ONLY adding files to a pool, never removing or modifying them, yes, you can probably push out to ~97-98%. But all hell may break loose if you then try to delete some files and write more, or if you have been removing and adding files all along. It is a very special edge case.
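To make that guard-rail concrete, here is a minimal sketch of checking fullness before a bulk copy. It assumes the standard `zpool` CLI is available; the pool name "tank" and the 90% limit are illustrative placeholders, not anything from this thread:

```python
# Minimal pre-write check using `zpool list` in scripted (-H) mode.
# "tank" and the 90% limit are illustrative placeholders.
import subprocess

POOL = "tank"
LIMIT_PCT = 90  # conservative; ~97-98% applies only to pure write-once pools

capacity = subprocess.run(
    ["zpool", "list", "-H", "-o", "capacity", POOL],
    capture_output=True, text=True, check=True,
).stdout.strip()                 # prints e.g. "88%"
used_pct = int(capacity.rstrip("%"))

if used_pct >= LIMIT_PCT:
    raise SystemExit(f"{POOL} is {used_pct}% full; expand before adding more")
print(f"{POOL} is {used_pct}% full; safe to continue")
```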
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Is there a way to measure the amount of fragmentation on a pool?
 

JTheNASBuilder

Dabbler
Joined
Feb 4, 2014
Messages
28
Gotcha, that's exactly the kind of information I was looking for. I'll definitely deep archive stuff and/or start a new pool before I hit, say, the low 90s to avoid that situation. Thanks for clearing everything up for me. =)
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Check the FRAG column in the output of zpool list.


Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Well that was easy. My Google-fu is obviously lacking...


Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
It's just something I happened to notice one day when I was using the CLI. I think it showed up in a relatively recent version of ZFS (or of ZFS on FreeBSD, perhaps).
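For anyone following along at the CLI, a small sketch of reading that number programmatically (the pool name "tank" is a placeholder; the fragmentation property only reports a value once the spacemap_histogram feature discussed below is active):

```python
# Read the FRAG value for one pool; "tank" is a placeholder name.
import subprocess

frag = subprocess.run(
    ["zpool", "list", "-H", "-o", "fragmentation", "tank"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(f"free-space fragmentation: {frag}")  # e.g. "11%", or "-" if unsupported
```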
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
The number you get from that is somewhat of a joke, though. I've read up on it a little, and it doesn't mean the same thing as other file systems' defrag percentages.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I don't think there's a consistent meaning, since it is inherently subjective.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
I wouldn't exactly call the number a joke. The FRAG number reported in "zpool list" is a measurement of the fragmentation of the free space on the pool.

Specifically, it is a measurement of the percentage of free blocks that are smaller than a specific size. If the free blocks are large enough then they don't count toward the FRAG percentage.

Free space fragmentation is really what matters for ZFS performance because of how the SLAB allocator code works. If your free space is all chopped up into small pieces, then the SLAB allocator has to spend more time finding lots of small free slabs in which to place incoming writes.

The FRAG metric is added by the spacemap_histogram feature flag and is actually used by the SLAB allocator to allocate incoming writes more intelligently.

When the incoming write load is low, the SLAB allocator spends more time searching for best-fit free blocks, because it has time to do so. When the incoming write load is very high, it simply throws data all over the big free metaslabs, because it really doesn't have time to search for best-fit spaces.
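As a toy illustration of that simplified description (all segment sizes and the cutoff are invented; real ZFS uses per-metaslab histograms of power-of-two segment sizes rather than a single threshold):

```python
# Toy model of free-space fragmentation: what fraction of free space
# sits in segments too small to be useful? All numbers are invented.
free_segments_kb = [4, 4, 8, 16, 32, 64, 128]  # hypothetical free runs
SMALL_KB = 64                                  # hypothetical cutoff

small = sum(s for s in free_segments_kb if s < SMALL_KB)
total = sum(free_segments_kb)
print(f"{small / total:.2%} of free space is in segments under {SMALL_KB} KB")
```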

The SLAB allocator is actually tunable by the user too based on this FRAG metric.


As to the OP's question: yes, in my experience using zpools for backups in write-once, read-many situations, I was getting good performance up to roughly the 98%-full mark, which is where I stopped.

There is some more information on it here:
http://blog.delphix.com/uday/2013/02/19/78/

There is lots more out there if you look up the spacemap_histogram feature.
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's a very thorough explanation. Do read it a few times! Great signal-to-noise ratio.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I wouldn't exactly call the number a joke. The FRAG number reported in "zpool list" is a measurement of the fragmentation of the free space on the pool.

Specifically, it is a measurement of the percentage of free blocks that are smaller than a specific size. If the free blocks are large enough then they don't count toward the FRAG percentage.

So if a single 1 GB file is broken into 4 KB blocks, no two of which are contiguous, but they are separated by enough space that they don't trip the fragmentation counter, then fragmentation is 0%. Does that make sense to you? Sure doesn't make sense to me. ;)

Free space fragmentation is really what matters for ZFS performance because of how the SLAB allocator code works.
No, it's not. I just gave an excellent example of a file that would have an absolutely abysmal performance curve if you had to read it, yet whose fragmentation would be listed as 0%. For writing to the zpool, free space is what matters. For reading from the zpool, the actual location of the data is what matters. Granted, if you have sufficient free space then, ideally, your writes will be contiguous. But there is no guarantee, because of things like CoW, changes to a file after it was originally written, snapshots, etc.

Please note that I'm not arguing with the technical accuracy (yes, your post is pretty awesome as it has good info). I'm simply arguing that the frag value doesn't really mean what most people would expect it to mean, and as a consequence I think it's fair to say that the value is "a joke". Most people expect the percentage to mean things like "percentage of my files that aren't contiguous" or "percentage of my files that require each disk to make more than 1 seek to read the entire file".
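Putting numbers on the 1 GB example above (simple arithmetic; the 10 ms seek time is an illustrative HDD figure, not from the thread):

```python
# Worst case for the scenario above: a 1 GiB file in fully scattered
# 4 KiB blocks, one seek per block. Free-space FRAG could still read 0%.
file_bytes = 1 * 1024**3
block_bytes = 4 * 1024
seek_ms = 10.0  # illustrative HDD seek time

blocks = file_bytes // block_bytes
total_min = blocks * seek_ms / 1000 / 60
print(f"{blocks:,} blocks -> ~{total_min:.0f} minutes of pure seeking")
```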
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
I understand what you're saying. Yes, you're right that there is a very large part of "fragmentation" that this metric has nothing to do with. It's just not something the ZFS devs have tackled yet. It is definitely misleading, as I know of no other file system that measures fragmentation this way. They did it because it was the lowest-hanging fruit, something they could do rather easily, at least compared to tackling file fragmentation (BP rewrite).

Although, by design, ZFS does try to minimize fragmentation by caching five seconds of writes, bundling them all into a transaction group, and then flushing it to disk all at once.

This works better than most other file systems at minimizing fragmentation on a system that is being written to by many, many sources at once.
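A conceptual sketch of that batching idea (not ZFS code; the five-second interval is the one described above):

```python
# Toy model of transaction-group batching: writers queue data, and one
# periodic sync flushes the whole batch as a single large write, which
# the allocator can lay out contiguously.
pending: list[bytes] = []

def write(data: bytes) -> None:
    pending.append(data)          # writers never touch the disk directly

def txg_sync() -> None:
    batch = b"".join(pending)     # everything from the last ~5 s interval
    pending.clear()
    print(f"flushed {len(batch):,} bytes in one transaction group")

# Many small writers land in the same group:
for _ in range(1000):
    write(b"x" * 512)
txg_sync()                        # -> flushed 512,000 bytes in one group
```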

However, the nature of copy-on-write and read-update-write operations will naturally cause fragmentation in individual files over time; that's just how it is.

The only way anyone has seen around this issue is block pointer rewrite, which I'm sure most of you have an idea about as a concept. Hint: it won't be happening any time soon, heh.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
The only way anyone has seen around this issue is block pointer rewrite, which I'm sure most of you have an idea about as a concept. Hint: it won't be happening any time soon, heh.

I'd love to see BPR, but everything I've read and discussed with others about ZFS development leads me to believe it's something we may not see for a very long time, if ever, and only then if a company with a lot of money funds it. :P
 