Greater than 80% volume capacity: OK for large static storage?

Status: Not open for further replies.

JTheNASBuilder

Dabbler
Joined
Feb 4, 2014
Messages
28
I searched the forums and the FreeNAS manual for an answer to this particular question, but everything I found focused on dealing with the alert itself rather than discussing the practice behind it.

My question is this: if you are using a FreeNAS storage pool to store large, static files (HD movie and television files), is it really necessary to leave 20% of the pool free? On my media volume that would mean leaving 2.84 TB unfilled to keep the volume at 80% capacity. That seems like an enormous amount of administrative overhead.

If it's necessary, then it's necessary; I'll deal with it and eventually get around to adding another drive pool. But practically speaking, do I really need to leave nearly 3 TB empty on a 14.2 TB volume?
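For concreteness, here is the arithmetic behind those numbers as a quick sketch (the 14.2 TB figure is from the question above; the thresholds are the ones discussed later in this thread):

```python
# Quick arithmetic for the pool in question: how much space each
# fill threshold leaves unused on a 14.2 TB volume.
pool_tb = 14.2

for threshold in (0.80, 0.95, 0.98):
    free_tb = pool_tb * (1 - threshold)
    print(f"at {threshold:.0%} full, {free_tb:.2f} TB must stay empty")
```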
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
This is a recommendation based on performance; I'm going to point to @jgreco's excellent post in another thread.

While it deals more with the peculiarities and pitfalls of VMs on ZFS, the idea remains the same: as fragmentation increases, performance drops. For a home NAS focused on media streaming, where the playback client buffers, you should be safe to exceed the 80% threshold, but be aware that performance will suffer.

There's another, much more punishing wall that you'll hit around 95%, I believe, and that will probably be enough of a performance hit that you'll need to expand at that point.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's about it. It'll really hurt when you hit it, too. But for an archival pool, one where you only write files once and never remove or update them, you can probably get up into the 97-98% range. This is mostly because such a pool develops essentially no fragmentation.

That is a very specific and carefully worded exception to the general rule.
 

JTheNASBuilder

Dabbler
Joined
Feb 4, 2014
Messages
28
So if I'm understanding this correctly, the issue is purely performance, not any real risk to the integrity of the pool/volume? The difference between a volume at 78%, 88%, and 98% capacity is a matter of performance, not an increased risk of disk failure or corruption?

(Also, great tip on the atime bit, as a FreeNAS neophyte I'd not have thought of that.)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No. The issue is that the moment you cause fragmentation, ZFS will start to flail when allocating new blocks, because it has almost no free space left to allocate from. And you can probably get into a situation where Really Bad Things can (and do) happen if you're insistent enough. It's well known that if you fill a ZFS pool completely, you may not actually be able to remove contents from the filesystem: because ZFS is copy-on-write, even a deletion has to allocate new blocks to record the update, and those blocks cannot be allocated. That is a bit of a bad situation to be in.

So, re-read what I said before. In the one case where you are ONLY adding files to a pool, never removing or modifying them, yes, you can probably push out to ~97-98%. But all hell may break loose if you then try to delete some files and write more, or if you have been removing and adding files all along. It is a very special edge case.
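To make that guard-rail concrete, here is a minimal sketch of checking fullness before a bulk copy. It assumes the standard `zpool` CLI is available; the pool name "tank" and the 90% limit are illustrative placeholders, not anything from this thread:

```python
# Minimal pre-write check using `zpool list` in scripted (-H) mode.
# "tank" and the 90% limit are illustrative placeholders.
import subprocess

POOL = "tank"
LIMIT_PCT = 90  # conservative; ~97-98% applies only to pure write-once pools

capacity = subprocess.run(
    ["zpool", "list", "-H", "-o", "capacity", POOL],
    capture_output=True, text=True, check=True,
).stdout.strip()                 # prints e.g. "88%"
used_pct = int(capacity.rstrip("%"))

if used_pct >= LIMIT_PCT:
    raise SystemExit(f"{POOL} is {used_pct}% full; expand before adding more")
print(f"{POOL} is {used_pct}% full; safe to continue")
```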
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Is there a way to measure the amount of fragmentation on a pool?
 

JTheNASBuilder

Dabbler
Joined
Feb 4, 2014
Messages
28
Gotcha, that's exactly the kind of information I was looking for. I'll definitely deep archive stuff and/or start a new pool before I hit, say, the low 90s to avoid that situation. Thanks for clearing everything up for me. =)
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Check the FRAG column in the output of zpool list.


Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Well that was easy. My Google-fu is obviously lacking...


Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
It's just something I happened to notice one day when I was using the CLI. I think it showed up in a relatively recent version of ZFS (or of ZFS on FreeBSD, perhaps).
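For anyone following along at the CLI, a small sketch of reading that number programmatically (the pool name "tank" is a placeholder; the fragmentation property only reports a value once the spacemap_histogram feature discussed below is active):

```python
# Read the FRAG value for one pool; "tank" is a placeholder name.
import subprocess

frag = subprocess.run(
    ["zpool", "list", "-H", "-o", "fragmentation", "tank"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(f"free-space fragmentation: {frag}")  # e.g. "11%", or "-" if unsupported
```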
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
The number you get from that is somewhat of a joke, though. I've read up on it a little, and it doesn't mean the same thing as other file systems' defrag percentages.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I don't think there's a consistent meaning, since it is inherently subjective.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
I wouldn't exactly call the number a joke. The FRAG number reported in "zpool list" is a measurement of the fragmentation of the free space on the pool.

Specifically, it is a measurement of the percentage of free blocks that are smaller than a specific size. If the free blocks are large enough then they don't count toward the FRAG percentage.

Free space fragmentation is really what matters for ZFS performance because of how the SLAB allocator code works. If your free space is all chopped up into small pieces, then the SLAB allocator has to spend more time finding lots of small free slabs in which to place incoming writes.

The FRAG metric is added by the spacemap_histogram feature flag and is actually used by the SLAB allocator to allocate incoming writes more intelligently.

When the incoming write load is low, the SLAB allocator spends more time searching for best-fit free blocks, because it has time to do so. When the incoming write load is very high, it simply throws data all over the big free metaslabs, because it really doesn't have time to search for best-fit spaces.
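As a toy illustration of that simplified description (all segment sizes and the cutoff are invented; real ZFS uses per-metaslab histograms of power-of-two segment sizes rather than a single threshold):

```python
# Toy model of free-space fragmentation: what fraction of free space
# sits in segments too small to be useful? All numbers are invented.
free_segments_kb = [4, 4, 8, 16, 32, 64, 128]  # hypothetical free runs
SMALL_KB = 64                                  # hypothetical cutoff

small = sum(s for s in free_segments_kb if s < SMALL_KB)
total = sum(free_segments_kb)
print(f"{small / total:.2%} of free space is in segments under {SMALL_KB} KB")
```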

The SLAB allocator is actually tunable by the user too based on this FRAG metric.


As to the OP's question: yes, in my experience using zpools for backups in write-once, read-many situations, I was getting good performance up to roughly the 98%-full mark, which is where I stopped.

There is some more information on it here:
http://blog.delphix.com/uday/2013/02/19/78/

There is lots more out there if you look up the spacemap_histogram feature.
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's a very thorough explanation. Do read it a few times! Great signal-to-noise ratio.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I wouldn't exactly call the number a joke. The FRAG number reported in "zpool list" is a measurement of the fragmentation of the free space on the pool.

Specifically, it is a measurement of the percentage of free blocks that are smaller than a specific size. If the free blocks are large enough then they don't count toward the FRAG percentage.

So if a single 1 GB file is broken into 4 KB blocks, no two of which are contiguous, but they are separated by enough space that they don't trip the fragmentation counter, then fragmentation is 0%. Does that make sense to you? Sure doesn't make sense to me. ;)

Free space fragmentation is really what matters for ZFS performance because of how the SLAB allocator code works.
No, it's not. I just gave an excellent example of a file that would have an absolutely abysmal performance curve if you had to read it, yet whose fragmentation would be listed as 0%. For writing to the zpool, free space is what matters. For reading from the zpool, the actual location of the data is what matters. Granted, if you have sufficient free space then, ideally, your writes will be contiguous. But there is no guarantee, because of things like CoW, changes to a file after it was originally written, snapshots, etc.

Please note that I'm not arguing with the technical accuracy (yes, your post is pretty awesome as it has good info). I'm simply arguing that the frag value doesn't really mean what most people would expect it to mean, and as a consequence I think it's fair to say that the value is "a joke". Most people expect the percentage to mean things like "percentage of my files that aren't contiguous" or "percentage of my files that require each disk to make more than 1 seek to read the entire file".
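Putting numbers on the 1 GB example above (simple arithmetic; the 10 ms seek time is an illustrative HDD figure, not from the thread):

```python
# Worst case for the scenario above: a 1 GiB file in fully scattered
# 4 KiB blocks, one seek per block. Free-space FRAG could still read 0%.
file_bytes = 1 * 1024**3
block_bytes = 4 * 1024
seek_ms = 10.0  # illustrative HDD seek time

blocks = file_bytes // block_bytes
total_min = blocks * seek_ms / 1000 / 60
print(f"{blocks:,} blocks -> ~{total_min:.0f} minutes of pure seeking")
```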
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
I understand what you're saying. Yes, you're right that there is a very large part of "fragmentation" that this metric has nothing to do with. It's just not something the ZFS devs have tackled yet. It is definitely misleading, as I know of no other file system that measures fragmentation this way. They did it because it was the lowest-hanging fruit, something they could do rather easily, at least compared to tackling file fragmentation (BP rewrite).

Although, by design, ZFS does try to minimize fragmentation by caching five seconds of writes, bundling them all into a transaction group, and then flushing it to disk all at once.

This works better than most other file systems at minimizing fragmentation on a system that is being written to by many, many sources at once.
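A conceptual sketch of that batching idea (not ZFS code; the five-second interval is the one described above):

```python
# Toy model of transaction-group batching: writers queue data, and one
# periodic sync flushes the whole batch as a single large write, which
# the allocator can lay out contiguously.
pending: list[bytes] = []

def write(data: bytes) -> None:
    pending.append(data)          # writers never touch the disk directly

def txg_sync() -> None:
    batch = b"".join(pending)     # everything from the last ~5 s interval
    pending.clear()
    print(f"flushed {len(batch):,} bytes in one transaction group")

# Many small writers land in the same group:
for _ in range(1000):
    write(b"x" * 512)
txg_sync()                        # -> flushed 512,000 bytes in one group
```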

However, the nature of copy-on-write and read-update-write operations will naturally cause fragmentation in individual files over time; that's just how it is.

The only way anyone has seen around this issue is block pointer rewrite, which I'm sure most of you have an idea about as a concept. Hint: it won't be happening any time soon, heh.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
The only way anyone has seen around this issue is block pointer rewrite, which I'm sure most of you have an idea about as a concept. Hint: it won't be happening any time soon, heh.

I'd love to see BPR, but everything I've read and discussed with others about ZFS development leads me to believe it's something we may not see for a very long time, if ever, and only then if a company with a lot of money funds it. :P
 