Metaslab modification

Doug183
Dabbler | Joined: Sep 18, 2012 | Messages: 47
Any experts or people with hands-on experience who have actually altered the metaslab parameter and can say whether it works reliably?

I am asking because, over the years, as I have upgraded from 4TB drives to now 18TB drives, I have been losing quite a bit of space to the ZFS ether because of the metaslab issue.

The best articles written on the metaslab subject are here:


Before there is a knee-jerk reaction:
I have been running home-built multi-drive servers on ZFS and FreeNAS/TrueNAS for 8 years now, so I have some experience with what I am doing. I ask politely that we not turn this conversation into a flame war of opinions on how moronic I am for asking this question. (I get it, it's not approved or tested.) I am happy to share my experiences and all the mistakes I have made along the way with the community, and I ask for the same generosity in return.

I am not the typical TrueNAS user: I have a large library of video (a film archive) of mostly large files that is more or less stagnant, except for one pool in each server that gets new video added to it. I built it this way for long-term reliability, easy expansion, and easy backup to another set of bare drives. Only one user accesses this storage at a time, so throughput is fine the way it is (actually, I have 10 Gig network issues I have yet to solve, but that's another post). Yes, there is a downside of losing speed efficiency by not making the pools one big vdev, and I have to manually manage some drive/ZFS housekeeping, but speed is not the issue; reliability and recovery are, if we ever lose a vdev.

Here is the Topology as well:
Server 1:
- Storage 1 has ten 12TB drives in RAIDZ2 and is both a zpool and a vdev (Full)
- Storage 2 has ten 12TB drives in RAIDZ2 and is both a zpool and a vdev (Full)
- Storage 3 has ten 12TB drives in RAIDZ2 and is both a zpool and a vdev (Has Free Space)
(30 drives Total)

Server 2:
- Storage 4 has ten 18TB drives in RAIDZ2 and is both a zpool and a vdev (Full)
- Storage 5 has ten 18TB drives in RAIDZ2 and is both a zpool and a vdev (Full)
- Storage 6 has ten 18TB drives in RAIDZ2 and is both a zpool and a vdev (Full)
- Storage 7 has ten 18TB drives in RAIDZ2 and is both a zpool and a vdev (Has Free Space)
(40 drives Total)

Servers and drives are mostly left off until I need to access them. I also do not run a ZIL or SLOG.

Back to my question:
1) Can the metaslab parameter be altered reliably and with stability?
2) How do I make the changed metaslab variable stick?
 

HoneyBadger
actually does care
Administrator | Moderator | iXsystems | Joined: Feb 6, 2014 | Messages: 5,112
I am not the typical TrueNAS user: I have a large library of video (a film archive) of mostly large files that is more or less stagnant, except for one pool in each server that gets new video added to it
While I'd argue that a lot of "typical TrueNAS users" are storing a big repository of "Linux ISOs" in a write-once-read-many (or never) server, most aren't using the scale of hardware you've got.

The tunable you want is vfs.zfs.vdev.metaslabs_per_vdev, and you can set it as a tunable of type "sysctl" in the TrueNAS UI, where it will stick. However, it needs to be set before a top-level vdev is created, so it can't affect any of your existing vdevs.

I haven't mucked with this stuff in some time, so I apologize if I'm rusty. The limit of 200 metaslabs per vdev was set some time ago and was a somewhat arbitrary number back then. With drives now above 20TB each, it would make sense to have a larger number of metaslabs (each with its own spacemap, loaded into and unloaded from RAM as needed) to spread things out; however, because of the funky way metaslab sizing works (pick a value for 2^metaslab_shift that gives you as close to 200 slabs as possible without going over), there are certain thresholds you have to cross.
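I haven't checked the current source, but the rule boils down to something like this rough Python sketch; the 16 MiB starting point and the function name are my own assumptions for illustration, not the exact OpenZFS vdev_metaslab_set_size() logic:

```python
# Rough sketch of the sizing rule described above: keep doubling the
# power-of-two slab size (2^metaslab_shift) until the slab count fits
# under the per-vdev limit. Illustrative only, not the real ZFS code.

def pick_metaslab_shift(size_bytes: int, max_slabs: int = 200) -> int:
    shift = 24                               # assume a 16 MiB floor for illustration
    while (size_bytes >> shift) > max_slabs:
        shift += 1                           # double the slab size
    return shift

size_18t = 18 * 10**12                       # an "18 TB" drive in base-10 bytes
shift = pick_metaslab_shift(size_18t)
print(shift, size_18t >> shift)              # -> 37 (i.e. 128 GiB slabs), 130 slabs
```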

Right now your 18T disks are carved into 130 slabs of 128G (!) each, and you're losing 123.8G of space that can't be allocated because it isn't quite enough to make a new slab. Bumping the maximum number of metaslabs up to 262 and allowing 261 slabs of 64G each would cut the loss to 59.8G; you don't get your next break until you allow 524, using 523 x 32G and losing 27.8G.
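If you want to check my arithmetic, here's the same math in Python, treating a single 18 TB drive on its own as in the figures above:

```python
# Reproduce the waste figures quoted above for a single "18 TB" drive.
drive_gib = 18 * 10**12 / 2**30              # ~16763.8 GiB in binary units

for max_slabs, slab_gib in [(200, 128), (262, 64), (524, 32)]:
    slabs = int(drive_gib // slab_gib)
    wasted = drive_gib - slabs * slab_gib
    print(f"limit {max_slabs}: {slabs} slabs x {slab_gib}G, {wasted:.1f}G lost")
# limit 200: 130 slabs x 128G, 123.8G lost
# limit 262: 261 slabs x 64G, 59.8G lost
# limit 524: 523 slabs x 32G, 27.8G lost
```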

200 -> 262 isn't big enough to break things, in my opinion. Anyone who's replaced 6T drives with 8T drives in place has seen roughly the same increase (174 to 232), so my take would be to set the sysctl to 262, build a new vdev, and enjoy the additional ~60G of space.
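The 6T -> 8T comparison works out the same way: the slab size (32 GiB in this example) is fixed when the vdev is created, so growing the disks in place just pushes the count past the old 200 cap. A quick illustrative calculation, same per-drive framing as above:

```python
# 6T and 8T drives with a 32 GiB slab size fixed at vdev creation.
slab_gib = 32
for label, size_bytes in [("6T", 6 * 10**12), ("8T", 8 * 10**12)]:
    slabs = int(size_bytes / 2**30 // slab_gib)
    print(f"{label}: {slabs} metaslabs of {slab_gib}G")
# 6T: 174 metaslabs of 32G
# 8T: 232 metaslabs of 32G
```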

Fun note: a lot of this results from the base-10 vs. 2^X malarkey in drive sales. If your 18TB drives were actually 18TiB, you'd have exactly 144 slabs of 128G each, no wasted space, and be very happy. Blame the Marketing department.
 

Doug183
Dabbler | Joined: Sep 18, 2012 | Messages: 47
Thank you. Looking at your math, is the 200-metaslab limit calculated per drive or per vdev? Meaning, will I regain about 10 x 64G (so roughly 640G)? Or is the metaslab math done on the 10x18TB vdev as a whole? (The math might still be the same, but just asking.)
 