Rebalancing a Zpool

Status
Not open for further replies.

cfendya

Dabbler
Joined
Jul 8, 2013
Messages
10
Hi all,

Semi-new here, but I finally decided to take the plunge and implement FreeNAS at home. I have a question about zpools: when adding vdevs, how can I rebalance the data residing on the original vdev?

Before I start, I'll provide some background. My data (5TB) is almost 100% video. I'll be migrating from a Win2012 Server NTFS setup and plan on reusing my original drives once data has been copied over to ZFS.

My zpool setup is as follows:
- vdev1: 4 x 2TB drives, RAIDZ1

I plan on adding 4 x 2TB drives to the zpool to create a second vdev once the data is finished copying, thus doubling my capacity. I don't want to get into a RAIDZ1 vs RAIDZ2 conversation; I understand the differences between the two and I'm comfortable with the risks of RAIDZ1 in my current setup :)

If you're following the above, you'll see that after the data migration my first vdev will be near 100% capacity. Here is where my question comes in...

Once I add the second vdev, how do I rebalance my zpool across both vdevs? I've googled and searched various forums, and the common theme is that ZFS doesn't support an automated rebalance. A gradual rebalance could happen as files are written to, but since I'm dealing with 100% video, these files will never change, so they'll never "move" or rebalance on their own.

I've seen a few people discuss using "zfs send tank/data | zfs recv tank/data_new", but I wanted to ask the group first whether this is the right approach. Honestly, I'm still reading up on exactly what this does, so if someone could explain it, I'd appreciate that too.
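From what I've read so far, the general shape of that approach would be something like the commands below (dataset names are placeholders from my reading; I haven't actually run any of this yet):

# snapshot the existing dataset so it can be sent
zfs snapshot tank/data@rebalance

# replicate it into a new dataset on the same pool; most of the new
# blocks should land on the emptier vdev
zfs send tank/data@rebalance | zfs recv tank/data_new

# after verifying the copy, retire the old dataset and rename the new one
zfs destroy -r tank/data
zfs rename tank/data_new tank/data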

Thanks, guys, in advance for any suggestions or guidance!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, but you also stand the chance of fragmenting the data. Bottom line: add the vdev and enjoy the new space. Don't mess with your pool. It works.
 

cfendya

Dabbler
Joined
Jul 8, 2013
Messages
10
Thanks, cyber. I was just hoping to alleviate hot spots. Would the zfs send | zfs receive approach cause fragmentation?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes, especially to the same pool.

You're missing the bigger picture, and I'm not going to explain this in much detail because I could spend the next hour writing on the topic.

1. ZFS will preferentially choose the emptier vdev for new writes (you can see this for yourself with the quick check below).
2. A single vdev can easily saturate a 1Gb LAN, so your vdevs are NOT your bottleneck. Unless you plan to go to massively overpriced 10Gb tomorrow, you shouldn't care.
3. You aren't going to see an appreciable performance increase because of how everything in ZFS works out.
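If you want to watch #1 happen, run something like this while copying data ("tank" is just a placeholder pool name) and you'll see new writes favoring the emptier vdev:

# per-vdev capacity and I/O, refreshed every 5 seconds
zpool iostat -v tank 5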

So see what I wrote above... just use the pool. This is a waste of your time to think about, a waste of my time to talk about, and it only adds more potential unknowns that could result in lost data. Add the vdev and go party with the new disk space.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
I balanced my data and it worked out fine for me.

Basically I was migrating lots of data and had to make 2 vdevs.

I made the first vdev out of 6x2TB disks, which gave 8TB of usable space, and then filled it to ~99% capacity. That was obviously not ideal, because I next added a second vdev of 6x3TB disks to add another 12TB to the pool.

So what I did to rebalance was take half the data on the pool (4TB) and copy it to another directory in the pool. Nearly all of the copied data went to the new, empty vdev, since the other vdev was 99% full.

Once the copy was done, I deleted the original copies of the files (freeing 4TB of space on vdev 1) and then moved the files back to their original directories.

This left me with the 8TB vdev at 50% usage and the 12TB vdev at 33% usage, which was a much better situation.
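Roughly, the steps looked like the commands below. The paths are just placeholders, and you obviously need enough free space in the pool to hold the duplicate copy while it exists:

# copy a chunk of data to a temporary directory on the same pool;
# most of the new blocks land on the emptier vdev
cp -a /mnt/tank/media/movies /mnt/tank/rebalance-tmp/

# verify the copy, then delete the originals to free space on the full vdev
rm -rf /mnt/tank/media/movies

# move the rebalanced copy back into place
# (a mv within the same dataset is a rename and does not rewrite blocks)
mv /mnt/tank/rebalance-tmp/movies /mnt/tank/media/movies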



Eventually I went on to fill the pool to about 80%, so the vdevs were at something like 90% and 70% respectively. Then I replaced all the 2TB disks in vdev 1 with 4TB disks, bringing the vdev usage to roughly 45% and 70%, which is about where they are now.
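In case it helps anyone following along, the disk swap was just the usual one-at-a-time replace-and-resilver. At the shell the general shape is something like this (pool and device names are placeholders; on FreeNAS you would normally do the replacement through the web UI instead):

# let the vdev grow once every member disk has been upsized
zpool set autoexpand=on tank

# replace one disk at a time and wait for each resilver to finish
zpool replace tank /dev/gptid/old-disk-id /dev/gptid/new-disk-id
zpool status tank   # watch resilver progress before doing the next disk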

Performance and everything has been great through and through. I still get over 700MB/s sequential read/write and my last scrub only took 9h29m to scrub 22.9TiB of data.

I'm happy I did it, but for media storage behind a gigabit network bottleneck, it's true that it's probably not something to worry about too much.
 

cfendya

Dabbler
Joined
Jul 8, 2013
Messages
10
Thanks, Sir! Did you by chance take I/O metrics before the rebalance? I'm sitting at 95% on vdev1 and 0% on vdev2, so very similar to how you started :) I'm still looking into the zfs send | zfs receive process, as it sounds like it would do everything in one step versus the copy process. Also, I heard that they are looking at an actual rebalance feature (bp_rewrite), so it must matter to someone, despite what Cyber says above.

Like most things, I'm sure it depends heavily on what type of data is stored and how. In my case, I have massive files that are mostly read, with limited writes, over a 1Gb link. Nothing near enterprise, obviously, but the geek in me does like to follow suggested practices. I also found someone running a VMware cluster backed by ZFS who ran into I/O performance problems and corrected them by rebalancing his zpool.

The question, though, is this: if rebalancing does become a suggested practice (it has obviously made a difference for people), then it should be done any time a vdev is added. Whether you use "zfs send | zfs receive" or the cp process, both seem like they would take a long time, especially as the environment grows.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
I did not keep any metrics, sorry :(

bp_rewrite is quite the interesting topic, heh. I would not count on it being implemented any time soon, if ever, though. But I like to remain hopeful that someone will tackle it again someday. Matt Ahrens started working on it way back while at Sun, but it ended up becoming a dead project. There is some info near the end of this video:
https://www.youtube.com/watch?v=G2vIdPmsnTI


He basically said that if he ever did implement it, he would want it to be the last feature added to ZFS, because it would make every feature added after bp_rewrite more difficult and complicated. And it's hard to say what the last feature will be, or when. There is still a lot of stuff they want to do with ZFS, and it's still growing software.

As for always rebalancing when you add a vdev? Probably not, unless you are talking 90%+ usage before adding the vdev. There is another project in ZFS that's already on its way that reduces the need to worry about balancing so much, by preventing you from getting into such unbalanced situations in the first place. It has to do with keeping track of zpool fragmentation and setting up vdev quotas and the like. You can hear all about it here:
https://www.youtube.com/watch?v=UuscV_fSncY&t=0m30s

So you can see that pool fragmentation and vdev balancing are actually a hot topic right now, and the next big area the developers are looking to tune and improve.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I guarantee you that you didn't rebalance (whatever you think that means), but good luck. Nice to see a noob think he knows more than an expert. I'm sure you have great things ahead of you.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
Care to explain how you "guarantee" that I didn't rebalance my pool?

zdb confirms otherwise. Before the operation, vdev1's metaslabs were at 99% capacity and vdev2's were at 0%.

After the operation, vdev1 is at 50% capacity and vdev2 is at 33%.


zpool iostat -v confirms this as well: writes now go to both vdevs roughly equally, whereas before they went almost exclusively to vdev2, with vdev1 seeing an insignificant amount of I/O.
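For anyone who wants to check their own pool, these are the sorts of commands I mean ("tank" is just an example pool name):

# per-vdev allocated/free space plus read/write activity
zpool iostat -v tank

# metaslab allocation detail for each top-level vdev
zdb -m tank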
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Not really. Those percentages are only a small part of the picture.

I told you not to do it and kept it simple because I wasn't about to explain how incredibly difficult it is to even validate that you've rebalanced, and I'm sorry, but I'm not about to explain it now.

And I have put my money where my mouth is before. I've added vdevs to computers for friends and family and I have never tried to "rebalance" their pools. I just added the vdev and let it go.

Since you didn't take my word for it before, why would I think you would now? Right?
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
Alright, but given that ZFS must always write at least some data to each vdev, there is a reason someone would want to maintain enough free space on all vdevs: writes can slow way, way down when ZFS has to spend a lot of time filling the tiny holes left in nearly full metaslabs. I freed up large contiguous space in the metaslabs on the previously 99%-full vdev.

I did what I did based on a recommendation from ZFS developers themselves (who are as expert as anyone can get). They assured me it was indeed the correct way to balance my vdevs into a better situation than the one they were in, and that it would be good for the long-run performance of my pool.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Alright, but given that ZFS must always write at least some data to each vdev...

That right there is incorrect.

See? I *knew* that you'd have misconceptions about ZFS. This isn't new and you won't be the last. I'm just not about to spend what would probably be the better part of my day to explain it.

Sorry.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
It's not incorrect...

ZFS has to write at least 512 bytes to each device. It cannot fail that allocation, or it fails the device (I'm surprised if you don't know this). This is a fundamental design rule of ZFS.

Either go look at the code yourself or ask Matthew Ahrens or George Wilson if you do not believe me. I'm not asking you to take my word for it.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Sorry, but 512 bytes, when we're talking about TB of data, is NOT "writing data".

Yes, technically transaction data is written, but you're arguing "I peed in the ocean, so the level is higher." Yes, technically it's higher, but is *anyone* actually going to argue over it? Hell no. If you're worried about 512 bytes being written, you've got your priorities extremely mixed up.

Like I said, I'm not here to argue about it because it's not worth my time. I've been down this path before, had these same questions asked of me, and asked some of the ZFS programmers myself. My answer didn't come from just me.

Good luck.
 