SOLVED Upgrading from 11 x 1TB to 2TB

Status
Not open for further replies.

siconic

Explorer
Joined
Oct 12, 2016
Messages
95
Hello everyone!

I've been running FreeNAS for 3 years now and love it. I have an 11-disk RAIDZ3 pool, and right now I am resilvering 2 disks at a time (I know this may not be safe, but all my important data is backed up to CrashPlan Pro). My NAS is at capacity, with only 80GB of free space. The system has 64GB RAM and 2 x Xeon E5640, with 11 x HGST 1TB disks in RAIDZ3, which I am upgrading to 11 x 2TB HGST disks.

I have 2 questions:

1. Is there any way to correlate the disks with their location in the NAS, i.e. Bay 1 is /dev/sda1, etc.? I know I can put that in the disk Description, but I am still curious.

2. It takes about 36 hours to resilver 2 disks. At the current rate of 2 disks every ~ 36 hours, it will take almost 8 days to complete. Is there any way to speed this up?

Thanks!


3 UPDATES:

1. If you add a disk to the resilver process, i.e. you're resilvering 2 disks and add another, it starts the process over entirely!

2. If you offline a disk and replace it, you CANNOT add the original disk back into the pool in the event of a failure, at least not from the GUI.

3. It appears that the number of disks you resilver does not affect the speed of the resilver process.

So, just some useful data based on my setup that may help someone in the future. It could be that I have one bad disk, but we will see soon, and I will give another update.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
1. No, because even if there were, it can change after a reboot.
2. No, especially when resilvering 2 drives at a time.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
1. Is there any way to correlate the disks with their location in the NAS, i.e. Bay 1 is /dev/sda1, etc.? I know I can put that in the disk Description, but I am still curious.

I normally run a dd of the disk I'm interested in locating... the one whose activity light lights up is it.
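The dd trick can also be sketched in Python for anyone who prefers a script. This is a minimal illustration, not something posted in the thread: the function name `read_for` and the example device path are assumptions (use whatever device name your system reports), and it only reads from the disk, never writes. Reading a raw device normally requires root.

```python
import sys
import time


def read_for(path, seconds=30.0, chunk_size=1024 * 1024):
    """Read sequentially from `path` for roughly `seconds`.

    Returns the number of bytes read. The data is discarded; the only
    goal is to keep the disk busy so its activity light is easy to spot.
    """
    total = 0
    deadline = time.monotonic() + seconds
    # buffering=0 gives an unbuffered binary file object.
    with open(path, "rb", buffering=0) as dev:
        while time.monotonic() < deadline:
            data = dev.read(chunk_size)
            if not data:
                if dev.tell() == 0:
                    break  # nothing to read at all
                dev.seek(0)  # hit the end: start again from the beginning
                continue
            total += len(data)
    return total


# Hypothetical invocation: python3 <this script> /dev/da7 30
if __name__ == "__main__" and len(sys.argv) > 1:
    duration = float(sys.argv[2]) if len(sys.argv) > 2 else 30.0
    print(read_for(sys.argv[1], duration), "bytes read")
```

While it runs, look for the bay whose activity light stays lit, then note the bay against the disk's serial number.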

2. It takes about 36 hours to resilver 2 disks. At the current rate of 2 disks every ~ 36 hours, it will take almost 8 days to complete. Is there any way to speed this up?

Well, you could do 3 at once. Super unsafe though.

If you have spare SATA ports, you can replace without offlining and theoretically do all disks at once.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Is there any way to speed this up?
How many drive bays do you have? If you have enough, as @Stux says, you could replace all the disks at once. You'd follow the manual's instructions for replacing a disk without taking the old one offline first.
 

siconic

Explorer
Joined
Oct 12, 2016
Messages
95
I normally offline the disk, then replace it, but I just noticed when I "replaced disk" that the array appears to keep the old disk online! Is this indeed the case? I have 15 Drive bays, but only use 11, so I could do 4 at a time with the replace method?

upload_2017-9-15_6-5-50.png
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I have 15 Drive bays, but only use 11, so I could do 4 at a time with the replace method?
Assuming you have SATA ports and cables/backplane for all 15 bays, and depending on just how adventurous you are, you could do up to 7 at a time--offline three, remove them, replace them with three new ones (whereupon you're back to 11 bays used), then replace four more in-place. Once the resilver completes, remove the four disks you replaced in-place, then replace the remaining four disks in-place.

Of course, this method calls for removing all your redundancy, so it probably isn't what you want to do. But if you offline and remove two disks, you still have some redundancy, you can swap in six disks at a time, so you'd only require two complete resilvers to complete the replacement process.
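The arithmetic behind this suggestion can be checked with a short sketch. It is only an illustration under stated assumptions: the function name `resilver_plan` is made up for this example, and it assumes every pass can use all spare bays plus the bays freed by offlined disks.

```python
import math


def resilver_plan(total_disks, total_bays, offlined, parity=3):
    """Return (disks_per_pass, passes, parity_left) for a bulk replacement.

    disks_per_pass: new disks that fit at once (spare bays + offlined bays)
    passes:         full resilvers needed to replace every disk
    parity_left:    redundancy remaining while disks are offlined
    """
    if offlined > parity:
        raise ValueError("offlining more disks than parity loses the pool")
    per_pass = (total_bays - total_disks) + offlined
    if per_pass <= 0:
        raise ValueError("no free bay: at least one disk must be offlined")
    passes = math.ceil(total_disks / per_pass)
    return per_pass, passes, parity - offlined


if __name__ == "__main__":
    # 11 disks in a 15-bay chassis, RAIDZ3 (numbers from this thread).
    for offlined in (0, 2, 3):
        per_pass, passes, left = resilver_plan(11, 15, offlined)
        print(f"offline {offlined}: {per_pass} per pass, "
              f"{passes} passes, parity left {left}")
```

For the 11-disk, 15-bay case this gives 3 passes with nothing offlined (4 disks per pass) and 2 passes with either two or three disks offlined (6 or 7 per pass). If a pass takes about 36 hours regardless of how many disks are in it, as the update in the first post suggests, that is roughly 108 or 72 hours, compared with 6 passes (about 216 hours) at 2 disks per pass.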
 

siconic

Explorer
Joined
Oct 12, 2016
Messages
95
I do have one other question regarding replacing disks: if I am 50% complete on a resilver job, and I add another disk and begin a resilver on it, the % complete resets to 0%. Are the other two disks still at 50% and this one at 0%? I ask because I started replacing 2 disks in place this morning, but would like to pull 2 disks and add 2 more to the job, now that I know I can do that.

Also, is there anything else I should be aware of when offlining two disks AFTER I have already started a resilver?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Assuming you have SATA ports and cables/backplane for all 15 bays, and depending on just how adventurous you are, you could do up to 7 at a time--offline three, remove them, replace them with three new ones (whereupon you're back to 11 bays used), then replace four more in-place. Once the resilver completes, remove the four disks you replaced in-place, then replace the remaining four disks in-place.

Of course, this method calls for removing all your redundancy, so it probably isn't what you want to do. But if you offline and remove two disks, you still have some redundancy, you can swap in six disks at a time, so you'd only require two complete resilvers to complete the replacement process.

Not even that bad... you could always re-online the off-lined disks if you needed to.

But yes, replacing without offlining is the most reliable way to do it, and you can replace as many disks as you have spare ports simultaneously.

Be an interesting stress test :)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I do have one other question regarding replacing disks: if I am 50% complete on a resilver job, and I add another disk and begin a resilver on it, the % complete resets to 0%. Are the other two disks still at 50% and this one at 0%? I ask because I started replacing 2 disks in place this morning, but would like to pull 2 disks and add 2 more to the job, now that I know I can do that.

I believe so, but not sure.

Also, is there anything else I should be aware of when offlining two disks AFTER I have already started a resilver?

As long as you still have one disk of parity, it's pretty safe. If you have a disk failure... add back one of the offlined disks and it will resilver very quickly to restore full redundancy.

At the end of the day, this is what ZFS is made for.
 

siconic

Explorer
Joined
Oct 12, 2016
Messages
95
Those are the answers I am looking for, so thank you guys for the help! I hope this thread helps others with replacing disks to upgrade their storage!
 

siconic

Explorer
Joined
Oct 12, 2016
Messages
95
As long as you still have one disk of parity, it's pretty safe. If you have a disk failure... add back one of the offlined disks and it will resilver very quickly to restore redundancy.

How can I online a disk I have removed on the fly if I need to? I have never tried that.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Screen Shot 2017-09-15 at 9.34.54 PM.png
Screen Shot 2017-09-15 at 9.35.03 PM.png


Trick is knowing which disk by Serial Number matches up with the ZFS id.

Good notes would help. Alternatively, you can look it up somehow...

I.e., so you know which disk to online...

Actually... it shouldn't matter... just put any of them back and try onlining all of the offlined disks, FreeNAS will figure it out.
 

siconic

Explorer
Joined
Oct 12, 2016
Messages
95
Actually, the 1TB disks I am replacing have 71k hours on them, in my garage, with only a single HGST disk failure! I bought 5 brand new 4TB Seagates, and ALL FAILED BEFORE THE WARRANTY EXPIRED! So I got them replaced under warranty, and the "refurbs" they sent me failed within 3 months. ALL OF THEM! Never ever again will I use Seagate. Gotta love the HGST Ultrastars! That 2 million hours MTBF sure holds up. The new ones have far fewer hours, around 35k, so all my disks have been thoroughly "stress tested", lol. So if I can get another 35k hours of "abuse" out of them, I will be golden!
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Yeah, more curious about the GB/s resilvering rate you end up seeing
 

siconic

Explorer
Joined
Oct 12, 2016
Messages
95
Right now, around 90MB/s. I wish it was GB/s, this would take no time at all. I am pretty sure it is because of the 1TB disks, they were pretty slow. I think these are a little faster, but they are still only 300MB/s.

I am running a Rackables SAN, and running SAS from the NAS to the server. But the backplane is only SATA, so I am sure my speeds are pretty limited. One day when I get the money, I want to upgrade the chassis itself to a better model. But on a budget, it works, and works perfectly for my needs.

upload_2017-9-15_7-42-26.png
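As a rough sanity check on the numbers in this thread, the reported rate can be turned into an expected duration. This is a back-of-the-envelope sketch only: the function name `scan_hours` is made up, and it assumes the reported ~90MB/s is the pool-wide scan rate and that a nearly full pool means close to the whole 11 x 1TB of raw capacity has to be read.

```python
def scan_hours(bytes_to_scan, rate_mb_per_s):
    """Hours needed to scan `bytes_to_scan` at `rate_mb_per_s` (decimal MB/s)."""
    seconds = bytes_to_scan / (rate_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    raw_bytes = 11 * 1e12  # 11 x 1TB disks, assumed nearly full
    print(f"{scan_hours(raw_bytes, 90):.1f} hours at 90MB/s")
```

Under those assumptions the result is about 34 hours, which is in the same range as the roughly 36 hours per resilver reported at the top of the thread.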
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Might also be bottlenecked by the RaidZ3
 

siconic

Explorer
Joined
Oct 12, 2016
Messages
95
Might also be bottlenecked by the RaidZ3

Does Z3 really cause that much of a performance degradation? I thought that was kind of a myth, that the performance difference was negligible between Z2 and Z3. I know small reads and writes suffer the most, and would likely be noticeable, and in my application, I rarely have small reads or writes.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Does Z3 really cause that much of a performance degradation? I thought that was kind of a myth, that the performance difference was negligible between Z2 and Z3. I know small reads and writes suffer the most, and would likely be noticeable, and in my application, I rarely have small reads or writes.

Dunno. Let me erase and resilver a drive...

...

So, with my system, the resilver seems to be bottlenecked by the write activity to the new disk, i.e. it's showing 99% busy. Most of the other 7 disks in the 8-way RaidZ seem to fluctuate between 30% and 80% busy...

It's currently running at just 22MB/s on the resilver... From previous experience, that speeds up as it gets further into the resilver... Scrubs do reach into the GB/s range.

Try running gstat in a shell and seeing which devices are bottlenecking.

My drives have sequential performance of 220-110MB/s or so across their platters... so in the future when ZFS offers sequential resilvering, it should be better

https://github.com/zfsonlinux/zfs/pull/6256
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Does Z3 really cause that much of a performance degradation?
Not necessarily RaidZ3, but a wide vdev such as yours will.

IIRC @cyberjock posted that he'd never do a 12 wide vdev again due to the performance degradation as it fills and the long resilver times.

EDIT: Here's just one post I found.

12 disk vdev

And another by @Ericloewe

Disks per VDEV
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
There we go... just kicked up from 22MB/s to 120MB/s, and now da7 (the drive being resilvered) is only 25% busy. So now it's being bottlenecked somewhere else...

Source disks are now 65-85% busy.

Still accelerating.

Now up to 150MB/s and da7 is just 10% busy.

I suspect it depends on the source fragmentation etc... sometimes we have big data dumps... and sometimes lots of itty bitty files.

edit: yeah, at 155MB/s, the bottlenecks are now the source drives... which are hitting 100%, and the destination is just 9%
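The reasoning used here, comparing the %busy of the disk being resilvered against the source disks, can be written down as a small sketch. It is illustrative only: the function name `classify_bottleneck`, the 90% threshold, and the source device names `da0` and `da1` are assumptions, and the busy figures are picked from within the ranges quoted in these posts rather than taken from real gstat output.

```python
def classify_bottleneck(busy, target, threshold=90.0):
    """Guess where a resilver is bottlenecked from per-device %busy values.

    busy:   dict mapping device name to its %busy figure
    target: name of the disk being resilvered (the destination)
    """
    if busy[target] >= threshold:
        return "destination"
    sources = [pct for name, pct in busy.items() if name != target]
    if any(pct >= threshold for pct in sources):
        return "source"
    return "elsewhere"


if __name__ == "__main__":
    # Illustrative snapshots; da0/da1 are hypothetical source disks.
    early = {"da0": 40, "da1": 75, "da7": 99}
    middle = {"da0": 65, "da1": 85, "da7": 25}
    late = {"da0": 100, "da1": 100, "da7": 9}
    for label, snapshot in (("early", early), ("middle", middle), ("late", late)):
        print(label, classify_bottleneck(snapshot, "da7"))
```

The three snapshots mirror the progression described in these posts: the new disk is the limit at first, then neither side is saturated, and finally the source disks are.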
 