How to best handle a drive fault while resilvering a different drive on RAIDZ2

bstev

Explorer
Joined
Dec 16, 2016
Messages
53
I am currently 18% through resilvering the final disk on my RAIDZ2 pool, which I am expanding by replacing 10x6TB drives with 10x10TB drives. All of these drives had been in service on another machine with light usage, and had regular SMART tests and scrubs. This morning I found that one of the 10TB drives I already installed now has an ATA error count of 3 and shows 154 read errors in the pool, so the drive is faulted.

The pool is pretty full, so each drive replacement takes a full day or more. My question is: what is my safest option from here? Let the resilver on the final drive finish (it is at 18%) and hope no other drive has an issue, or cancel it and start replacing the faulted drive? Most of what I read says to let it finish, but I am nervous, so I am asking directly.

I do have my critical files from the pool backed up, which total only about 2TB, but the other 38TB of this pool would be very, very painful to rebuild. So I want to do my best to see this through properly and not turn it into a nightmare.
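For anyone wondering what the expansion buys: a rough rule of thumb is that a RAIDZ vdev stores data on (disks minus parity) drives' worth of space. The sketch below is an approximation under that assumption only; it ignores ZFS metadata, padding, slop space, and the TB-vs-TiB difference, so the numbers the pool reports will be lower. The function name is made up for illustration.

```python
def raidz_usable_tb(disks: int, disk_tb: float, parity: int = 2) -> float:
    """Rough usable capacity of one RAIDZ vdev, in decimal TB.

    Assumption: usable = (disks - parity) * disk size. Real pools report
    less because of metadata, padding, slop space, and TiB vs TB.
    """
    if disks <= parity:
        raise ValueError("a RAIDZ vdev needs more disks than parity drives")
    return (disks - parity) * disk_tb


before = raidz_usable_tb(10, 6)   # 10x6TB RAIDZ2
after = raidz_usable_tb(10, 10)   # 10x10TB RAIDZ2
print(before, after)              # prints: 48 80
```

On OpenZFS the extra space generally becomes usable only after every disk in the vdev has been replaced, and whether it is picked up automatically depends on the pool's `autoexpand` property (or a manual `zpool online -e`).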
 

bstev

[Two screenshots attached; images not available]
 
Joined
Jul 3, 2015
Messages
926
Are you doing an in-place replacement, or have you actually offlined/removed the 6TB drive to replace it with the 10TB one?

If in-place, then leave the resilver to finish: while the replacement is in progress you have technically lost only one drive in a Z2, so you could lose another and still be OK.

If you have offlined/removed the 6TB drive, you have now technically lost two drives from a Z2, which is bad. I would still leave the resilver to complete if you think it will be done within a day, and then replace the failed drive ASAP.
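The reasoning above boils down to simple parity arithmetic. Below is a minimal sketch of it, assuming a simplified model (not ZFS's actual internal logic) in which every faulted or missing member of a RAIDZ vdev uses up one level of parity, and an old disk that stays attached during an in-place replace still counts as present. The function name is hypothetical, and the model ignores partial read errors on the surviving disks.

```python
def remaining_tolerance(parity: int, faulted: int, old_disk_removed: bool) -> int:
    """How many more whole-disk failures a RAIDZ vdev can survive.

    Simplified model (assumption): each faulted or missing disk consumes
    one level of parity. During an in-place replace the old disk stays
    online, so it is not counted as missing; if it was pulled, it is.
    A negative result means the vdev has already lost data.
    """
    missing = faulted + (1 if old_disk_removed else 0)
    return parity - missing


# RAIDZ2 with one faulted 10TB disk:
print(remaining_tolerance(2, 1, old_disk_removed=False))  # 1: can lose one more
print(remaining_tolerance(2, 1, old_disk_removed=True))   # 0: no margin left
```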
 

bstev

Thank you for the quick reply. I am doing an in-place replacement, and I am glad you mentioned that, because I was hoping that was how it would be handled. In the past I removed and replaced drives, since I did not have any free bays.

Trying to get wiser as time goes on, this time I temporarily moved a two-disk mirror out, which freed up bays for the in-place replacements.
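For readers at the command line, an in-place replacement is a `zpool replace` with both the old and new disk attached: ZFS resilvers onto the new disk while the old one keeps serving data, and only detaches the old disk after the resilver completes. The sketch below just builds and prints that command rather than running it; the pool and device names (`tank`, `da3`, `da11`) are hypothetical, and on TrueNAS the web UI's Replace action is normally the way to do this.

```python
import shlex


def in_place_replace_cmd(pool: str, old_dev: str, new_dev: str) -> list[str]:
    """Argument vector for an in-place `zpool replace` (built, not executed).

    With both disks attached, ZFS resilvers onto new_dev while old_dev stays
    in the vdev, and detaches old_dev once the resilver completes.
    """
    return ["zpool", "replace", pool, old_dev, new_dev]


cmd = in_place_replace_cmd("tank", "da3", "da11")  # hypothetical names
print(shlex.join(cmd))  # zpool replace tank da3 da11
```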
 
Cool, you should be fine then. Just make sure you replace that failed drive ASAP after your resilver completes and, dare I say, back up the other 38TB :wink:
 

bstev

After the resilver, is there much hope in cleaning the SATA connection, reseating the drive and running a scrub? Or should I go directly to replacing it while leaving the drive faulted? This drive was installed a few days ago and had been part of the pool for maybe a day.
 
You could try that. If it looks ok I’d run a long SMART test on it to be sure.
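The reseat-and-retest approach could look roughly like the sequence below. This is a sketch that only builds the commands in order; the pool name, ZFS device name, and disk path are hypothetical. `zpool clear` resets the device's error counters and typically lets ZFS try to reopen a faulted disk, the scrub re-reads and verifies every allocated block, and the long SMART self-test runs in the background, so its result has to be read back with `smartctl -a` after it finishes.

```python
import shlex


def recheck_plan(pool: str, zfs_dev: str, disk_path: str) -> list[list[str]]:
    """Commands to re-test a reseated disk, in order (built, not executed).

    All names are hypothetical placeholders.
    """
    return [
        ["zpool", "clear", pool, zfs_dev],      # reset error counters, reopen disk
        ["zpool", "scrub", pool],               # verify all allocated data
        ["zpool", "status", "-v", pool],        # watch scrub progress / new errors
        ["smartctl", "-t", "long", disk_path],  # start the long self-test
        ["smartctl", "-a", disk_path],          # read results once the test is done
    ]


for step in recheck_plan("tank", "da7", "/dev/da7"):
    print(shlex.join(step))
```

If the scrub logs new read errors or the long test fails, that points back to replacing the drive rather than trusting it again.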
 