Unable to replace a disk after failure (RAIDZ1)

Joined
Nov 9, 2013
Messages
4
[Attachment: Ada0.jpg]
Hi there,
I'm running FreeNAS 11.2 on a PC:
- Motherboard: ASUS P7P55D
- Intel Core i3
- 16 GB RAM
- Intel PRO/1000 GT
- 5 x 3 TB WD Red disks in a RAIDZ1 pool
A few days ago I got some I/O errors on the ada0 disk, so I took it offline, shut down the box, removed the faulty disk, inserted a new disk (3 TB WD Red) and did a REPLACE.
The replace ran for several hours and finished with an error (which one? I don't know).
On another PC I tested the 'new' disk, overwrote it with zeroes, and put it back in the FreeNAS box. Same result.
So I bought a brand new disk, tested it on another PC (result OK), and put it in the FreeNAS box. When I try to replace, I click on Member Disk, but nothing appears, so I'm completely stuck.
Is there any way to recover the pool?
Or do I need to rebuild the whole array and restore everything from backup, which means a week of downtime?
Thanks in advance for any help, and apologies for my English, which is not my native language.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hey Michel,

I also speak French, if that helps...

RAIDZ1 is to be avoided for many reasons. You may be experiencing one of them right now.

Once a RAIDZ1 vdev is degraded and missing a drive, it has no redundancy left at all. During the rebuild, ZFS can still detect a bad block through its checksums, but it can no longer repair it, because the parity that would normally be used for the repair is already needed to reconstruct the missing drive. Should a single piece of data be unreadable or wrong on the remaining drives, ZFS cannot fix it. Also, a rebuild has to read all the data on every remaining drive. That exercise is pretty intensive and carries a higher risk of hitting such errors than regular operation. Had the array been RAIDZ2 or RAIDZ3, ZFS would still be able to detect and fix errors during the rebuild.
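
To put a rough number on that risk, here is a back-of-the-envelope sketch. It assumes the worst-case unrecoverable read error rate often quoted on consumer drive datasheets (1 error per 10^14 bits read), independent errors, and a completely full pool. These are assumptions for illustration, not measurements from this system; real drives usually do better, and a resilver only reads allocated data.

```python
import math

def p_read_error(disks_read: int, tb_per_disk: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error when every
    surviving disk is read in full (assumed model, see text)."""
    bits = disks_read * tb_per_disk * 1e12 * 8
    # 1 - (1 - p)^bits, evaluated with log1p/expm1 for numerical stability
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# RAIDZ1 of 5 x 3 TB with one disk gone: the 4 survivors must all be read.
print(f"{p_read_error(4, 3.0):.2f}")
```

Under those pessimistic assumptions the figure comes out well above one half, which is why a degraded RAIDZ1 rebuild on large disks is considered risky. With RAIDZ2 the same read error would simply be repaired from the second parity.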

Another consequence is that once a rebuild has completed with such an uncorrected error, there is no way to recover the affected data at a later time. Rebuilding the entire pool ends up being the radical, last-resort solution.

According to your post, you did everything correctly on your first replacement:
--You took the drive offline
--You physically removed and replaced it safely, with the server powered down
--Once it was back up, you triggered the resilvering process with the Replace action in the GUI
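
For reference, those three steps correspond roughly to the zpool commands below. This is a minimal sketch that only prints the commands and runs nothing. The pool name `tank` and the device names `ada0p2` / `ada5p2` are hypothetical placeholders: FreeNAS normally refers to pool members by gptid labels, and as far as I know the GUI also partitions the new disk for you, so the GUI remains the recommended path.

```python
import shlex

def replace_commands(pool: str, old_dev: str, new_dev: str) -> list[list[str]]:
    """Build (not run) the zpool commands behind a disk replacement."""
    return [
        ["zpool", "offline", pool, old_dev],           # take the failing member offline
        # ... power down, swap the physical disk, power up ...
        ["zpool", "replace", pool, old_dev, new_dev],  # start the resilver onto the new disk
        ["zpool", "status", "-v", pool],               # watch progress and list any errors
    ]

# Hypothetical names, for illustration only.
for cmd in replace_commands("tank", "ada0p2", "ada5p2"):
    print(shlex.join(cmd))
```

The last command is the useful one here: `zpool status -v` is where the error that ended the first resilver would normally be reported.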

So unfortunately, it looks like yes, restoring from your backup will be the safest thing to do. But before doing that, have a look at RAIDZ2 and see if you can use it, instead of going back to walking the edge of the cliff once again.

Good luck recovering from that incident,
 

blueether

Patron
Joined
Aug 6, 2018
Messages
259
Do you have space/ports for the original ada0? That disk will still have usable data on it that may help the resilver, if it is readable.
 