Is it safe to replace both drives in a mirror if both of them are dying?

Status
Not open for further replies.

Deleted47050

Guest
Heh, crappy thread title; I didn't know how to put this better, so allow me to clarify. I have one FreeNAS box configured with a single two-drive mirror. This morning I received an email telling me that my drives are failing:

Code:
Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Device: /dev/ada0, 25 Currently unreadable (pending) sectors
Device: /dev/ada0, 25 Offline uncorrectable sectors


I have a backup of this system, so restoring my data is not going to be an issue. However, since both drives are having issues, is it safe in this case to replace one drive, resilver, then replace the other drive and resilver again? Or do I risk ending up with a new mirror containing corrupted data?

I kinda think I already know the answer, but I figured I'd ask here first.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Yes, it's reasonably safe to replace one drive of the mirror at a time. Given the errors, replace /dev/ada0 first.
ZFS won't copy bad data.

Note that if you really have some bad blocks, ZFS will tell you via zpool status -v. Any files listed as lost would have to be manually restored.
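For reference, a minimal sketch of what that check looks like, assuming a pool named tank (the pool name and file path are placeholders; the output shown in the comments is only illustrative):

Code:
# scrub so ZFS re-reads and verifies every block, then list damaged files
zpool scrub tank
zpool status -v tank
# if corruption was found, the status output ends with something like:
#   errors: Permanent errors have been detected in the following files:
#           /mnt/tank/path/to/affected-file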
 

Deleted47050

Guest
Can I see the actual list of files lost just by running zpool status -v? Or will that simply tell me if there are some bad blocks, and I need to find out what files were lost using other tools?


Sent from my iPhone using Tapatalk
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Can I see the actual list of files lost just by running zpool status -v? Or will that simply tell me if there are some bad blocks, and I need to find out what files were lost using other tools?
In general, you will get a list of files. On rare occasions, it's less verbose, like for ZVols.

Another thing to note is that ZFS, unless told otherwise, keeps redundant metadata. On a simple 2-way mirror vdev/pool, that means you have 4 copies of directory entries, 2 on each disk, and for critical metadata, 3 copies per disk (6 total). Thus it's harder to end up with a corrupt file system (aka dataset) with ZFS. You CAN lose data, but you're less likely to lose an entire directory tree or a file's directory entry.

Here are the ZFS dataset parameters for redundancy. You can look them up in the documentation (like the Unix manual pages) for details:
Code:
rpool  redundant_metadata  all  default
rpool  copies              1    default
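That listing is what zfs get prints for those two properties; a quick way to check them on your own pool (the pool name is just an example) is:

Code:
# show the redundancy-related dataset properties for the pool
zfs get redundant_metadata,copies rpool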
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
If you have ports to spare, you can temporarily turn it into a 3-way mirror.
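In rough command-line terms that would be a zpool attach against one of the existing mirror members, followed by a detach once the resilver is done and you're happy, something like the sketch below (pool and device names are placeholders; on FreeNAS you'd normally do this through the GUI so the right partitions/gptids get used):

Code:
# attach a new disk as a third member of the existing mirror
zpool attach tank ada0p2 ada2p2
# watch the resilver finish, then drop whichever member you no longer want
zpool status tank
zpool detach tank ada0p2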
 

Deleted47050

Guest
If you have ports to spare, you can temporarily turn it into a 3-way mirror.

Why? What would be the advantage? The only thing I can think of is lower stress on the drives during the resilvering. Is this why you are suggesting this?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Yes, I forgot about adding a 3rd mirror. It has the advantage that if blocks are bad on one source disk but available on the other source disk, you can still get a clean copy.

Adding a 3rd mirror is similar to performing a zpool replace. With replace, the new device is added as a temporary 3rd mirror member until the resilver is complete. The source disk is used for all data UNLESS there is a bad block, in which case any other available source, like the 2nd mirror member, is then used.
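A minimal sketch of that replace path, with placeholder pool and device names (again, the FreeNAS GUI is the usual way to drive this):

Code:
# the new disk is resilvered while the old one stays in the pool as a
# temporary extra mirror member until the replace completes
zpool replace tank ada0p2 ada2p2
zpool status tank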
 

Deleted47050

Guest
Yes, I forgot about adding a 3rd mirror. It has the advantage that if blocks are bad on one source disk but available on the other source disk, you can still get a clean copy.

Adding a 3rd mirror is similar to performing a zpool replace. With replace, the new device is added as a temporary 3rd mirror member until the resilver is complete. The source disk is used for all data UNLESS there is a bad block, in which case any other available source, like the 2nd mirror member, is then used.

Did you by chance mean to say zpool attach? I mean, since in this case we will be temporarily adding one drive to the pool, rather than replacing one of the existing ones?
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
If you look in the documentation for "replacing drives to grow a pool", you'll see the procedure. You physically install the additional drive, then select one of the existing drives and use the Replace task, choosing the new drive as the replacement. During resilver, the original vdev remains intact. After resilvering, the replaced drive is automatically offlined. At that point it can be physically removed.
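While such a replace is running, zpool status groups the old and new disk under a temporary "replacing" vdev, roughly like this (illustrative output with placeholder device names):

Code:
        NAME             STATE     READ WRITE CKSUM
        tank             ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            replacing-0  ONLINE       0     0     0
              ada0p2     ONLINE       0     0     0
              ada2p2     ONLINE       0     0     0  (resilvering)
            ada1p2       ONLINE       0     0     0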
 

Deleted47050

Guest
If you look in the documentation for "replacing drives to grow a pool", you'll see the procedure. You physically install the additional drive, then select one of the existing drives and use the Replace task, choosing the new drive as the replacement. During resilver, the original vdev remains intact. After resilvering, the replaced drive is automatically offlined. At that point it can be physically removed.

Ok, thanks for clarifying! I guess the only thing that sucks this time around is that I won't have time to burn in my new drives but oh well, you can't have everything.
 

Deleted47050

Guest
Partially unrelated, but anyway: I decided to tempt fate and burn in my new drives after all, hoping that my pool will survive for a couple more days. However, after the reboot I see this:

Code:
Device: /dev/ada0, 2 Offline uncorrectable sectors
Device: /dev/ada0, 2 Currently unreadable (pending) sectors


Does this make any sense to you? The offline sectors were 25 before the reboot.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The drive must have fixed a few of them.
 

Deleted47050

Guest
I didn't even know a drive could do that :/


Sent from my iPhone using Tapatalk
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Usually it is fixed by re-mapping the sector onto a spare on the next overwrite. Still needs watching.
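One way to keep an eye on it is to re-check the raw SMART counters every so often (the device name is just an example):

Code:
# Reallocated_Sector_Ct should grow as pending/offline sectors get remapped
smartctl -A /dev/ada0 | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'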
 

Deleted47050

Guest
Usually it is fixed by re-mapping the sector onto a spare on the next overwrite. Still needs watching.

Interesting, but the damaged sectors are still there, then. Shouldn't the report still count those?

Or does it only count sectors that actually have data on them?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
All drives have a bunch of spare sectors onto which bad sectors can be remapped.
 

Deleted47050

Guest
Ok, but if the report said 25 offline sectors first and now it says 2, it means there are still another 23 offline sectors somewhere. I was just wondering if it would make sense to count those anyway, even if they have been remapped.

Not sure if I am making any sense.


Sent from my iPhone using Tapatalk
 