I should begin by saying that yes, most of this data is backed up, but BOY HOWDY would it be a pain to restore. (the usual story, I bet)
I have a FreeNAS 9.10.2-U1 box which has/had a raidz1 with 4x3T drives. One of the drives started throwing errors, so I decided now would be a good time to upgrade to 4T drives (already did this once, 2T to 3T, a couple years ago).
I swapped in the first drive for the bad one, and got everything resilvered. What I *didn't* do was a full scrub. :( Then, I swapped in the second drive, and after a couple hours, THE NEW (FIRST) DRIVE STARTED THROWING ERRORS. Things appear to sort of, kind of be resilvering still:
Code:
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jun 17 09:07:49 2018
        1.33T scanned out of 10.2T at 343M/s, 7h33m to go
        340G resilvered, 12.98% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            DEGRADED     0     0   722
          raidz1-0                                      DEGRADED     0     0 1.54K
            gptid/2bfb3df4-c3b3-11e5-ab04-1cc1de023244  DEGRADED     0     0     0  too many errors
            gptid/2d126fc9-c3b3-11e5-ab04-1cc1de023244  DEGRADED     0     0     0  too many errors
            gptid/a0eeff41-703d-11e8-b75f-1cc1de023244  ONLINE       0     0     0  (resilvering)
            gptid/35b07078-6f6d-11e8-b75f-1cc1de023244  DEGRADED   720     0     0  too many errors

errors: 152 data errors, use '-v' for a list
The "ONLINE" drive is the second one I swapped in, and the new (first) drive that is now showing read errors is gptid/35XXXXX.
What's my best course of action here? I'm curious whether the resilver will ever (can ever?) finish, since there are effectively two failures in the RAIDZ1. The file system itself appears up and available - I've been able to use it while this is going on, though I haven't needed any of the files with errors. I would be perfectly OK with losing the files which are currently showing data errors (or, really, any chunk of the data up to a complete rebuild - and even a complete rebuild is not the end of the world, just REALLY ANNOYING).
My preference is for the outcome which requires the least interactive effort on my part - I'm happy to let this resilver run for another couple days if it might actually succeed (for some value of "succeed"), if it means less actual work on my part (as opposed to on the part of the computers).
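For what it's worth, the ETA in the status output does at least look self-consistent. Here is a rough sanity check I sketched, assuming the T and M in the output are binary units (TiB and MiB) and that the scan rate stays constant. The function names are my own, and this says nothing about whether the read errors will stall the resilver:

```python
# Rough sanity check of the resilver ETA reported by `zpool status`.
# Assumptions: "T" means TiB, "M" means MiB, and the scan rate stays constant.

TIB_IN_MIB = 1024 * 1024


def resilver_eta_seconds(scanned_tib, total_tib, rate_mib_per_s):
    """Seconds left if the remaining data is scanned at a steady rate."""
    remaining_mib = (total_tib - scanned_tib) * TIB_IN_MIB
    return remaining_mib / rate_mib_per_s


def format_hm(seconds):
    """Format a duration in seconds as e.g. '1h01m'."""
    hours, rem = divmod(int(seconds), 3600)
    return f"{hours}h{rem // 60:02d}m"


# Figures copied from the scan line of the status output above.
eta = resilver_eta_seconds(scanned_tib=1.33, total_tib=10.2, rate_mib_per_s=343)
print(format_hm(eta))
```

That comes out at roughly seven and a half hours, in line with the "7h33m to go" reported, so the estimate is just remaining data divided by the current rate.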
Also, yes, I realize that raidz2 or raidz3 would have prevented this, and I did understand that this could happen - I just didn't know exactly what it would look like. I knew the risks! :)
Thanks in advance -
+j