I am using an encrypted RAID-Z1 pool, and boy do I regret choosing Z1 instead of Z2 right now.
I had a drive start to go bad on me. No data was lost, but there were some SMART test failures, and the metrics showed a couple of bad reads on the drive. So I bought a new drive and started the replacement following http://doc.freenas.org/11/storage.html#replacing-an-encrypted-drive. During resilvering, things started to go wrong: reading files over SMB would sometimes fail, and I got write errors as well. Jails took a very long time to appear in the jails list. My logs are full of messages like the following (note that ada7 is not the new disk; the new disk is ada0):
Mar 19 08:22:27 note GEOM_ELI(ada7:ahcich7:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 d0 ce 48 40 28 01 00 01 00 00
Mar 19 08:22:27 note g_eli_read_done() failed (error=5) gptid/a4ee10c8-45dd-11e3-9663-60a44caf3660.eli[READ(offset=2542914260992, length=798720)]
Mar 19 08:22:27 note (ada7:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
Mar 19 08:22:27 note (ada7:ahcich7:0:0:0): Retrying command
Mar 19 08:22:27 note (ada7:ahcich7:0:0:0): READ_FPDMA_QUEUED. ACB: 60 78 d0 cf 48 40 28 01 00 00 00 00
Mar 19 08:22:27 note (ada7:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
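To see at a glance which disk the errors come from, the log lines above can be tallied per device. This is a minimal sketch, assuming the messages follow the "(adaN:...): CAM status: ..." shape shown in the excerpt; the function name and sample list are hypothetical. "Uncorrectable parity/CRC error" messages often point at the SATA link (cable, port, backplane) rather than the disk surface, though a count alone can't prove that.

```python
import re
from collections import Counter

# Matches FreeBSD CAM status lines such as
# "(ada7:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error".
# Assumes the device name is letters followed by digits (ada7, da3, ...).
CAM_STATUS = re.compile(r"\((?P<dev>[a-z]+\d+):[^)]*\): CAM status: (?P<status>.+)$")

# Hypothetical sample taken from the log excerpt in this post.
SAMPLE_LOG = [
    "Mar 19 08:22:27 note GEOM_ELI(ada7:ahcich7:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 d0 ce 48 40 28 01 00 01 00 00",
    "Mar 19 08:22:27 note (ada7:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error",
    "Mar 19 08:22:27 note (ada7:ahcich7:0:0:0): Retrying command",
    "Mar 19 08:22:27 note (ada7:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error",
]


def tally_cam_errors(lines):
    """Count CAM status messages per (device, status) pair."""
    counts = Counter()
    for line in lines:
        match = CAM_STATUS.search(line.strip())
        if match:
            counts[(match.group("dev"), match.group("status").strip())] += 1
    return counts


if __name__ == "__main__":
    # Print the most frequent device/status combinations first.
    for (dev, status), n in tally_cam_errors(SAMPLE_LOG).most_common():
        print(f"{dev}: {status} x{n}")
```

Fed the full /var/log/messages instead of the sample, a tally like this would show whether the errors stay on ada7 after the cable swap or follow the port.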
When checking resilver progress, I'd see it near 2% one time, then look again and find it back at 0.15%. In the volume status pane, the volume is degraded, the "raidz1-0" entry (the RAID-Z1 vdev inside the pool) shows as degraded, and disk ada0p2 also shows as degraded. All other disks show as healthy.
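Progress that jumps backwards like this usually means the resilver restarted rather than slowed down, which can happen when a disk in the vdev drops off the bus and comes back. A small sketch, assuming you jot down the "% done" figure from `zpool status` by hand at intervals; the helper name is hypothetical.

```python
def find_restarts(samples):
    """Return indices where resilver progress dropped below the previous sample.

    `samples` is a list of percent-done values in the order they were observed.
    A drop suggests the resilver started over between the two observations.
    """
    return [i for i in range(1, len(samples)) if samples[i] < samples[i - 1]]


if __name__ == "__main__":
    # Hypothetical readings resembling the ones described above.
    readings = [0.5, 1.2, 2.0, 0.15, 0.4]
    print(find_restarts(readings))
```

Matching the timestamps of any flagged drops against the ada7 errors in the log would show whether the restarts line up with that disk resetting.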
Maybe I'm off base here, but it looks like ada7 is in big trouble as well. I did try replacing the SATA cable, just in case. Is it possible to go back and undo the replacement? The data on the pulled drive is still intact. My hope is that I can restore the volume using the pulled drive, back up what I can, replace ada7 and resilver (hoping the originally pulled drive, now back in place, doesn't get worse), then replace ada0 again and resilver once more.