Hello Everyone,
Hoping you can help me resolve an issue with a replaced drive that doesn't seem to leave the pool after resilvering; I assume this is due to read errors on the other device in the mirror.
I inherited an old ZFS storage server that was running Nexenta 3.16. It is set up as four striped mirrors, I believe.
Recently I had a 2TB drive failure that Nexenta couldn't seem to recover from; the system went into a freeze/crash loop.
The failed drive was identified as the culprit because it was missing from the LSI BIOS. After the drive was replaced, the system still wouldn't boot and would freeze at the login prompt.
Since I was seemingly unable to boot into Nexenta, I installed FreeNAS on a USB stick. After booting FreeNAS and attempting the import, I also found that a cache SSD had failed. It was connected to the motherboard SATA controller rather than the LSI controller.
I physically replaced the cache SSD. In hindsight I probably should have gone back to Nexenta at that point to see whether the replacement resolved the crash/freeze issue, but hindsight is 20/20.
The missing cache device was removed with zpool detach or zpool replace (I can't remember which), and the new cache was re-added with a zpool add cache <drive> command.
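From memory, the cache swap was something like the following (device names are placeholders, and I may have used a different removal subcommand):

```shell
# Drop the failed/missing cache device from the pool.
# (Cache vdevs are normally removed with `zpool remove`;
# `zpool detach` applies to mirror members.)
zpool remove pool1 gptid/<old-cache-gptid>

# Add the replacement SSD back as an L2ARC cache device.
zpool add pool1 cache gptid/<new-cache-gptid>
```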
The pool was then successfully imported using zpool import -Ff -m pool1.
I ran zpool replace for the failed drive, and a resilver started.
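Roughly, the import and replacement looked like this (the new drive's gptid is a placeholder):

```shell
# Force-import the pool (-f), attempting recovery if needed (-F),
# even with a missing log/cache device (-m).
zpool import -Ff -m pool1

# Replace the failed disk (referenced by its old guid) with the
# new one; this kicks off the resilver.
zpool replace pool1 6465343411475403059 gptid/<new-drive-gptid>

# Watch resilver progress and per-device error counts.
zpool status -v pool1
```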
However, after the resilver completes, the other device in the mirror throws checksum errors and gets marked degraded. I have cleared the pool twice now and waited for the resilver to complete, with similar results each time.
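Each clear-and-wait cycle was just the following (each resilver took roughly 11 hours):

```shell
# Reset the error counters on all vdevs; in this state the pool
# started resilvering the degraded devices again afterwards.
zpool clear pool1

# Re-check error counts once the resilver finishes.
zpool status -v pool1
```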
Code:
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 509G in 0 days 10:55:32 with 18 errors on Mon Sep 21 04:13:31 2020
config:

        NAME                                              STATE     READ WRITE CKSUM
        pool1                                             DEGRADED     0     0 55.1K
          mirror-0                                        DEGRADED     0     0  110K
            gptid/d1fd2a3e-1257-cf45-f1a1-8980dc262804    DEGRADED     0     0  110K  too many errors
            replacing-1                                   DEGRADED   346     0  110K
              6465343411475403059                         UNAVAIL      0     0     0  was /dev/dsk/c0t5000CCA369E19282d0s0
              gptid/97722574-9c54-ffcd-bfe9-c0ad9f1260b4  DEGRADED     0     0  110K  too many errors
          mirror-1                                        ONLINE       0     0     0
            gptid/f4ae7b4e-5671-6869-e707-9a8586dc59bb    ONLINE       0     0     0
            gptid/883ec3fa-5031-ba45-9713-d696ea8bcf9b    ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-2                                        ONLINE       0     0     0
            gptid/bfecf884-2a49-76cf-e2c7-d314e56a7be5    ONLINE       0     0     0  block size: 512B configured, 4096B native
            gptid/05e4b07f-d802-eceb-cc98-ec3a38ecfd59    ONLINE       0     0     0
          mirror-3                                        ONLINE       0     0     0
            gptid/bfa8b359-7f74-8a66-ee2a-8204fdc239ea    ONLINE       0     0     0  block size: 512B configured, 4096B native
            gptid/50c27685-2d39-80e9-e041-d6c82e547085    ONLINE       0     0     0  block size: 512B configured, 4096B native
        logs
          ada1p1                                          ONLINE       0     0     0  block size: 512B configured, 4096B native
        cache
          gptid/eb67a718-f576-11ea-a9fe-001b21c258bc      ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
        ...
All of the error entries, except two, are of the form pool1:<0x2>, <metadata>:<0x6>, or similar.
Can I just replace the other device in the mirror, gptid/d1fd2a3e-1257-cf45-f1a1-8980dc262804, and let it resilver from the other half of the mirror?
Or will that break my pool?
Thanks in advance for any assistance.