Hello Everyone,
Hoping you can help me resolve an issue with a replaced drive that doesn't seem to leave the pool after resilvering; I assume this is due to read errors on the other device in the mirror.
I inherited an old ZFS storage server that was running Nexenta 3.16. It is set up as four striped mirrors, I believe.
Recently I had a 2TB drive failure that Nexenta couldn't seem to recover from; the system went into a freeze/crash loop.
The failed drive was identified as the culprit because it was missing from the LSI BIOS. After the drive was replaced, the system still wouldn't boot and would freeze at the login prompt.
Since I was seemingly unable to boot into Nexenta, I installed FreeNAS on a USB stick. After booting FreeNAS and attempting the import, I also found that a cache SSD had failed. It was connected to the motherboard SATA controller rather than the LSI controller.
I physically replaced the cache SSD. In hindsight I probably should have gone back to Nexenta at that point to see whether the replacement resolved the crash/freeze issue, but hindsight is 20/20.
The missing cache device was removed with zpool detach or zpool replace (I can't remember which), and the new cache was re-added with a zpool add cache <drive> command.
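From memory, the cache swap was something like the following (device names are placeholders, and I may have used a different removal subcommand):

```shell
# Drop the failed/missing cache device from the pool.
# (Cache vdevs are normally removed with `zpool remove`;
# `zpool detach` applies to mirror members.)
zpool remove pool1 gptid/<old-cache-gptid>

# Add the replacement SSD back as an L2ARC cache device.
zpool add pool1 cache gptid/<new-cache-gptid>
```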
The pool was then successfully imported using zpool import -Ff -m pool1.
I ran zpool replace for the failed drive, and a resilver started.
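Roughly, the import and replacement looked like this (the new drive's gptid is a placeholder):

```shell
# Force-import the pool (-f), attempting recovery if needed (-F),
# even with a missing log/cache device (-m).
zpool import -Ff -m pool1

# Replace the failed disk (referenced by its old guid) with the
# new one; this kicks off the resilver.
zpool replace pool1 6465343411475403059 gptid/<new-drive-gptid>

# Watch resilver progress and per-device error counts.
zpool status -v pool1
```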
However, after the resilver completes, the other device in the mirror throws checksum errors and gets marked degraded. I have cleared the pool twice now and waited for the resilver to complete, with similar results each time.
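Each clear-and-wait cycle was just the following (each resilver took roughly 11 hours):

```shell
# Reset the error counters on all vdevs; in this state the pool
# started resilvering the degraded devices again afterwards.
zpool clear pool1

# Re-check error counts once the resilver finishes.
zpool status -v pool1
```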
Code:
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 509G in 0 days 10:55:32 with 18 errors on Mon Sep 21 04:13:31 2020
config:

        NAME                                              STATE     READ WRITE CKSUM
        pool1                                             DEGRADED     0     0 55.1K
          mirror-0                                        DEGRADED     0     0  110K
            gptid/d1fd2a3e-1257-cf45-f1a1-8980dc262804    DEGRADED     0     0  110K  too many errors
            replacing-1                                   DEGRADED   346     0  110K
              6465343411475403059                         UNAVAIL      0     0     0  was /dev/dsk/c0t5000CCA369E19282d0s0
              gptid/97722574-9c54-ffcd-bfe9-c0ad9f1260b4  DEGRADED     0     0  110K  too many errors
          mirror-1                                        ONLINE       0     0     0
            gptid/f4ae7b4e-5671-6869-e707-9a8586dc59bb    ONLINE       0     0     0
            gptid/883ec3fa-5031-ba45-9713-d696ea8bcf9b    ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-2                                        ONLINE       0     0     0
            gptid/bfecf884-2a49-76cf-e2c7-d314e56a7be5    ONLINE       0     0     0  block size: 512B configured, 4096B native
            gptid/05e4b07f-d802-eceb-cc98-ec3a38ecfd59    ONLINE       0     0     0
          mirror-3                                        ONLINE       0     0     0
            gptid/bfa8b359-7f74-8a66-ee2a-8204fdc239ea    ONLINE       0     0     0  block size: 512B configured, 4096B native
            gptid/50c27685-2d39-80e9-e041-d6c82e547085    ONLINE       0     0     0  block size: 512B configured, 4096B native
        logs
          ada1p1                                          ONLINE       0     0     0  block size: 512B configured, 4096B native
        cache
          gptid/eb67a718-f576-11ea-a9fe-001b21c258bc      ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
        ...
All of the error entries, except two, are of the form pool1:<0x2>, <metadata>:<0x6>, or similar.
Can I just replace the other device in the mirror, gptid/d1fd2a3e-1257-cf45-f1a1-8980dc262804, and let it resilver from the other half of the mirror?
Or will that break my pool?
Thanks in advance for any assistance.