Long story short, I have a partially corrupt pool that was caused by some gross negligence on my part, due to a lack of alerting and a busy few months. Once I identified the failure, I picked up some new (larger) drives and planned to upgrade the failing mirror and then simply delete the corrupted files once that was completed. This is where I ran into some weird issues.
First, output of zpool status:
Code:
root@freenas:~ # zpool status
  pool: StoragePool001
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption.
        Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 6.07T in 0 days 13:24:10 with 4594493 errors on Wed Jan 15 23:44:26 2020
config:

        NAME                                              STATE     READ WRITE CKSUM
        StoragePool001                                    DEGRADED 8.76M     0     0
          mirror-0                                        ONLINE       0     0     0
            gptid/ba8e4f65-f100-11e8-830c-000c29f0189f    ONLINE       0     0     0
            gptid/bb325aad-f100-11e8-830c-000c29f0189f    ONLINE       0     0     0
          mirror-1                                        ONLINE       0     0     0
            gptid/bc00738c-f100-11e8-830c-000c29f0189f    ONLINE       0     0     0
            gptid/bcafc4af-f100-11e8-830c-000c29f0189f    ONLINE       0     0     0
          mirror-2                                        DEGRADED 17.5M     0     0
            replacing-0                                   UNAVAIL      0     0     0
              12595850469584586315                        UNAVAIL      0     0     0  was /dev/gptid/bd8e1960-f100-11e8-830c-000c29f0189f
              gptid/4876745b-3736-11ea-8eba-000c29f0189f  ONLINE       0     0     0
              gptid/e86f027a-37b2-11ea-8eba-000c29f0189f  ONLINE       0     0     0
            gptid/be5fc6de-f100-11e8-830c-000c29f0189f    ONLINE       0     0 17.5M

errors: 4594488 data errors, use '-v' for a list

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:56 with 0 errors on Mon Jan 13 03:45:57 2020
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE     0     0     0
          da0p2       ONLINE     0     0     0

root@freenas:~ # glabel status
                                      Name  Status  Components
                           iso9660/FreeNAS     N/A  cd0
gptid/2b93f4e2-f0f9-11e8-aa6c-000c29f0189f     N/A  da0p1
gptid/bc00738c-f100-11e8-830c-000c29f0189f     N/A  da1p2
gptid/bcafc4af-f100-11e8-830c-000c29f0189f     N/A  da2p2
gptid/ba8e4f65-f100-11e8-830c-000c29f0189f     N/A  da3p2
gptid/be5fc6de-f100-11e8-830c-000c29f0189f     N/A  da4p2
gptid/bb325aad-f100-11e8-830c-000c29f0189f     N/A  da5p2
gptid/4876745b-3736-11ea-8eba-000c29f0189f     N/A  da6p2
gptid/e86f027a-37b2-11ea-8eba-000c29f0189f     N/A  da7p2
gptid/bbf7966a-f100-11e8-830c-000c29f0189f     N/A  da1p1
gptid/485c4b69-3736-11ea-8eba-000c29f0189f     N/A  da6p1
gptid/e85b1afd-37b2-11ea-8eba-000c29f0189f     N/A  da7p1
Mirror-2 is in a sorry state and contained the failing hardware. I replaced the 4TB drive 12595850469584586315 with a 10TB drive (gptid/4876745b-3736-11ea-8eba-000c29f0189f), which caused a resilver, and I figured I could then simply offline the old drive and replace its partner, gptid/be5fc6de-f100-11e8-830c-000c29f0189f. However, when trying to offline 12595850469584586315, I receive the following error:
Code:
Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tastypie/resources.py", line 219, in wrapper
    response = callback(request, *args, **kwargs)
  File "./freenasUI/api/resources.py", line 877, in offline_disk
    notifier().zfs_offline_disk(obj, deserialized.get('label'))
  File "./freenasUI/middleware/notifier.py", line 1056, in zfs_offline_disk
    raise MiddlewareError('Disk offline failed: "%s"' % error)
freenasUI.middleware.exceptions.MiddlewareError: [MiddlewareError: Disk offline failed: "cannot offline /dev/gptid/bd8e1960-f100-11e8-830c-000c29f0189f: no valid replicas, "]
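For reference, my understanding is that the steps I took through the GUI boil down to roughly the following zpool commands (this is an assumption about what the FreeNAS middleware runs under the hood; labels are taken from the zpool status above):

```shell
# First replacement: swap the dead 4TB member for the new 10TB drive
# (kicked off the 13-hour resilver shown in zpool status)
zpool replace StoragePool001 12595850469584586315 gptid/4876745b-3736-11ea-8eba-000c29f0189f

# Offline attempt on the old member -- this is what fails with
# "no valid replicas" because replacing-0 never completed cleanly
zpool offline StoragePool001 gptid/bd8e1960-f100-11e8-830c-000c29f0189f
```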
Now, where I really made a stupid mistake was trying to add the second new 10TB drive as another replacement for the original failing drive (don't ask, it was a stupid thought) instead of replacing the gptid/be5fc6de-f100-11e8-830c-000c29f0189f drive that's still in the mirror.
So the current crux of my problem is that when I try to remove either of the two new drives I added, I get the same "no valid replicas" error. How do I go about removing gptid/4876745b-3736-11ea-8eba-000c29f0189f or gptid/e86f027a-37b2-11ea-8eba-000c29f0189f and replacing gptid/be5fc6de-f100-11e8-830c-000c29f0189f with one of those two drives?
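My best guess at the fix (which I have not dared to run, so please correct me if this is wrong) is to detach one of the new drives out of the stuck replacing-0 vdev and then use it to replace the surviving original member of mirror-2, something like:

```shell
# HYPOTHETICAL -- not yet run. Pull one of the new 10TB drives
# out of the stuck replacing-0 vdev...
zpool detach StoragePool001 gptid/e86f027a-37b2-11ea-8eba-000c29f0189f

# ...then use it to replace the remaining original 4TB member of mirror-2
zpool replace StoragePool001 gptid/be5fc6de-f100-11e8-830c-000c29f0189f gptid/e86f027a-37b2-11ea-8eba-000c29f0189f
```

Is that the right approach, or will the detach fail with the same "no valid replicas" error?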
And lastly, will that let me bring the pool back into a "stable" state so I can simply clear the corrupted files and move on? I'm not really worried about losing the data so much as I don't want to rebuild the pool and the integrations I've built around it.