Long story short, I have a partially corrupt pool caused by some gross negligence on my part, thanks to a lack of alerting and a busy few months. Once I identified the failure, I picked up some new (larger) drives and planned to upgrade the failing mirror and then simply delete the corrupted files once that was done. This is where I ran into some weird issues.
First, the output of zpool status and glabel status:
Code:
root@freenas:~ # zpool status
  pool: StoragePool001
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 6.07T in 0 days 13:24:10 with 4594493 errors on Wed Jan 15 23:44:26 2020
config:

        NAME                                              STATE     READ WRITE CKSUM
        StoragePool001                                    DEGRADED  8.76M     0     0
          mirror-0                                        ONLINE        0     0     0
            gptid/ba8e4f65-f100-11e8-830c-000c29f0189f    ONLINE        0     0     0
            gptid/bb325aad-f100-11e8-830c-000c29f0189f    ONLINE        0     0     0
          mirror-1                                        ONLINE        0     0     0
            gptid/bc00738c-f100-11e8-830c-000c29f0189f    ONLINE        0     0     0
            gptid/bcafc4af-f100-11e8-830c-000c29f0189f    ONLINE        0     0     0
          mirror-2                                        DEGRADED  17.5M     0     0
            replacing-0                                   UNAVAIL       0     0     0
              12595850469584586315                        UNAVAIL       0     0     0  was /dev/gptid/bd8e1960-f100-11e8-830c-000c29f0189f
              gptid/4876745b-3736-11ea-8eba-000c29f0189f  ONLINE        0     0     0
              gptid/e86f027a-37b2-11ea-8eba-000c29f0189f  ONLINE        0     0     0
            gptid/be5fc6de-f100-11e8-830c-000c29f0189f    ONLINE        0     0 17.5M

errors: 4594488 data errors, use '-v' for a list
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:56 with 0 errors on Mon Jan 13 03:45:57 2020
config:

        NAME            STATE     READ WRITE CKSUM
        freenas-boot    ONLINE        0     0     0
          da0p2         ONLINE        0     0     0
root@freenas:~ # glabel status
                                      Name  Status  Components
                           iso9660/FreeNAS     N/A  cd0
gptid/2b93f4e2-f0f9-11e8-aa6c-000c29f0189f     N/A  da0p1
gptid/bc00738c-f100-11e8-830c-000c29f0189f     N/A  da1p2
gptid/bcafc4af-f100-11e8-830c-000c29f0189f     N/A  da2p2
gptid/ba8e4f65-f100-11e8-830c-000c29f0189f     N/A  da3p2
gptid/be5fc6de-f100-11e8-830c-000c29f0189f     N/A  da4p2
gptid/bb325aad-f100-11e8-830c-000c29f0189f     N/A  da5p2
gptid/4876745b-3736-11ea-8eba-000c29f0189f     N/A  da6p2
gptid/e86f027a-37b2-11ea-8eba-000c29f0189f     N/A  da7p2
gptid/bbf7966a-f100-11e8-830c-000c29f0189f     N/A  da1p1
gptid/485c4b69-3736-11ea-8eba-000c29f0189f     N/A  da6p1
gptid/e85b1afd-37b2-11ea-8eba-000c29f0189f     N/A  da7p1
Mirror-2 is in a sorry state and contains the failing hardware. I replaced the 4TB drive 12595850469584586315 with a 10TB drive, gptid/4876745b-3736-11ea-8eba-000c29f0189f, which caused a resilver, and I figured I could then simply offline the old drive and replace its partner, gptid/be5fc6de-f100-11e8-830c-000c29f0189f, as well. However, when trying to offline 12595850469584586315 I receive the following error:
Code:
Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tastypie/resources.py", line 219, in wrapper
    response = callback(request, *args, **kwargs)
  File "./freenasUI/api/resources.py", line 877, in offline_disk
    notifier().zfs_offline_disk(obj, deserialized.get('label'))
  File "./freenasUI/middleware/notifier.py", line 1056, in zfs_offline_disk
    raise MiddlewareError('Disk offline failed: "%s"' % error)
freenasUI.middleware.exceptions.MiddlewareError: [MiddlewareError: Disk offline failed: "cannot offline /dev/gptid/bd8e1960-f100-11e8-830c-000c29f0189f: no valid replicas, "]
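For what it's worth, that error string looks like raw zpool output, so I assume the middleware is just wrapping the CLI. My guess (not taken from any logs) is that the GUI offline is effectively running something like:
Code:
# my assumption of the underlying command -- presumably it hits the same
# "no valid replicas" complaint when run directly from the shell
zpool offline StoragePool001 gptid/bd8e1960-f100-11e8-830c-000c29f0189f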
Where I really made a stupid mistake was trying to add the second new 10TB drive as another replacement for the original failing drive (don't ask, it was a stupid thought) instead of using it to replace the gptid/be5fc6de-f100-11e8-830c-000c29f0189f drive that's still in the mirror.
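If I understand what the GUI did behind the scenes, that second replace got stacked onto the same dying member, which would explain why both 10TB disks now sit under replacing-0 instead of one of them taking be5fc6de's place. My rough reconstruction (a guess, not pulled from any logs):
Code:
# guess at what effectively happened: a second replace targeting the disk
# that was already being replaced (12595850469584586315)
zpool replace StoragePool001 12595850469584586315 gptid/e86f027a-37b2-11ea-8eba-000c29f0189f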
So the crux of my problem is that when I try to remove either of the two new drives I added, I get the same "no valid replicas" error. How do I go about removing gptid/4876745b-3736-11ea-8eba-000c29f0189f or gptid/e86f027a-37b2-11ea-8eba-000c29f0189f and replacing gptid/be5fc6de-f100-11e8-830c-000c29f0189f with one of those two drives?
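In zpool terms, what I think I want to end up running is something like the commands below, assuming a detach is even allowed while replacing-0 is still UNAVAIL (please tell me if this plan is completely wrong):
Code:
# 1. detach the second 10TB from the replacing vdev once it holds a full copy
zpool detach StoragePool001 gptid/e86f027a-37b2-11ea-8eba-000c29f0189f
# 2. use the freed drive to replace the partner that's throwing checksum errors
#    (not sure if this needs -f after the disk was just detached from this pool)
zpool replace StoragePool001 gptid/be5fc6de-f100-11e8-830c-000c29f0189f gptid/e86f027a-37b2-11ea-8eba-000c29f0189f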
And lastly, will that let me bring the pool back to a "stable" state so I can simply clear the corrupted files and move on? I'm not really worried about losing the data so much as I don't want to rebuild the pool and the integrations I've built around it.
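My rough plan for the cleanup afterwards, assuming the mirror gets back to two healthy disks, would be something like:
Code:
# list the affected files, delete or restore them, then scrub and clear
zpool status -v StoragePool001
# ...remove or restore the files listed above, then...
zpool scrub StoragePool001
zpool clear StoragePool001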