> (When the time comes) Hook it up to a laptop (USB-to-SAS/SATA adapter, ~$15) running an Ubuntu (or whatever) live CD, run GParted, erase the partition table (don't create a new GPT).

Yes. What method would you use for erasing the drive? Just dd zeroes over the GPT table? A full erasure would again probably take days.
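For what it's worth, a full zero pass shouldn't be necessary just to make the drive look blank. Something like this from the live environment should do it (a sketch, untested here - /dev/sdX is a placeholder for the target disk, and both commands are destructive):

wipefs -a /dev/sdX          # remove all known filesystem/RAID/ZFS signatures
sgdisk --zap-all /dev/sdX   # destroy primary and backup GPT plus the protective MBR

That takes seconds rather than days; it only removes the metadata, not the data itself.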
> That's why you want 3-2-1: 3 copies of each file on 2 different systems, 1 of which offsite.

Yes, I would. If I had some spare storage :(
Aside from this, I think there has to be a solution - like clearing all of the ZFS metadata, as the pool obviously has some persistent metadata stored, which led to it always being imported with some errors, until I imported it with a manual parameter override, as described in post 43.
So I see a good chance that the existing pool would run fine if I imported it into another host system.
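A quick way to test that theory on the other host would be something like this (a sketch; z2pool is the pool name from the status output further down):

zpool import             # scan attached disks and list importable pools
zpool import -f z2pool   # force the import if the pool wasn't cleanly exported

If the import also comes up with errors there, the on-disk metadata itself is damaged; if it comes up clean, the problem is on the original host.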
> So I see a good chance that the existing pool would run fine if I imported it into another host system.

Do a clean installation then.
> Likely, some drives are so strained by the resilver that they have to go offline, or at least become unresponsive, while they work through committing all writes in the SMR layout; so the drives get out of sync with the rest of the array and ZFS has to resilver again. And again. And again.

The resilver has run through, though - and sadly it has started over again.
> so the drives get out of sync with the rest of the array and ZFS has to resilver again. And again. And again.

Could this happen without a single reported error?
> Could this happen without a single reported error?

Courtesy of SMR drives.
Good point. You might have to check the OS logs for disk timeouts.
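On TrueNAS CORE (FreeBSD) that would be something along these lines - a sketch, and the exact message wording varies by controller:

grep -iE 'timeout|retrying|cam status' /var/log/messages
dmesg | grep -i timeout

A drive that repeatedly drops out during the resilver should leave CAM error or timeout entries there, even if ZFS itself reports nothing.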
Hello!

> Could this happen without a single reported error?
Well, I had this idea of 'too slow' drives, too - but I cannot see how anything would intentionally lead to the resilver process repeating without reporting any problem, because IMHO such behaviour would clearly qualify as a bug...
If the SMR drives being too busy is the issue, there may be a way to work around it. Newer OpenZFS includes a way to throttle resilvers, slowing down re-syncs. This might allow ZFS to reduce the speed at which it attempts to write, letting the SMR drive "catch up". I don't have the details of this tunable at hand, but perhaps this might help;
..
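One candidate - and this is an assumption on my part, I'm not certain it's the tunable meant above - is the resilver_min_time_ms module parameter, which controls how much time each transaction group spends on resilver I/O. On TrueNAS CORE it should be reachable via sysctl:

sysctl vfs.zfs.resilver_min_time_ms        # show the current value (default 3000 ms)
sysctl vfs.zfs.resilver_min_time_ms=1000   # lower it to make the resilver less aggressive

Lowering it leaves the SMR drives more idle time between resilver bursts, which might be enough for them to flush their CMR cache region.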
root@truenas[~]# zpool status z2pool
  pool: z2pool
 state: ONLINE
  scan: resilvered 5.54T in 6 days 14:01:50 with 0 errors on Sat Aug 27 00:43:41 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        z2pool                                          ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/2affd65a-0bc6-11ed-bd90-000c29077bb3  ONLINE       0     0     0
            gptid/10046937-6b06-11eb-9b87-000c29077bb3  ONLINE       0     0     0
            gptid/aa7c274d-0bc3-11ed-bd90-000c29077bb3  ONLINE       0     0     0
            gptid/9ce5247f-331b-3148-9b38-c958d2bd057a  ONLINE       0     0     0

errors: No known data errors
root@truenas[~]#
root@truenas[~]# gpart show /dev/da1
=>         40  15628053088  da1  GPT  (7.3T)
           40           88       - free -  (44K)
          128      4194304    1  freebsd-swap  (2.0G)
      4194432  15623858696    2  freebsd-zfs  (7.3T)

root@truenas[~]# gpart show /dev/da2
=>         40  15628053088  da2  GPT  (7.3T)
           40           88       - free -  (44K)
          128      4194304    1  freebsd-swap  (2.0G)
      4194432  15623858696    2  freebsd-zfs  (7.3T)

root@truenas[~]# gpart show /dev/da3
=>         40  15628053088  da3  GPT  (7.3T)
           40           88       - free -  (44K)
          128      4194304    1  freebsd-swap  (2.0G)
      4194432  15623858696    2  freebsd-zfs  (7.3T)

root@truenas[~]# gpart show /dev/da4
=>          34  15628053101  da4  GPT  (7.3T)
            34         2014       - free -  (1.0M)
          2048  15628034048    1  !6a898cc3-1dd2-11b2-99a6-080020736631  (7.3T)
   15628036096        16384    9  !6a945a3b-1dd2-11b2-99a6-080020736631  (8.0M)
   15628052480          655       - free -  (328K)
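Side note: da4 clearly wasn't partitioned by TrueNAS - those !6a898cc3-.../!6a945a3b-... types are the Solaris/ZFS partition GUIDs that ZFS on Linux and illumos use for the data and reserved partitions. If you ever wanted it to match the other members, a sketch of the usual CORE layout would be (destructive - only once the pool no longer needs the disk):

gpart destroy -F da4                       # wipe the existing partition table
gpart create -s gpt da4                    # fresh GPT
gpart add -t freebsd-swap -a 4k -s 2G da4  # 2 GiB swap, like da1-da3
gpart add -t freebsd-zfs -a 4k da4         # rest of the disk for ZFS

Functionally ZFS doesn't care either way, as the completed resilver output above shows.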