Uh-Oh Zpool Offline - :(

NickF

Guru
Joined
Jun 12, 2014
Messages
763
So I was on the couch with my wife and we were about to start a movie; it was about 8:00 PM. The movie is hosted on my TrueNAS box, and we were watching it through Plex, which is hosted on the same box. Everything seemed normal, but the movie wouldn't play. I had been home all day and hadn't noticed any problems or received any alerts from my NAS.

I came down to my office to take a look and was getting weird I/O errors when I tried to access files on my SMB share. I signed into TrueNAS and it said my pool was degraded, with some 40,000-odd errors on one of the drives in one of my mirrors. Okay, no problem: I'd eject the bad drive from the pool in the GUI and then go physically replace it with a cold spare. The system froze for several minutes and I couldn't do anything. I could still navigate the SMB share, but the TrueNAS UI was unresponsive. I tried to SSH into the box, and the CLI was not responding. Then I got an email:
TrueNAS @ prod

New alert:
  • Pool sadness state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
    The following devices are not healthy:
    • Disk HUH721010AL4200 7PG33KKR is UNAVAIL
    • Disk HUH721010AL4200 7PG3RYSR is DEGRADED
The following alert has been cleared:
  • Pool sadness state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
    The following devices are not healthy:
    • Disk HUH721010AL4200 7PG33KKR is UNAVAIL
    • Disk HUH721010AL4200 7PG27ZZR is DEGRADED
    • Disk HUH721010AL4200 7PG3RYSR is DEGRADED
Current alerts:
  • Failed to check for alert ActiveDirectoryDomainHealth: Traceback (most recent call last): File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 776, in __run_source alerts = (await alert_source.check()) or [] File "/usr/lib/python3/dist-packages/middlewared/alert/source/active_directory.py", line 46, in check await self.middleware.call("activedirectory.check_nameservers", conf["domainname"], conf["site"]) File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1386, in call return await self._call( File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1335, in _call return await methodobj(*prepared_call.args) File "/usr/lib/python3/dist-packages/middlewared/plugins/activedirectory_/dns.py", line 210, in check_nameservers resp = await self.middleware.call('dnsclient.forward_lookup', { File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1386, in call return await self._call( File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1335, in _call return await methodobj(*prepared_call.args) File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1318, in nf return await func(*args, **kwargs) File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1186, in nf res = await f(*args, **kwargs) File "/usr/lib/python3/dist-packages/middlewared/plugins/dns_client.py", line 108, in forward_lookup results = await asyncio.gather(*[ File "/usr/lib/python3/dist-packages/middlewared/plugins/dns_client.py", line 40, in resolve_name ans = await r.resolve( File "/usr/lib/python3/dist-packages/dns/asyncresolver.py", line 114, in resolve timeout = self._compute_timeout(start, lifetime) File "/usr/lib/python3/dist-packages/dns/resolver.py", line 950, in _compute_timeout raise Timeout(timeout=duration) dns.exception.Timeout: The DNS operation timed out after 12.403959512710571 seconds
  • Pool sadness state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
    The following devices are not healthy:
    • Disk HUH721010AL4200 7PG33KKR is UNAVAIL
    • Disk HUH721010AL4200 7PG3RYSR is DEGRADED

Then, immediately after, I received this email:
ZFS has detected that a device was removed.

impact: Fault tolerance of the pool may be compromised.
eid: 30800
class: statechange
state: UNAVAIL
host: prod
time: 2023-04-17 20:22:34-0400
vpath: /dev/disk/by-partuuid/e2f3d3c3-0033-4300-8c38-a7a56513f145
vguid: 0xF52756F46C368319
pool: sadness (0x8A76BCC157F6D093)

After about 7 or 8 minutes of nothing, I told the server to reset via IPMI.

When the server came back up, the pool was in an errored state. I thought I'd export and re-import the pool, as I've had some success with that in the past. But after I exported it, the option to import the pool wasn't present. I SSH'd back into the server and tried to import it manually:

root@prod[/var/log]# zpool import -a
cannot import 'sadness': no such pool or dataset
Destroy and re-create the pool from

At this point, all of the disks are showing up in the UI as members of the pool called sadness, but exported, with the exception of the one I removed earlier:
[screenshot: the UI showing the disks as exported members of pool 'sadness']


I tried removing one of the SAS cables from each of the shelves, so that I had a single SAS cable going from one HBA to each of the two shelves. That didn't help matters any.

Digging into the logs, I'm not seeing much of anything useful in /var/log/messages from before the system restarted at around 8:30. At 2 AM today, though, I see some odd events:
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: ses 13:0:9:0: Power-on or device reset occurred
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: ses 3:0:14:0: Power-on or device reset occurred
Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:14 prod.fusco.me kernel: ses 3:0:28:0: Power-on or device reset occurred
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:16 prod.fusco.me kernel: ses 3:0:28:0: Power-on or device reset occurred
Apr 17 02:07:17 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:17 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:17 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:17 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:17 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:18 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Then I tried to restart a VM hosted on this pool, which I hadn't noticed had crashed at around 8:17:
Apr 17 20:17:36 prod.fusco.me kernel: br0: port 2(vnet1) entered disabled state
Apr 17 20:17:37 prod.fusco.me kernel: device vnet1 left promiscuous mode
Apr 17 20:17:37 prod.fusco.me kernel: br0: port 2(vnet1) entered disabled state
Apr 17 20:17:37 prod.fusco.me kernel: kauditd_printk_skb: 7 callbacks suppressed
Apr 17 20:17:37 prod.fusco.me kernel: audit: type=1400 audit(1681777057.381:67): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvir>
Apr 17 20:17:38 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name '12_casey'
Apr 17 20:17:39 prod.fusco.me kernel: audit: type=1400 audit(1681777059.949:68): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt->
Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.061:69): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.205:70): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.317:71): apparmor="STATUS" operation="profile_replace" info="same as current profile, s>
Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.489:72): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:17:40 prod.fusco.me kernel: br0: port 2(vnet2) entered blocking state
Apr 17 20:17:40 prod.fusco.me kernel: br0: port 2(vnet2) entered disabled state
Apr 17 20:17:40 prod.fusco.me kernel: device vnet2 entered promiscuous mode
Apr 17 20:17:40 prod.fusco.me kernel: br0: port 2(vnet2) entered blocking state
Apr 17 20:17:40 prod.fusco.me kernel: br0: port 2(vnet2) entered listening state
Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.729:73): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.809:74): apparmor="DENIED" operation="capable" profile="libvirtd" pid=16394 comm="rpc-w>
Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.813:75): apparmor="DENIED" operation="capable" profile="libvirtd" pid=16394 comm="rpc-w>
Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.881:76): apparmor="DENIED" operation="capable" profile="libvirtd" pid=16394 comm="rpc-w>
Apr 17 20:17:42 prod.fusco.me kernel: br0: port 2(vnet2) entered disabled state
Apr 17 20:17:42 prod.fusco.me kernel: device vnet2 left promiscuous mode
Apr 17 20:17:42 prod.fusco.me kernel: br0: port 2(vnet2) entered disabled state
Apr 17 20:17:43 prod.fusco.me kernel: audit: type=1400 audit(1681777063.029:77): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvir>
Apr 17 20:17:51 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name '12_casey'
Apr 17 20:17:51 prod.fusco.me kernel: audit: type=1400 audit(1681777071.782:78): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt->
Apr 17 20:17:51 prod.fusco.me kernel: audit: type=1400 audit(1681777071.898:79): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:17:52 prod.fusco.me kernel: audit: type=1400 audit(1681777072.010:80): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:17:52 prod.fusco.me kernel: audit: type=1400 audit(1681777072.126:81): apparmor="STATUS" operation="profile_replace" info="same as current profile, s>
Apr 17 20:17:52 prod.fusco.me kernel: audit: type=1400 audit(1681777072.294:82): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:17:52 prod.fusco.me kernel: br0: port 2(vnet3) entered blocking state
Apr 17 20:17:52 prod.fusco.me kernel: br0: port 2(vnet3) entered disabled state
Apr 17 20:17:52 prod.fusco.me kernel: device vnet3 entered promiscuous mode
Apr 17 20:17:52 prod.fusco.me kernel: br0: port 2(vnet3) entered blocking state
Apr 17 20:17:52 prod.fusco.me kernel: br0: port 2(vnet3) entered listening state
Apr 17 20:17:52 prod.fusco.me kernel: audit: type=1400 audit(1681777072.558:83): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:17:52 prod.fusco.me kernel: audit: type=1400 audit(1681777072.890:84): apparmor="DENIED" operation="capable" profile="libvirtd" pid=16394 comm="rpc-w>
Apr 17 20:18:07 prod.fusco.me kernel: br0: port 2(vnet3) entered learning state
Apr 17 20:18:22 prod.fusco.me kernel: br0: port 2(vnet3) entered forwarding state
Apr 17 20:18:22 prod.fusco.me kernel: br0: topology change detected, propagating
Apr 17 20:18:23 prod.fusco.me kernel: br0: port 2(vnet3) entered disabled state
Apr 17 20:18:23 prod.fusco.me kernel: device vnet3 left promiscuous mode
Apr 17 20:18:23 prod.fusco.me kernel: br0: port 2(vnet3) entered disabled state
Apr 17 20:18:24 prod.fusco.me kernel: audit: type=1400 audit(1681777104.122:85): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvir>
Apr 17 20:18:24 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name '12_casey'
Apr 17 20:18:25 prod.fusco.me kernel: audit: type=1400 audit(1681777105.639:86): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt->
Apr 17 20:18:25 prod.fusco.me kernel: audit: type=1400 audit(1681777105.759:87): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:18:25 prod.fusco.me kernel: audit: type=1400 audit(1681777105.871:88): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:18:26 prod.fusco.me kernel: audit: type=1400 audit(1681777105.991:89): apparmor="STATUS" operation="profile_replace" info="same as current profile, s>
Apr 17 20:18:26 prod.fusco.me kernel: audit: type=1400 audit(1681777106.171:90): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:18:26 prod.fusco.me kernel: br0: port 2(vnet4) entered blocking state
Apr 17 20:18:26 prod.fusco.me kernel: br0: port 2(vnet4) entered disabled state
Apr 17 20:18:26 prod.fusco.me kernel: device vnet4 entered promiscuous mode
Apr 17 20:18:26 prod.fusco.me kernel: br0: port 2(vnet4) entered blocking state
Apr 17 20:18:26 prod.fusco.me kernel: br0: port 2(vnet4) entered listening state
Apr 17 20:18:26 prod.fusco.me kernel: audit: type=1400 audit(1681777106.771:94): apparmor="DENIED" operation="capable" profile="libvirtd" pid=16394 comm="rpc-w>
Apr 17 20:18:41 prod.fusco.me kernel: br0: port 2(vnet4) entered learning state
Apr 17 20:18:43 prod.fusco.me kernel: br0: port 2(vnet4) entered disabled state
Apr 17 20:18:43 prod.fusco.me kernel: device vnet4 left promiscuous mode
Apr 17 20:18:43 prod.fusco.me kernel: br0: port 2(vnet4) entered disabled state
Apr 17 20:18:44 prod.fusco.me kernel: audit: type=1400 audit(1681777124.027:95): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvir>
Apr 17 20:18:45 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name '12_casey'
Apr 17 20:18:46 prod.fusco.me kernel: audit: type=1400 audit(1681777126.615:96): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt->
Apr 17 20:18:46 prod.fusco.me kernel: audit: type=1400 audit(1681777126.739:97): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:18:46 prod.fusco.me kernel: audit: type=1400 audit(1681777126.843:98): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvi>
Apr 17 20:18:47 prod.fusco.me kernel: audit: type=1400 audit(1681777126.975:99): apparmor="STATUS" operation="profile_replace" info="same as current profile, s>
Apr 17 20:18:47 prod.fusco.me kernel: audit: type=1400 audit(1681777127.143:100): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libv>
Apr 17 20:18:47 prod.fusco.me kernel: br0: port 2(vnet5) entered blocking state
Apr 17 20:18:47 prod.fusco.me kernel: br0: port 2(vnet5) entered disabled state
Apr 17 20:18:47 prod.fusco.me kernel: device vnet5 entered promiscuous mode
Apr 17 20:18:47 prod.fusco.me kernel: br0: port 2(vnet5) entered blocking state
Apr 17 20:18:47 prod.fusco.me kernel: br0: port 2(vnet5) entered listening state
Apr 17 20:18:47 prod.fusco.me kernel: audit: type=1400 audit(1681777127.387:101): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libv>
Apr 17 20:19:02 prod.fusco.me kernel: br0: port 2(vnet5) entered learning state
Apr 17 20:19:09 prod.fusco.me kernel: br0: port 2(vnet5) entered disabled state
Apr 17 20:19:09 prod.fusco.me kernel: device vnet5 left promiscuous mode
Apr 17 20:19:09 prod.fusco.me kernel: br0: port 2(vnet5) entered disabled state
Apr 17 20:19:10 prod.fusco.me kernel: audit: type=1400 audit(1681777150.100:102): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvi>
Apr 17 20:19:12 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name '12_casey'
Apr 17 20:19:12 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name '12_casey'
Apr 17 20:19:13 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name '12_casey'
Apr 17 20:19:14 prod.fusco.me kernel: audit: type=1400 audit(1681777154.436:103): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt>
Apr 17 20:19:14 prod.fusco.me kernel: audit: type=1400 audit(1681777154.568:104): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libv>
Apr 17 20:19:14 prod.fusco.me kernel: audit: type=1400 audit(1681777154.684:105): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libv>
Apr 17 20:19:14 prod.fusco.me kernel: audit: type=1400 audit(1681777154.800:106): apparmor="STATUS" operation="profile_replace" info="same as current profile, >
Apr 17 20:19:15 prod.fusco.me kernel: audit: type=1400 audit(1681777154.976:107): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libv>
Apr 17 20:19:15 prod.fusco.me kernel: br0: port 2(vnet6) entered blocking state
Apr 17 20:19:15 prod.fusco.me kernel: br0: port 2(vnet6) entered disabled state
Apr 17 20:19:15 prod.fusco.me kernel: device vnet6 entered promiscuous mode
Apr 17 20:19:15 prod.fusco.me kernel: br0: port 2(vnet6) entered blocking state
Apr 17 20:19:15 prod.fusco.me kernel: br0: port 2(vnet6) entered listening state
Apr 17 20:19:15 prod.fusco.me kernel: audit: type=1400 audit(1681777155.180:108): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libv>
Apr 17 20:19:30 prod.fusco.me kernel: br0: port 2(vnet6) entered learning state
Apr 17 20:19:45 prod.fusco.me kernel: br0: port 2(vnet6) entered forwarding state
Apr 17 20:19:45 prod.fusco.me kernel: br0: topology change detected, propagating
Apr 17 20:20:02 prod.fusco.me kernel: br0: port 2(vnet6) entered disabled state
Apr 17 20:20:02 prod.fusco.me kernel: device vnet6 left promiscuous mode
Apr 17 20:20:02 prod.fusco.me kernel: br0: port 2(vnet6) entered disabled state
Apr 17 20:20:03 prod.fusco.me kernel: audit: type=1400 audit(1681777203.273:109): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvi>
Apr 17 20:20:09 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name '12_casey'
Apr 17 20:20:10 prod.fusco.me kernel: audit: type=1400 audit(1681777210.958:110): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt>
Apr 17 20:20:11 prod.fusco.me kernel: audit: type=1400 audit(1681777211.078:111): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libv>
Apr 17 20:20:11 prod.fusco.me kernel: audit: type=1400 audit(1681777211.194:112): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libv>
Apr 17 20:20:11 prod.fusco.me kernel: audit: type=1400 audit(1681777211.306:113): apparmor="STATUS" operation="profile_replace" info="same as current profile, >
Apr 17 20:20:11 prod.fusco.me kernel: audit: type=1400 audit(1681777211.478:114): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libv>
Apr 17 20:20:11 prod.fusco.me kernel: br0: port 2(vnet7) entered blocking state
Apr 17 20:20:11 prod.fusco.me kernel: br0: port 2(vnet7) entered disabled state
Apr 17 20:20:11 prod.fusco.me kernel: device vnet7 entered promiscuous mode
Apr 17 20:20:11 prod.fusco.me kernel: br0: port 2(vnet7) entered blocking state
Apr 17 20:20:11 prod.fusco.me kernel: br0: port 2(vnet7) entered listening state
Apr 17 20:20:11 prod.fusco.me kernel: audit: type=1400 audit(1681777211.698:115): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libv>
Apr 17 20:20:12 prod.fusco.me kernel: audit: type=1400 audit(1681777212.038:116): apparmor="DENIED" operation="capable" profile="libvirtd" pid=16394 comm="rpc->
Apr 17 20:20:21 prod.fusco.me kernel: br0: port 2(vnet7) entered disabled state
Apr 17 20:20:21 prod.fusco.me kernel: device vnet7 left promiscuous mode
Apr 17 20:20:21 prod.fusco.me kernel: br0: port 2(vnet7) entered disabled state
Apr 17 20:20:21 prod.fusco.me kernel: audit: type=1400 audit(1681777221.554:117): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvi>
Apr 17 20:22:35 prod.fusco.me kernel: WARNING: Pool 'sadness' has encountered an uncorrectable I/O failure and has been suspended.
The entry at 20:22 is the first sign that there was a problem, and just thereafter is when I tried to remove the bad disk. But then there's nothing in the log until 20:30, when the server started coming back up after it froze.
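(In case anyone wants to sift their own logs the same way, something along these lines is enough to pull out that window; the patterns are just the driver, enclosure and ZFS strings visible above.)

```
# Pull the HBA driver, enclosure-services and ZFS pool messages for April 17th
grep -E 'mpt3sas|ses |WARNING: Pool' /var/log/messages | grep '^Apr 17'
```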

Any help is appreciated :)

Specs of my server are:
NewProd Server | SCALE 22.12 RC1
| Supermicro H12SSL-I | EPYC 7282 | 256GB DDR4-3200 | 2X LSI 9500-8e to 2X EMC 15-Bay Shelf | Intel X710-DA2 | 28x10TB Drives in 2 Way mirrors | 4x Samsung PM9A1 512GB 2-Way Mirrored SPECIAL | 2x Optane 905p 960GB Mirrored

The shelves are connected to the two HBAs in an "X" configuration, if that's helpful.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
When finding yourself in a hole, stop digging. Your clumsy efforts may have rendered your pool unrecoverable.

In your case, it's very likely both your HBAs lost connection with the EMC shelves. Now we want to find out the exact contours of the failure, in a non-destructive manner.

First, what's the output of zpool import sadness by itself?
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Hmmm. So I'm beginning to think the problem is one of the controllers inside one of the shelves. Since multipathd doesn't exist in SCALE (at least not in the home version), I think I'm screwed and the system got all sorts of corrupted.
There are 28 disks in my shelves. Each disk is a mirror of the one in the same slot in the other shelf.

Which looks like this:
[diagram: mirror pairs split across the two shelves]


Right now I have the shelves plugged into the NAS like this:
NAS Side:
[photo: SAS cabling on the NAS side]


Shelf Side:
[photo: SAS cabling on the shelf side]



When I have one SAS cable going to controller A in each of the disk shelves, I get this, and the UI says it sees 26 unassigned disks, which is incorrect. In addition to the 28, I should see the 4 SPECIAL metadata devices and 2 L2ARC devices, plus the drive I had inserted as a hot spare (WHICH I FORGOT ABOUT), so I should have a total of 35:
[screenshot: the UI reporting 26 unassigned disks]


If I move the cables on the disk shelf side over to the other controller, I actually do see all 35:
[screenshots: the UI now showing all 35 disks]


But the pool still won't import :(
This is why we can't have nice things.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
When finding yourself in a hole, stop digging. Your clumsy efforts may have rendered your pool unrecoverable.

In your case, it's very likely both your HBAs lost connection with the EMC shelves. Now we want to find out the exact contours of the failure, in a non-destructive manner.

First, what's the output of zpool import sadness by itself?
Output is, unfortunately, the same.
root@prod[/var/log]# zpool import sadness
cannot import 'sadness': no such pool or dataset
Destroy and re-create the pool from
a backup source.
root@prod[/var/log]#
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Just to save myself some time pulling data back down from off-site, is there a way to mount an unclean zpool so that I can try to recover a subset of things? I know Allan Jude and Wendell from Level1Techs worked together to recover Linus Tech Tips' data last year, so I am sure it's possible.

If no one here knows, no big deal. Figured I'd post.
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
I do remember reading about poking some of the ZFS tunables to disable some of the data validation, and maybe importing with -o readonly=on, but unfortunately I have no details or locations, and I can't see anything in my bookmarks. There are some zdb options, but I guess you have already read about those and tried them. I'd be trying to verify that I had connectivity to all the disks before trying to import again.
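Something along these lines would at least confirm that every member partition is visible and still carries a readable ZFS label before another import attempt (the partuuid is a placeholder, substitute one of yours):

```
# How many partitions currently identify themselves as ZFS pool members?
blkid -t TYPE=zfs_member -o device | wc -l

# Spot-check one member's on-disk label for the pool name, GUID and state
zdb -l /dev/disk/by-partuuid/<partuuid> | grep -E 'name|guid|state'
```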
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Try shutting down the server and both shelves cleanly. Let them all sit for 60 seconds to allow capacitors to drain, and then power up your shelves and let them stabilize. Then power up your server.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Try shutting down the server and both shelves cleanly. Let them all sit for 60 seconds to allow capacitors to drain, and then power up your shelves and let them stabilize. Then power up your server.
I did try that after my post here last night, no success.
I do remember reading about poking some of the ZFS tunables to disable some of the data validation, and maybe importing with -o readonly=on, but unfortunately I have no details or locations, and I can't see anything in my bookmarks. There are some zdb options, but I guess you have already read about those and tried them. I'd be trying to verify that I had connectivity to all the disks before trying to import again.
root@prod[/var/log]# zpool import sadness -o readonly=on
cannot import 'sadness': no such pool or dataset
Destroy and re-create the pool from
a backup source.
root@prod[/var/log]#
Unfortunately, no difference.

Here is the output from zdb:
zdb -e sadness

Configuration for import:
vdev_children: 21
version: 5000
pool_guid: 9977369563076415635
name: 'sadness'
state: 0
hostid: 2042556907
hostname: 'prod'
vdev_tree:
type: 'root'
id: 0
guid: 9977369563076415635
children[0]:
type: 'mirror'
id: 0
guid: 12254861329070289191
metaslab_array: 183
metaslab_shift: 34
ashift: 12
asize: 9998678884352
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 5086712831611402237
whole_disk: 0
DTL: 4836
create_txg: 4
path: '/dev/disk/by-partuuid/255f91c5-6fd8-4d11-bfe1-bb0b0995bde1'
children[1]:
type: 'disk'
id: 1
guid: 15943223402894770756
whole_disk: 0
DTL: 4834
create_txg: 4
path: '/dev/disk/by-partuuid/2db30682-bb8d-44b4-8279-960e7071ed66'
children[1]:
type: 'mirror'
id: 1
guid: 12066259498466103666
metaslab_array: 182
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 4550735796668586180
whole_disk: 0
DTL: 4848
create_txg: 4
path: '/dev/disk/by-partuuid/e3fbe854-0307-473e-9f39-37a84d4747d1'
children[1]:
type: 'disk'
id: 1
guid: 6366035862544255253
whole_disk: 0
DTL: 4847
create_txg: 4
path: '/dev/disk/by-partuuid/49e58faf-2b18-43b6-bd50-29ef9c9bc30f'
children[2]:
type: 'mirror'
id: 2
guid: 14985023760802459005
metaslab_array: 181
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 8946863842827945649
whole_disk: 0
DTL: 4841
create_txg: 4
path: '/dev/disk/by-partuuid/480d7ade-f786-4511-bb76-6e7c0b64ab48'
children[1]:
type: 'disk'
id: 1
guid: 4700051863249598752
whole_disk: 0
DTL: 4840
create_txg: 4
path: '/dev/disk/by-partuuid/a6b0d83f-4413-45af-91bb-f26a27c56165'
children[3]:
type: 'mirror'
id: 3
guid: 4976704069116612581
metaslab_array: 180
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 15502149421824249219
whole_disk: 0
DTL: 4831
create_txg: 4
path: '/dev/disk/by-partuuid/10ebed85-ab73-472c-b556-c25c14afd966'
children[1]:
type: 'disk'
id: 1
guid: 13866702663057467586
whole_disk: 0
DTL: 4830
create_txg: 4
path: '/dev/disk/by-partuuid/a299b22e-e339-4e48-8c5b-a980a4057237'
children[4]:
type: 'mirror'
id: 4
guid: 13951913235312177868
metaslab_array: 179
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 5691914841095922244
whole_disk: 0
DTL: 4839
create_txg: 4
path: '/dev/disk/by-partuuid/762e8aa7-1be0-4e75-b297-f53161ecb047'
children[1]:
type: 'disk'
id: 1
guid: 11916929020424420111
whole_disk: 0
DTL: 4837
create_txg: 4
path: '/dev/disk/by-partuuid/a1f3d1eb-55e0-4e4a-8015-20c372c3001a'
children[5]:
type: 'mirror'
id: 5
guid: 15816290078836845158
metaslab_array: 178
metaslab_shift: 34
ashift: 12
asize: 9998678884352
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 1790559324010269292
whole_disk: 0
DTL: 4850
create_txg: 4
path: '/dev/disk/by-partuuid/d2aef666-ff6a-4d4a-9442-cd70f409f43c'
children[1]:
type: 'disk'
id: 1
guid: 6360281643637752359
whole_disk: 0
DTL: 4849
create_txg: 4
path: '/dev/disk/by-partuuid/8d399a3a-7ecc-496f-bfbd-6ae48a2f89ee'
children[6]:
type: 'mirror'
id: 6
guid: 8591766545015033896
metaslab_array: 177
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 16275786482488365167
whole_disk: 0
DTL: 4833
create_txg: 4
path: '/dev/disk/by-partuuid/4b60c0ba-f4cf-477a-b230-cb8c4e310112'
children[1]:
type: 'disk'
id: 1
guid: 17197248781955331245
whole_disk: 0
DTL: 4832
create_txg: 4
path: '/dev/disk/by-partuuid/749b9f6f-c208-4900-b210-e623146c830f'
children[7]:
type: 'mirror'
id: 7
guid: 7468040530148448787
metaslab_array: 176
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 16019146947386432102
whole_disk: 0
DTL: 4844
create_txg: 4
path: '/dev/disk/by-partuuid/e44f5a4c-6463-40a2-8042-d0b9dea3a4c5'
children[1]:
type: 'disk'
id: 1
guid: 789684125560567178
whole_disk: 0
DTL: 4842
create_txg: 4
path: '/dev/disk/by-partuuid/ce167dd2-9f11-4bf8-9ccb-e86042d4aa11'
children[8]:
type: 'mirror'
id: 8
guid: 2962212927076889190
metaslab_array: 175
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 13279394592555200764
whole_disk: 0
DTL: 4846
create_txg: 4
path: '/dev/disk/by-partuuid/ca2fed1e-edd8-4f91-9126-a9a2f667dc34'
children[1]:
type: 'disk'
id: 1
guid: 6383596830989093462
whole_disk: 0
DTL: 4845
create_txg: 4
path: '/dev/disk/by-partuuid/34cfa66f-66c5-4bf1-a084-7d018f18efdd'
children[9]:
type: 'missing'
id: 9
guid: 0
children[10]:
type: 'missing'
id: 10
guid: 0
children[11]:
type: 'mirror'
id: 11
guid: 13730618831942771340
metaslab_array: 1284
metaslab_shift: 32
ashift: 12
asize: 512105381888
is_log: 0
create_txg: 82
children[0]:
type: 'disk'
id: 0
guid: 14731919620377538683
whole_disk: 0
DTL: 4827
create_txg: 82
path: '/dev/disk/by-partuuid/63de864e-c4c4-41c0-b495-1bd1fd723c64'
children[1]:
type: 'disk'
id: 1
guid: 2033729016575030711
whole_disk: 0
DTL: 4826
create_txg: 82
path: '/dev/disk/by-partuuid/be807c16-c6d9-417e-a5b9-19b6af5ec837'
children[12]:
type: 'mirror'
id: 12
guid: 5809140082036246482
metaslab_array: 1419
metaslab_shift: 32
ashift: 12
asize: 512105381888
is_log: 0
create_txg: 92
children[0]:
type: 'disk'
id: 0
guid: 17493500226162294994
whole_disk: 0
DTL: 4829
create_txg: 92
path: '/dev/disk/by-partuuid/cad9688f-85b8-41f2-8f89-5a66c67789a7'
children[1]:
type: 'disk'
id: 1
guid: 6073052702381961461
whole_disk: 0
DTL: 4828
create_txg: 92
path: '/dev/disk/by-partuuid/323a1964-db36-40c2-be4f-93bc2cb24843'
children[13]:
type: 'missing'
id: 13
guid: 0
children[14]:
type: 'missing'
id: 14
guid: 0
children[15]:
type: 'mirror'
id: 15
guid: 15140286079497109367
metaslab_array: 18893
metaslab_shift: 34
ashift: 12
asize: 9998678884352
is_log: 0
create_txg: 758879
children[0]:
type: 'disk'
id: 0
guid: 17831527895260049240
whole_disk: 0
DTL: 73960
create_txg: 758879
path: '/dev/disk/by-partuuid/6d9d3acf-94b3-4819-a78c-1c23b53212a2'
children[1]:
type: 'disk'
id: 1
guid: 4275542926759592415
whole_disk: 0
DTL: 73959
create_txg: 758879
path: '/dev/disk/by-partuuid/b976ef3b-a7eb-4347-9d36-245f738098be'
children[16]:
type: 'mirror'
id: 16
guid: 5296809474692138764
metaslab_array: 19904
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 759189
children[0]:
type: 'disk'
id: 0
guid: 1596451084264006543
whole_disk: 0
DTL: 73963
create_txg: 759189
path: '/dev/disk/by-partuuid/430faa5c-a3f4-44fd-8e99-5db535f146d6'
children[1]:
type: 'disk'
id: 1
guid: 11509495317434492829
whole_disk: 0
DTL: 73961
create_txg: 759189
path: '/dev/disk/by-partuuid/a81914dd-31c0-4e83-a369-cd4568484c42'
children[17]:
type: 'mirror'
id: 17
guid: 10107206358176273262
metaslab_array: 97115
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 1674631
children[0]:
type: 'disk'
id: 0
guid: 16179806153641235865
whole_disk: 0
DTL: 112681
create_txg: 1674631
path: '/dev/disk/by-partuuid/c2640dc1-ecde-4638-8937-169b740b88aa'
children[1]:
type: 'disk'
id: 1
guid: 6519077389205892531
whole_disk: 0
DTL: 112680
create_txg: 1674631
path: '/dev/disk/by-partuuid/379daf79-69ac-4968-9a27-5a6b503bbcc4'
children[18]:
type: 'mirror'
id: 18
guid: 4971576989779035714
metaslab_array: 22189
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 1724558
children[0]:
type: 'disk'
id: 0
guid: 9307995113113626143
whole_disk: 0
DTL: 115427
create_txg: 1724558
path: '/dev/disk/by-partuuid/822db76c-4def-4ead-9a8c-5b1175a49be8'
children[1]:
type: 'disk'
id: 1
guid: 17260282938147507352
whole_disk: 0
DTL: 115426
create_txg: 1724558
path: '/dev/disk/by-partuuid/1dd58ac2-e8b8-4afd-a358-01d5e69bd07e'
children[19]:
type: 'missing'
id: 19
guid: 0
children[20]:
type: 'spare'
id: 20
guid: 6204868430839235276
whole_disk: 0
metaslab_array: 78756
metaslab_shift: 34
ashift: 12
asize: 9998678360064
is_log: 0
create_txg: 1949674
children[0]:
type: 'disk'
id: 0
guid: 17540057732824797402
whole_disk: 0
DTL: 84654
create_txg: 1949674
degraded: 1
aux_state: 'err_exceeded'
path: '/dev/disk/by-partuuid/e1da746c-2b0a-4297-bb4b-a30088cec248'
children[1]:
type: 'disk'
id: 1
guid: 13181857595583535955
whole_disk: 0
is_spare: 1
DTL: 82025
create_txg: 1949674
path: '/dev/disk/by-partuuid/7793a8b5-da95-4d28-893e-fdf468afdc1c'
load-policy:
load-request-txg: 18446744073709551615
load-rewind-policy: 2
zdb: can't open 'sadness': No such file or directory

ZFS_DBGMSG(zdb) START:
spa.c:6107:spa_import(): spa_import: importing sadness
spa_misc.c:418:spa_load_note(): spa_load(sadness, config trusted): LOADING
vdev.c:160:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/480d7ade-f786-4511-bb76-6e7c0b64ab48': best uberblock found for spa sadness. txg 1967384
spa_misc.c:418:spa_load_note(): spa_load(sadness, config untrusted): using uberblock with txg=1967384
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 5086712831611402237: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:14:0/0' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 15943223402894770756: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/0' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 4550735796668586180: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/1' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 6366035862544255253: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:19:0/1' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 8946863842827945649: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/2' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 4700051863249598752: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:19:0/2' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 15502149421824249219: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:19:0/3' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 13866702663057467586: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/3' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 5691914841095922244: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/9' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 11916929020424420111: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/4' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 1790559324010269292: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/5' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 6360281643637752359: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:14:0/5' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 16275786482488365167: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:14:0/6' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 17197248781955331245: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/6' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 16019146947386432102: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/4' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 789684125560567178: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/8' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 13279394592555200764: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/7' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 6383596830989093462: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/7' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 2033729016575030711: vdev_enc_sysfs_path changed from '/sys/bus/pci/slots/0' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 17831527895260049240: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:14:0/9' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 4275542926759592415: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/10' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 1596451084264006543: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/8' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 11509495317434492829: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:14:0/10' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 16179806153641235865: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/11' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 6519077389205892531: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/11' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 9307995113113626143: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/13' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 17260282938147507352: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/13' to '(null)'
spa_misc.c:418:spa_load_note(): spa_load(sadness, config trusted): vdev tree has 1 missing top-level vdevs.
spa_misc.c:418:spa_load_note(): spa_load(sadness, config trusted): current settings allow for maximum 0 missing top-level vdevs at this stage.
spa_misc.c:403:spa_load_failed(): spa_load(sadness, config trusted): FAILED: unable to open vdev tree [error=2]
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: root, guid: 9977369563076415635, path: N/A, can't open
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: mirror, guid: 12254861329070289191, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 5086712831611402237, path: /dev/disk/by-partuuid/255f91c5-6fd8-4d11-bfe1-bb0b0995bde1, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 15943223402894770756, path: /dev/disk/by-partuuid/2db30682-bb8d-44b4-8279-960e7071ed66, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: mirror, guid: 12066259498466103666, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 4550735796668586180, path: /dev/disk/by-partuuid/e3fbe854-0307-473e-9f39-37a84d4747d1, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 6366035862544255253, path: /dev/disk/by-partuuid/49e58faf-2b18-43b6-bd50-29ef9c9bc30f, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 2: mirror, guid: 14985023760802459005, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 8946863842827945649, path: /dev/disk/by-partuuid/480d7ade-f786-4511-bb76-6e7c0b64ab48, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 4700051863249598752, path: /dev/disk/by-partuuid/a6b0d83f-4413-45af-91bb-f26a27c56165, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 3: mirror, guid: 4976704069116612581, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 15502149421824249219, path: /dev/disk/by-partuuid/10ebed85-ab73-472c-b556-c25c14afd966, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 13866702663057467586, path: /dev/disk/by-partuuid/a299b22e-e339-4e48-8c5b-a980a4057237, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 4: mirror, guid: 13951913235312177868, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 5691914841095922244, path: /dev/disk/by-partuuid/762e8aa7-1be0-4e75-b297-f53161ecb047, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 11916929020424420111, path: /dev/disk/by-partuuid/a1f3d1eb-55e0-4e4a-8015-20c372c3001a, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 5: mirror, guid: 15816290078836845158, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 1790559324010269292, path: /dev/disk/by-partuuid/d2aef666-ff6a-4d4a-9442-cd70f409f43c, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 6360281643637752359, path: /dev/disk/by-partuuid/8d399a3a-7ecc-496f-bfbd-6ae48a2f89ee, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 6: mirror, guid: 8591766545015033896, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 16275786482488365167, path: /dev/disk/by-partuuid/4b60c0ba-f4cf-477a-b230-cb8c4e310112, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 17197248781955331245, path: /dev/disk/by-partuuid/749b9f6f-c208-4900-b210-e623146c830f, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 7: mirror, guid: 7468040530148448787, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 16019146947386432102, path: /dev/disk/by-partuuid/e44f5a4c-6463-40a2-8042-d0b9dea3a4c5, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 789684125560567178, path: /dev/disk/by-partuuid/ce167dd2-9f11-4bf8-9ccb-e86042d4aa11, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 8: mirror, guid: 2962212927076889190, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 13279394592555200764, path: /dev/disk/by-partuuid/ca2fed1e-edd8-4f91-9126-a9a2f667dc34, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 6383596830989093462, path: /dev/disk/by-partuuid/34cfa66f-66c5-4bf1-a084-7d018f18efdd, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 9: indirect, guid: 9332782597973287530, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 10: indirect, guid: 4218078770841086833, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 11: mirror, guid: 13730618831942771340, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 14731919620377538683, path: /dev/disk/by-partuuid/63de864e-c4c4-41c0-b495-1bd1fd723c64, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 2033729016575030711, path: /dev/disk/by-partuuid/be807c16-c6d9-417e-a5b9-19b6af5ec837, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 12: mirror, guid: 5809140082036246482, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 17493500226162294994, path: /dev/disk/by-partuuid/cad9688f-85b8-41f2-8f89-5a66c67789a7, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 6073052702381961461, path: /dev/disk/by-partuuid/323a1964-db36-40c2-be4f-93bc2cb24843, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 13: indirect, guid: 235302787419978197, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 14: indirect, guid: 1381446463215791984, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 15: mirror, guid: 15140286079497109367, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 17831527895260049240, path: /dev/disk/by-partuuid/6d9d3acf-94b3-4819-a78c-1c23b53212a2, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 4275542926759592415, path: /dev/disk/by-partuuid/b976ef3b-a7eb-4347-9d36-245f738098be, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 16: mirror, guid: 5296809474692138764, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 1596451084264006543, path: /dev/disk/by-partuuid/430faa5c-a3f4-44fd-8e99-5db535f146d6, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 11509495317434492829, path: /dev/disk/by-partuuid/a81914dd-31c0-4e83-a369-cd4568484c42, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 17: mirror, guid: 10107206358176273262, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 16179806153641235865, path: /dev/disk/by-partuuid/c2640dc1-ecde-4638-8937-169b740b88aa, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 6519077389205892531, path: /dev/disk/by-partuuid/379daf79-69ac-4968-9a27-5a6b503bbcc4, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 18: mirror, guid: 4971576989779035714, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 9307995113113626143, path: /dev/disk/by-partuuid/822db76c-4def-4ead-9a8c-5b1175a49be8, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 17260282938147507352, path: /dev/disk/by-partuuid/1dd58ac2-e8b8-4afd-a358-01d5e69bd07e, healthy
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 19: mirror, guid: 7567707362911221306, path: N/A, can't open
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 10910638645217480881, path: /dev/disk/by-partuuid/9318f0f4-72fd-4ad1-8292-21e8a8d8b82c, can't open
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 17665183671171580697, path: /dev/disk/by-partuuid/e2f3d3c3-0033-4300-8c38-a7a56513f145, can't open
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 20: spare, guid: 6204868430839235276, path: N/A, degraded
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 17540057732824797402, path: /dev/disk/by-partuuid/e1da746c-2b0a-4297-bb4b-a30088cec248, degraded
vdev.c:212:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 13181857595583535955, path: /dev/disk/by-partuuid/7793a8b5-da95-4d28-893e-fdf468afdc1c, healthy
spa_misc.c:418:spa_load_note(): spa_load(sadness, config trusted): UNLOADING
ZFS_DBGMSG(zdb) END
root@prod[/var/log]#
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Playing around some more
For some reason one of the vdevs failed.

```
root@prod[/var/log]# zpool import
pool: sadness
id: 9977369563076415635
state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
config:

sadness UNAVAIL insufficient replicas
mirror-0 ONLINE
255f91c5-6fd8-4d11-bfe1-bb0b0995bde1 ONLINE
2db30682-bb8d-44b4-8279-960e7071ed66 ONLINE
mirror-1 ONLINE
e3fbe854-0307-473e-9f39-37a84d4747d1 ONLINE
49e58faf-2b18-43b6-bd50-29ef9c9bc30f ONLINE
mirror-2 ONLINE
480d7ade-f786-4511-bb76-6e7c0b64ab48 ONLINE
a6b0d83f-4413-45af-91bb-f26a27c56165 ONLINE
mirror-3 ONLINE
10ebed85-ab73-472c-b556-c25c14afd966 ONLINE
a299b22e-e339-4e48-8c5b-a980a4057237 ONLINE
mirror-4 ONLINE
762e8aa7-1be0-4e75-b297-f53161ecb047 ONLINE
a1f3d1eb-55e0-4e4a-8015-20c372c3001a ONLINE
mirror-5 ONLINE
d2aef666-ff6a-4d4a-9442-cd70f409f43c ONLINE
8d399a3a-7ecc-496f-bfbd-6ae48a2f89ee ONLINE
mirror-6 ONLINE
4b60c0ba-f4cf-477a-b230-cb8c4e310112 ONLINE
749b9f6f-c208-4900-b210-e623146c830f ONLINE
mirror-7 ONLINE
e44f5a4c-6463-40a2-8042-d0b9dea3a4c5 ONLINE
ce167dd2-9f11-4bf8-9ccb-e86042d4aa11 ONLINE
mirror-8 ONLINE
ca2fed1e-edd8-4f91-9126-a9a2f667dc34 ONLINE
34cfa66f-66c5-4bf1-a084-7d018f18efdd ONLINE
indirect-9 ONLINE
indirect-10 ONLINE
mirror-11 ONLINE
63de864e-c4c4-41c0-b495-1bd1fd723c64 ONLINE
be807c16-c6d9-417e-a5b9-19b6af5ec837 ONLINE
mirror-12 ONLINE
cad9688f-85b8-41f2-8f89-5a66c67789a7 ONLINE
323a1964-db36-40c2-be4f-93bc2cb24843 ONLINE
indirect-13 ONLINE
indirect-14 ONLINE
mirror-15 ONLINE
6d9d3acf-94b3-4819-a78c-1c23b53212a2 ONLINE
b976ef3b-a7eb-4347-9d36-245f738098be ONLINE
mirror-16 ONLINE
430faa5c-a3f4-44fd-8e99-5db535f146d6 ONLINE
a81914dd-31c0-4e83-a369-cd4568484c42 ONLINE
mirror-17 ONLINE
c2640dc1-ecde-4638-8937-169b740b88aa ONLINE
379daf79-69ac-4968-9a27-5a6b503bbcc4 ONLINE
mirror-18 ONLINE
822db76c-4def-4ead-9a8c-5b1175a49be8 ONLINE
1dd58ac2-e8b8-4afd-a358-01d5e69bd07e ONLINE
mirror-19 UNAVAIL insufficient replicas
9318f0f4-72fd-4ad1-8292-21e8a8d8b82c UNAVAIL
e2f3d3c3-0033-4300-8c38-a7a56513f145 UNAVAIL
spare-20 ONLINE
e1da746c-2b0a-4297-bb4b-a30088cec248 ONLINE
7793a8b5-da95-4d28-893e-fdf468afdc1c ONLINE

```
I don't know what all of the indirect entries mean, but I told ZFS to ignore that one data vdev was missing.

I ran:
Code:
echo 1 > /sys/module/zfs/parameters/zfs_max_missing_tvds


and was able to import the pool read-only:

Code:
root@prod[/sys/module/zfs/parameters]# zpool import -o readonly=on sadness


I was able to browse through all of the directories and see files. The VDEV that failed was only recently added so if that data goes bye-bye I don't really care at all.

But I can't import the pool in read-write mode:

Code:
root@prod[~]# zpool import -mfR /mnt sadness
cannot import 'sadness': one or more devices is currently unavailable
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
A pool made of 15 vdevs, each a 2-way mirror, is statistically quite risky.

Is the pair that's having issues the one where the malfunctioning drive comes from?
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
A pool made of 15 vdevs, each a 2-way mirror, is statistically quite risky.

Is the pair that's having issues the one where the malfunctioning drive comes from?
I'm not sure how much more conservative I could have been than accepting a 50% storage capacity loss for a homelab server. This is in my basement lol

And yes it is, I’m going to try and plug them into the server itself rather than in the shelf. Gotta run out for a few hours
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I'm not sure how much more conservative I could have been than accepting a 50% storage capacity loss for a homelab server. This is in my basement lol
I understand that it might look that way, but in reality, if you lose two drives in a single vdev you lose your pool (and the probability of that happening increases as the number of vdevs grows). Please look at the graph.
[chart: pool failure probability vs. number of vdevs]
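To put rough numbers on the trend, this is the back-of-the-envelope model behind that kind of chart (purely illustrative; q is a made-up per-vdev loss probability, not a measured value):

```
% Each 2-way mirror vdev is lost if both of its disks die in the same window;
% call that probability q. The pool survives only if none of the N vdevs is lost:
P(\text{pool loss}) = 1 - (1 - q)^{N}
% Illustration: q = 0.1%, N = 15  gives  1 - 0.999^{15} \approx 1.5%
% versus a single mirror vdev:           1 - 0.999^{1}  = 0.1%
```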

If mirrors aren't necessary for their performance, I suggest you consider changing your configuration. You would even gain (a lot of) space!

And yes it is, I’m going to try and plug them into the server itself rather than in the shelf. Gotta run out for a few hours
I would suggest you check whether you pulled out the wrong drive (the healthy one), since that happens sometimes.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I understand that it might look that way, but in reality, if you lose two drives in a single vdev you lose your pool (and the probability of that happening increases as the number of vdevs grows). Please look at the graph.

If mirrors aren't necessary for their performance, I suggest you consider changing your configuration. You would even gain (a lot of) space!


I would suggest you check whether you pulled out the wrong drive (the healthy one), since that happens sometimes.
A .002% chance of failure is acceptable for replaceable data in a homelab. Again, I'm going through this to save the time of pulling everything back down from backup, which will take weeks.

I get the point, and you're not altogether wrong, but I'm not running an HRIS or something mission-critical here. It's so I can watch movies. Being able to grow two disks at a time is how I was able to afford such a large system.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
A .002% chance of failure is acceptable for replaceable data in a homelab. Again, I'm going through this to save the time of pulling everything back down from backup, which will take weeks.

I get the point, and you're not altogether wrong, but I'm not running an HRIS or something mission-critical here. It's so I can watch movies. Being able to grow two disks at a time is how I was able to afford such a large system.
Just wanted to make sure you understood the implications and you had considered the alternatives :smile:
Also do note that the values are approximations and that chart is just to get an idea of the trend.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Just wanted to make sure you understood the implications and you had considered the alternatives :smile:
Also do note that the values are approximations and that chart is just to get an idea of the trend.
If I were to change the geometry of my pool, I would almost certainly move to DRAID before adopting a pool layout with traditional raidz. It’s unequivocally a better design and will eventually be the future.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
If I were to change the geometry of my pool, I would almost certainly move to DRAID before adopting a pool layout with traditional raidz. It’s unequivocally a better design and will eventually be the future.
As far as I understand, dRAID is useful only when spares are added to the equation.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I just took the three drives that were affected:

> * Pool sadness state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
> The following devices are not healthy:
> * Disk HUH721010AL4200 7PG33KKR is UNAVAIL
> * Disk HUH721010AL4200 7PG27ZZR is DEGRADED
> * Disk HUH721010AL4200 7PG3RYSR is DEGRADED

and plugged them into a 9207 with an SFF-8087 to SAS breakout cable:

```
root@prod[~]# zpool import
pool: sadness
id: 9977369563076415635
state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
config:

sadness UNAVAIL insufficient replicas
mirror-0 ONLINE
255f91c5-6fd8-4d11-bfe1-bb0b0995bde1 ONLINE
2db30682-bb8d-44b4-8279-960e7071ed66 ONLINE
mirror-1 ONLINE
e3fbe854-0307-473e-9f39-37a84d4747d1 ONLINE
49e58faf-2b18-43b6-bd50-29ef9c9bc30f ONLINE
mirror-2 ONLINE
480d7ade-f786-4511-bb76-6e7c0b64ab48 ONLINE
a6b0d83f-4413-45af-91bb-f26a27c56165 ONLINE
mirror-3 ONLINE
10ebed85-ab73-472c-b556-c25c14afd966 ONLINE
a299b22e-e339-4e48-8c5b-a980a4057237 ONLINE
mirror-4 ONLINE
762e8aa7-1be0-4e75-b297-f53161ecb047 ONLINE
a1f3d1eb-55e0-4e4a-8015-20c372c3001a ONLINE
mirror-5 ONLINE
d2aef666-ff6a-4d4a-9442-cd70f409f43c ONLINE
8d399a3a-7ecc-496f-bfbd-6ae48a2f89ee ONLINE
mirror-6 ONLINE
4b60c0ba-f4cf-477a-b230-cb8c4e310112 ONLINE
749b9f6f-c208-4900-b210-e623146c830f ONLINE
mirror-7 ONLINE
e44f5a4c-6463-40a2-8042-d0b9dea3a4c5 ONLINE
ce167dd2-9f11-4bf8-9ccb-e86042d4aa11 ONLINE
mirror-8 ONLINE
ca2fed1e-edd8-4f91-9126-a9a2f667dc34 ONLINE
34cfa66f-66c5-4bf1-a084-7d018f18efdd ONLINE
indirect-9 ONLINE
indirect-10 ONLINE
mirror-11 ONLINE
63de864e-c4c4-41c0-b495-1bd1fd723c64 ONLINE
be807c16-c6d9-417e-a5b9-19b6af5ec837 ONLINE
mirror-12 ONLINE
cad9688f-85b8-41f2-8f89-5a66c67789a7 ONLINE
323a1964-db36-40c2-be4f-93bc2cb24843 ONLINE
indirect-13 ONLINE
indirect-14 ONLINE
mirror-15 ONLINE
6d9d3acf-94b3-4819-a78c-1c23b53212a2 ONLINE
b976ef3b-a7eb-4347-9d36-245f738098be ONLINE
mirror-16 ONLINE
430faa5c-a3f4-44fd-8e99-5db535f146d6 ONLINE
a81914dd-31c0-4e83-a369-cd4568484c42 ONLINE
mirror-17 ONLINE
c2640dc1-ecde-4638-8937-169b740b88aa ONLINE
379daf79-69ac-4968-9a27-5a6b503bbcc4 ONLINE
mirror-18 ONLINE
822db76c-4def-4ead-9a8c-5b1175a49be8 ONLINE
1dd58ac2-e8b8-4afd-a358-01d5e69bd07e ONLINE
mirror-19 UNAVAIL insufficient replicas
9318f0f4-72fd-4ad1-8292-21e8a8d8b82c UNAVAIL
e2f3d3c3-0033-4300-8c38-a7a56513f145 UNAVAIL
spare-20 ONLINE
e1da746c-2b0a-4297-bb4b-a30088cec248 ONLINE
7793a8b5-da95-4d28-893e-fdf468afdc1c ONLINE

```


Still says unavailable, unfortunately.

```
blkid
/dev/sdb2: LABEL="sadness" UUID="9977369563076415635" UUID_SUB="13181857595583535955" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="**7793a8b5-da95-4d28-893e-fdf468afdc1c**"
/dev/sdc2: LABEL="sadness" UUID="9977369563076415635" UUID_SUB="17540057732824797402" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="**e1da746c-2b0a-4297-bb4b-a30088cec248**"
```

Which is different from what I expected:
```
mirror-19 UNAVAIL insufficient replicas
9318f0f4-72fd-4ad1-8292-21e8a8d8b82c UNAVAIL
e2f3d3c3-0033-4300-8c38-a7a56513f145 UNAVAIL
```

And both are marked as spares:

spare-20 ONLINE
e1da746c-2b0a-4297-bb4b-a30088cec248 ONLINE
7793a8b5-da95-4d28-893e-fdf468afdc1c ONLINE
Only one of the drives was ever marked as a spare…

The other one, serial number 7PG27ZZR (also sda), is the one I had told the TrueNAS UI to remove.

So it seems that, for some reason or another, two disks were removed and one of them was marked by ZFS as a spare?
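In case it helps anyone else untangling this, mapping a partuuid back to a physical drive (and its serial number) is just a matter of something like the following; /dev/sda here is only an example device:

```
# Map every by-partuuid symlink to its backing block device
ls -l /dev/disk/by-partuuid/

# Then tie a block device back to a drive serial number
lsblk -o NAME,SERIAL,SIZE /dev/sda
smartctl -i /dev/sda | grep -i 'serial number'
```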
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Anything is possible when you run in an unsupported configuration (multipath on SCALE).
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I've basically given up; it seems I am not going to get much further. I've mounted the pool read-only and am copying files over manually, and I will just reconcile the differences between my most recent backup and now, then pull those back down from my off-site.
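For the manual copy, something along these lines is enough (the source and destination paths are placeholders, not my actual dataset names):

```
# Walk the read-only import and copy everything out, preserving attributes,
# with resumable partial transfers in case it gets interrupted
rsync -avhP /mnt/sadness/ /mnt/recovery-target/
```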

Thanks all for your help.

Anything is possible when you run in an unsupported configuration (multipath on SCALE).
You're not wrong. I didn't know it wasn't supported until several months after going live with this setup, but it was working fine, so I didn't change anything. Perhaps I shouldn't have assumed all would be fine, but whatever, it is what it is. That's what labbing is for: breaking things and fixing them :)
 