Pool offline after troubleshooting

ekimseekem

Cadet
Joined
Oct 21, 2021
Messages
5
Hey all, I've got a slightly desperate situation here. I have a 3-drive zpool that's showing 2 drives as unavailable. Here is the zpool import output:

Code:
root@freenas[~]# zpool import
   pool: media_pool
     id: 5423912395067497430
  state: UNAVAIL
status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-3C
 config:

    media_pool                                      UNAVAIL  insufficient replicas
      raidz1-0                                      UNAVAIL  insufficient replicas
        gptid/08647cb0-ca03-11eb-898a-00270e0dda90  UNAVAIL  cannot open
        gptid/0b454021-ca03-11eb-898a-00270e0dda90  ONLINE
        gptid/0b6c426f-ca03-11eb-898a-00270e0dda90  UNAVAIL  cannot open


Some backstory on how I got here. I logged on to my NAS to check status and noticed one of my pools was degraded: one disk was reporting as unavailable. I tried to repair or even wipe the drive so I could add it back to the pool, but no luck; I keep getting this error on the troublesome drive:

Code:
disks.0.identifier: Test cannot be performed for {serial_lunid}ZA10Q8KF_5000c500910e97e9. Unable to retrieve disk details.


I thought something might be wrong with the drive, so I powered down my NAS, removed that drive by matching its serial number, and plugged it into my workstation. I ran a SMART test, which passed, then wiped the drive to make sure all sectors were reading and writing. Plugged it back into the NAS, and now two drives are unavailable?

Did I just screw up?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Yes, you did. When you wiped the drive, you erased the data on it, so ZFS can no longer find the on-disk structures it needs to rejoin the disk to the pool. That's what a wipe is. Unfortunately, your pool is lost: a RAIDZ1 vdev can only tolerate the loss of a single disk, and you've lost two.
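To see why one lost disk is recoverable but two are not, here's a toy single-parity sketch. This is not RAIDZ1's actual on-disk layout, just the one-failure limit of single parity, with arbitrary numbers:

```shell
#!/bin/sh
# Toy single-parity illustration: parity is the XOR of the data blocks.
d1=11 d2=6                  # two "data disks" (arbitrary values)
parity=$(( d1 ^ d2 ))       # the "parity disk"

# Lose d1: XOR the survivors to rebuild it.
rebuilt=$(( parity ^ d2 ))
echo "rebuilt d1 = $rebuilt"     # prints: rebuilt d1 = 11

# Lose d1 AND d2: only parity remains -- one equation, two unknowns,
# so neither block can be reconstructed. That's the state of this pool.
```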
 

ekimseekem

Cadet
Joined
Oct 21, 2021
Messages
5
I only wiped one disk, which I had checked was the faulty disk. It doesn't make sense that doing that to just one disk would take down the whole array.
What's causing the error about disk details? I can run tests on all my other drives except the one I decided to pull.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Unfortunately, you assumed the serial number in the SMART error matched the faulty drive. Since drives are assigned to the pool by GUID, you needed to run gpart list before pulling anything to positively identify the GUID of the faulty drive. As it is, you pulled a different drive and wiped it, so you now have two inaccessible drives in your pool: the original failed drive and the drive you wiped.
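For future reference, the gptid-to-serial mapping can be checked before pulling anything. A sketch of the workflow, assuming TrueNAS CORE/FreeBSD tools; the glabel output below is a captured sample, and the da5 line is a hypothetical mapping:

```shell
#!/bin/sh
# Sketch: resolve a pool member's gptid to a device, then to a serial.
# On the live system you would run:
#   glabel status            # gptid/...  ->  daXpY
#   smartctl -i /dev/daX     # serial number of that physical disk
# Here the glabel output is a captured sample so the parsing can be shown.
sample='                                      Name  Status  Components
gptid/0b454021-ca03-11eb-898a-00270e0dda90     N/A  da4p2
gptid/08647cb0-ca03-11eb-898a-00270e0dda90     N/A  da5p2'

# Which whole disk backs a given pool gptid from `zpool import`?
lookup() {
    echo "$sample" |
        awk -v id="gptid/$1" '$1 == id { sub(/p[0-9]+$/, "", $3); print $3 }'
}

lookup 0b454021-ca03-11eb-898a-00270e0dda90    # the ONLINE member; prints da4
```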
 

ekimseekem

Cadet
Joined
Oct 21, 2021
Messages
5
Here is my output of gpart list for the drives in the pool:
Code:
Geom name: da3
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 15628053134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da3p1
   Mediasize: 209715200 (200M)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   efimedia: HD(1,GPT,ff36e5ba-f359-4b71-85b4-1a50da628f21,0x28,0x64000)
   rawuuid: ff36e5ba-f359-4b71-85b4-1a50da628f21
   rawtype: c12a7328-f81f-11d2-ba4b-00a0c93ec93b
   label: EFI System Partition
   length: 209715200
   offset: 20480
   type: efi
   index: 1
   end: 409639
   start: 40
2. Name: da3p2
   Mediasize: 8001352105984 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   efimedia: HD(2,GPT,f6085035-dff7-43c7-ab55-a7b3aa065253,0x64800,0x3a37ae000)
   rawuuid: f6085035-dff7-43c7-ab55-a7b3aa065253
   rawtype: ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
   label: (null)
   length: 8001352105984
   offset: 210763776
   type: ms-basic-data
   index: 2
   end: 15628052479
   start: 411648
Consumers:
1. Name: da3
   Mediasize: 8001563222016 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
  
Geom name: da4
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 15628053127
first: 40
entries: 128
scheme: GPT
Providers:
1. Name: da4p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   efimedia: HD(1,GPT,0ab0f252-ca03-11eb-898a-00270e0dda90,0x80,0x400000)
   rawuuid: 0ab0f252-ca03-11eb-898a-00270e0dda90
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da4p2
   Mediasize: 7999415652352 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   efimedia: HD(2,GPT,0b454021-ca03-11eb-898a-00270e0dda90,0x400080,0x3a3412a08)
   rawuuid: 0b454021-ca03-11eb-898a-00270e0dda90
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 7999415652352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 15628053127
   start: 4194432
Consumers:
1. Name: da4
   Mediasize: 8001563222016 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0


da3 does look like the disk I wiped, so you're correct. My focus should therefore be on rescuing data from the first failed disk.

I notice that the disk labelled da5 is missing from this output; it looks like that was my original bad disk? If so, what would cause it to not mount or be readable? And how do I match it to a physical disk?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Look under Storage->Disks, which should show the serial number of da5. If you're lucky, this may be a simple case of a loose cable or power connection on the drive. If this is the case, your pool should be available again, and you can run the disk replacement procedure for da3 to bring it back into the pool.
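From the shell, the same check can be done with smartctl (assuming smartmontools is present, as it is on TrueNAS). A sketch; the output below is a trimmed sample, the model line is hypothetical, and the serial is the one from your earlier error message:

```shell
#!/bin/sh
# Sketch: read a disk's serial via smartctl so it can be matched against
# the sticker on the physical drive. On the NAS you would run:
#   smartctl -i /dev/da5
# A trimmed sample of that output:
sample='Device Model:     ST8000VN004-2M2101
Serial Number:    ZA10Q8KF
User Capacity:    8,001,563,222,016 bytes [8.00 TB]'

serial=$(echo "$sample" | awk -F': *' '/^Serial Number/ { print $2 }')
echo "serial = $serial"      # prints: serial = ZA10Q8KF
```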
 

ekimseekem

Cadet
Joined
Oct 21, 2021
Messages
5
It doesn't, for some reason; da5 still has the same serial it had before. Thankfully I'm not hearing any obvious hardware failure from the drives.

I've been checking cables throughout my troubleshooting; I suppose I can check whether each drive shows up in the BIOS.
 

ekimseekem

Cadet
Joined
Oct 21, 2021
Messages
5
Found the drive: sure enough, the second drive in the chassis does show in BIOS, and it's a different drive than the one I wiped. This is what I get for trusting a GUI.

I'll try to clone the drive first and see if that works. Any recovery tool suggestions would be helpful. Wish me luck.
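A clone-first sketch, assuming GNU ddrescue (a common choice for this since it keeps a map file and skips unreadable areas) and assumed device names; never write to the source disk:

```shell
#!/bin/sh
# On real hardware (device names are assumptions -- double-check them!):
#   ddrescue -f -n  /dev/da5 /dev/da6 rescue.map   # first pass, skip bad areas
#   ddrescue -f -r3 /dev/da5 /dev/da6 rescue.map   # then retry bad areas 3x
#
# The clone-then-verify idea, demonstrated on scratch files with plain dd:
src=$(mktemp)
dst=$(mktemp)
printf 'pretend-zfs-on-disk-data' > "$src"
dd if="$src" of="$dst" bs=4096 conv=noerror 2>/dev/null
cmp -s "$src" "$dst" && echo "clone matches source"
```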
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Members here have had good results with Klennet ZFS Recovery, if the drive cloning doesn't work.
 