Lost pool after USB boot drive failed

Status
Not open for further replies.

d-rock

Cadet
Joined
Jul 27, 2015
Messages
5
I've gotten quite a bit of help from people on #freenas over the past few days, but I wanted to post here as well so that all the steps taken so far are documented in one place.

First, some history: we have a FreeNAS server that was set up by a former admin here. It's been rock-solid for over a year without issue and we've been very happy with it. Although I was not involved in setting it up, I do know that it was built with 8 x 4TB WD drives and one 40GB Intel SSD (for L2ARC, I believe) on a SuperMicro server with an LSI 2308 SAS2 controller. As far as I know, it's a single pool with some form of raidz and compression enabled, and I know it was initially set up on 9.2 and later upgraded to 9.3. Unfortunately, the box was set up with a single, unmirrored USB thumb drive as its boot disk.

On Monday, 2015-07-27, at about 7am, the NAS rebooted. We're not sure what caused this, but we know that it failed to come back up. The tech at the DC said there was a message about a "data disk" on the console and that it would not boot. I drove down to the DC and confirmed that the message on the console was "This is a FreeNAS data disk and can not boot system". At first I thought the boot order in the BIOS had maybe gotten screwed up, so I rebooted into the setup screen and confirmed that the USB drive was ahead of the LSI disks. At that point I suspected the thumb drive itself, as we've had other thumb drives fail in our servers. I have no prior experience with FreeNAS, so I started reading the docs. The first thing I saw was that the install image includes a shell where I should be able to run `zpool` and at least verify that everything other than the boot disk was healthy. I downloaded the image, wrote it to a spare (albeit only 4GB) thumb drive, and booted from that.
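
(For reference, the read-only checks available from that shell boil down to roughly the following; nothing here writes to the disks:)

Code:
zpool list            # shows any pools that are already imported
zpool import          # scans devices and lists pools that can be imported
camcontrol devlist    # confirms the HBA is presenting all of the disks
gpart show            # quick look at each disk's partition table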

I entered the shell and ran `zpool list`, but nothing showed up. After some discussion on IRC, I tried `zpool import`, `zpool import -f`, and `zpool import -D`, but none of these brought the pool online. Next, I double-checked that all 8 disks were being seen. I ran `gpart list` and confirmed that I had 8 disks + 1 SSD:

Code:
# gpart list
Geom name: da0
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da0p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b58199d9-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da0p2
   Mediasize: 3998639460352 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b58e018a-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: da0
   Mediasize: 4000787030016 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

Geom name: da1
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da1p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b65ee41d-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da1p2
   Mediasize: 3998639460352 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b66b4c77-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: da1
   Mediasize: 4000787030016 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

Geom name: da2
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da2p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b734717a-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da2p2
   Mediasize: 3998639460352 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b73e8645-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: da2
   Mediasize: 4000787030016 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

Geom name: da3
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da3p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b87776c9-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da3p2
   Mediasize: 3998639460352 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b884c636-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: da3
   Mediasize: 4000787030016 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

Geom name: da4
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da4p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b94f8984-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da4p2
   Mediasize: 3998639460352 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b95c6bb1-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: da4
   Mediasize: 4000787030016 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

Geom name: da5
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da5p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b2f5e0f8-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da5p2
   Mediasize: 3998639460352 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b3026706-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: da5
   Mediasize: 4000787030016 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

Geom name: da6
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da6p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b3cb2876-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da6p2
   Mediasize: 3998639460352 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b3d79a5e-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: da6
   Mediasize: 4000787030016 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

Geom name: da7
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da7p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b4a3f07b-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da7p2
   Mediasize: 3998639460352 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: b4b64cf7-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: da7
   Mediasize: 4000787030016 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

Geom name: ada0
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 78165326
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada0p1
   Mediasize: 40020578304 (37G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 65536
   Mode: r0w0e0
   rawuuid: b9b31c96-338a-11e4-89f0-0cc47a3001fa
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 40020578304
   offset: 65536
   type: freebsd-zfs
   index: 1
   end: 78165319
   start: 128
Consumers:
1. Name: ada0
   Mediasize: 40020664320 (37G)
   Sectorsize: 512
   Mode: r0w0e0


This also confirmed that I had a 2GB freebsd-swap partition and 3.7TB freebsd-zfs partition on each of my 8 spinning disks.

After some more discussion on IRC, I tried digging a little deeper and used `zdb` to show ZFS labels for one of the partitions:

Code:
# zdb -l /dev/da1p2
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3


At this point I was starting to get worried. I couldn't stay at the DC much longer, so I decided to at least get a new thumb drive boot disk working so that I could get in remotely. I installed 9.3 onto a new 16GB drive, made sure it booted up, got networking configured, and took the old thumb drive with me so that I could try to recover the config if it wasn't too badly damaged.

When I got back home, I was able to SSH in and poke around a little more. I was also able to talk to the admin who had set it up; he's checking whether he has a backup of the config somewhere, and he was surprised it wasn't working. He said the drives weren't encrypted and that the pool was raidz, but he couldn't remember the level.

One thing I noticed when I first logged into the web console is that there's a warning: "Firmware version 19 does not match driver version 16 for /dev/mps0"

I tried one more thing to see if I could get more info: I ran `strings -a` on the old USB drive's zfs slice, and I see things like "version", "whole_disk", etc., but when I run the same against one of the spinning disks' ZFS slices, I see only gibberish.
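
(Roughly the commands; the old stick's device node and partition number are placeholders:)

Code:
strings -a /dev/da8p2 | less    # old USB boot stick's zfs slice: readable metadata
strings -a /dev/da1p2 | less    # a data disk's zfs slice: nothing legible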

At this point I'm using ddrescue to try to make an image of the failed USB thumb drive in the hope that I'll be able to recover it and use that, but otherwise I'm not sure what my next steps should be. Any help, ideas, etc. would be greatly appreciated.
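
(For reference, the imaging command is roughly the following; the failed stick's device node is a placeholder:)

Code:
# GNU ddrescue: copy what it can on the first pass, then retry bad areas three times.
# /dev/da8 is a placeholder for whatever node the failed stick appears as.
ddrescue -d -r3 /dev/da8 usb-boot.img usb-boot.map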
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Firmware version 19 does not match driver version 16 for /dev/mps0
You absolutely need to take care of this, which means flashing the controller to version 16 (P16) IT-mode firmware. I can't say for sure that it's the first thing you should do, but it is essential.
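
Roughly, the procedure with LSI's sas2flash utility looks like this; the firmware file name is just a placeholder, so double-check the P16 IT image for your exact board before flashing anything:

Code:
# Read-only: show the controller and the firmware/BIOS versions it's running.
sas2flash -listall

# Flash the P16 IT-mode firmware so it matches the version 16 mps driver.
# "2308it_p16.bin" is a placeholder for the correct image for your card.
sas2flash -o -f 2308it_p16.bin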
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Did you try the auto-import button in the FreeNAS GUI? You should have done that first, before you ever touched the CLI. What you did on the CLI might have broken things. I'm not sure who you talked to on IRC, but it sounds like you (or they) got extreme really fast and shouldn't have. If auto-import doesn't work, what does `zpool import` tell you?

Also, you didn't lose your pool because your USB died. You lost it because of something else, and that random reboot probably had something to do with it.
 

d-rock

Cadet
Joined
Jul 27, 2015
Messages
5
Sorry, yes, we tried the auto-import first and it found nothing, which is why people on IRC then suggested trying the `zpool import` commands from the CLI. I had forgotten to mention that, but it was definitely the first thing I did once I got the unit booted up. I logged into the GUI, went to Storage ⇒ Volumes ⇒ View Volumes and saw nothing, which is when I jumped on IRC. At that point I ran `zpool import` with no other arguments and got no output. I then ran `zpool list` and similarly got nothing.
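
For completeness, the other read-only variants I know of point the import scan at a specific device directory, since FreeNAS normally builds pools on gptid labels (a sketch):

Code:
zpool import -d /dev/gptid    # scan only the gptid label nodes
zpool import -d /dev          # scan the raw device nodes directly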
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Seems to me you need to get that box out of the DC and onsite where you can work on it. One approach is to build or acquire another box that meets the FreeNAS hardware requirements and try to mount the pool in that. But there's so much that could be wrong here. My guess is the failed USB stick and missing pool are symptoms of another problem rather than having a causal relationship.

To get help from more knowledgeable members, your first step should be to post full hardware specs, and probably a debug file too, if you can get one (refer to the forum rules for what's expected).
 

d-rock

Cadet
Joined
Jul 27, 2015
Messages
5
...What you did on the CLI might have broken things.

Which of the operations I performed would be potentially destructive, out of curiosity? I thought that `zpool import` was safe, and that `zdb -l` should similarly be a read-only operation. Is that wrong?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Which of the operations I performed would be potentially destructive, out of curiosity? I thought that `zpool import` was safe, and that `zdb -l` should similarly be a read-only operation. Is that wrong?
Importing with the force flag can cause problems; it should be used as a last-ditch effort to get the pool mounted so you can save your data off it before you destroy it.
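
If you do end up needing -f, importing read-only at least keeps ZFS from writing anything to the vdevs while you copy data off; roughly ("tank" is a placeholder for the pool name):

Code:
zpool import -o readonly=on -R /mnt -f tank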
 

d-rock

Cadet
Joined
Jul 27, 2015
Messages
5
OK, I wish someone had pointed that out. In any case, after actually digging through the blocks on disk with hexdump, it turns out that despite what the person who set this up said, the drives are GELI-encrypted. Now I just need to get those unlocked and I suspect things will be good again. Thanks for your help!
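
For anyone who finds this thread later: assuming the GELI key can be recovered from the old boot device (FreeNAS keeps it under /data/geli), my understanding is that manually attaching the providers before importing would look roughly like this; the key path is a placeholder and the exact options depend on whether a passphrase was set when the volume was created:

Code:
# Attach each encrypted data partition with the recovered key file.
# Use -p only if no passphrase was set; otherwise drop it and enter the passphrase.
for d in /dev/da[0-7]p2; do
    geli attach -p -k /path/to/pool_geli.key "$d"
done

# With the .eli providers attached, the pool should show up again.
zpool import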
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
If your drives are encrypted, doesn't FreeNAS tell you this when you try to import them and prompt you for a password? Are you sure you didn't just see the encryption for the swap space?
 

sfcredfox

Patron
Joined
Aug 26, 2014
Messages
340
Not sure if this will be remotely helpful, but the comment about using the CLI before the GUI is true. I was messing around renaming pools and got something out of sequence. The GUI did not show the pool and it could not be accessed (it wasn't mounted), but it did show up when I ran `zpool status`. I had to reboot, do a `zpool export`, and then reboot again. After that, I was able to do the GUI import.

I realize that since your pool isn't even showing up in `zpool list`, this probably doesn't apply, but it might be worth a try. If you run `zpool export` with the pool name and it tells you there is no pool with that name, then it never even got that far, and you'll just have confirmed what you already know.
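
For reference, the sequence that worked for me was roughly this ("tank" is just a placeholder for the pool name):

Code:
zpool status          # pool visible to ZFS but not mounted or shown in the GUI
# reboot
zpool export tank     # cleanly detach the pool from the CLI
# reboot again, then use the GUI's auto import (Storage -> Volumes)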
 