d-rock
Cadet
- Joined: Jul 27, 2015
- Messages: 5
I've gotten quite a bit of help from people on #freenas over the past few days, but I wanted to post here so that all of the steps taken so far are documented in one place.
First, some history: we have a FreeNAS server that was set up by a former admin here. It's been rock-solid for over a year and we've been very happy with it. Although I was not involved in setting it up, I do know that it has 8 x 4TB WD drives and one 40GB Intel SSD (as an L2ARC cache device, I believe) in a SuperMicro server with an LSI 2308 SAS2 controller. As far as I know, it's a single pool with some form of raidz and compression enabled; it was initially set up as 9.2 and later upgraded to 9.3. Unfortunately, the box was set up with a single, unmirrored USB thumb drive as its boot disk.
On Monday, 2015-07-27, at about 7am, the NAS rebooted. We're not sure what caused the reboot, but we know that the NAS then failed to boot. The tech at the DC said there was a message about a "data disk" on the console, and when I drove down I confirmed it: "This is a FreeNAS data disk and can not boot system". At first I thought the boot order in the BIOS had gotten scrambled, so I rebooted into the setup screen and confirmed that the USB drive was ahead of the LSI disks. At that point I suspected the thumb drive itself, as we've had other thumb drives fail in our servers. I have no prior experience with FreeNAS, so I started reading the docs. The first thing I saw was that the install image has a shell where I should be able to run `zpool` and at least verify that everything other than the boot disk was working. I downloaded the image, wrote it to a spare (albeit only 4GB) thumb drive, and booted from that.
I entered the shell and ran `zpool list`, but nothing showed up. After some discussion on IRC, I tried `zpool import`, `zpool import -f`, and `zpool import -D`, but none of these brought the pool online. Next, I double-checked to make sure that all 8 disks were being seen. I ran `gpart` and confirmed that I had 8 disks + 1 SSD:
Code:
# gpart list
Geom name: da0
modified: false  state: OK  fwheads: 255  fwsectors: 63  last: 7814037134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: da0p1  Mediasize: 2147483648 (2.0G)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b58199d9-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 2147483648  offset: 65536  type: freebsd-swap  index: 1  end: 4194431  start: 128
2. Name: da0p2  Mediasize: 3998639460352 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b58e018a-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 3998639460352  offset: 2147549184  type: freebsd-zfs  index: 2  end: 7814037127  start: 4194432
Consumers:
1. Name: da0  Mediasize: 4000787030016 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0

Geom name: da1
modified: false  state: OK  fwheads: 255  fwsectors: 63  last: 7814037134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: da1p1  Mediasize: 2147483648 (2.0G)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b65ee41d-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 2147483648  offset: 65536  type: freebsd-swap  index: 1  end: 4194431  start: 128
2. Name: da1p2  Mediasize: 3998639460352 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b66b4c77-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 3998639460352  offset: 2147549184  type: freebsd-zfs  index: 2  end: 7814037127  start: 4194432
Consumers:
1. Name: da1  Mediasize: 4000787030016 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0

Geom name: da2
modified: false  state: OK  fwheads: 255  fwsectors: 63  last: 7814037134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: da2p1  Mediasize: 2147483648 (2.0G)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b734717a-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 2147483648  offset: 65536  type: freebsd-swap  index: 1  end: 4194431  start: 128
2. Name: da2p2  Mediasize: 3998639460352 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b73e8645-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 3998639460352  offset: 2147549184  type: freebsd-zfs  index: 2  end: 7814037127  start: 4194432
Consumers:
1. Name: da2  Mediasize: 4000787030016 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0

Geom name: da3
modified: false  state: OK  fwheads: 255  fwsectors: 63  last: 7814037134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: da3p1  Mediasize: 2147483648 (2.0G)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b87776c9-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 2147483648  offset: 65536  type: freebsd-swap  index: 1  end: 4194431  start: 128
2. Name: da3p2  Mediasize: 3998639460352 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b884c636-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 3998639460352  offset: 2147549184  type: freebsd-zfs  index: 2  end: 7814037127  start: 4194432
Consumers:
1. Name: da3  Mediasize: 4000787030016 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0

Geom name: da4
modified: false  state: OK  fwheads: 255  fwsectors: 63  last: 7814037134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: da4p1  Mediasize: 2147483648 (2.0G)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b94f8984-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 2147483648  offset: 65536  type: freebsd-swap  index: 1  end: 4194431  start: 128
2. Name: da4p2  Mediasize: 3998639460352 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b95c6bb1-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 3998639460352  offset: 2147549184  type: freebsd-zfs  index: 2  end: 7814037127  start: 4194432
Consumers:
1. Name: da4  Mediasize: 4000787030016 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0

Geom name: da5
modified: false  state: OK  fwheads: 255  fwsectors: 63  last: 7814037134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: da5p1  Mediasize: 2147483648 (2.0G)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b2f5e0f8-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 2147483648  offset: 65536  type: freebsd-swap  index: 1  end: 4194431  start: 128
2. Name: da5p2  Mediasize: 3998639460352 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b3026706-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 3998639460352  offset: 2147549184  type: freebsd-zfs  index: 2  end: 7814037127  start: 4194432
Consumers:
1. Name: da5  Mediasize: 4000787030016 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0

Geom name: da6
modified: false  state: OK  fwheads: 255  fwsectors: 63  last: 7814037134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: da6p1  Mediasize: 2147483648 (2.0G)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b3cb2876-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 2147483648  offset: 65536  type: freebsd-swap  index: 1  end: 4194431  start: 128
2. Name: da6p2  Mediasize: 3998639460352 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b3d79a5e-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 3998639460352  offset: 2147549184  type: freebsd-zfs  index: 2  end: 7814037127  start: 4194432
Consumers:
1. Name: da6  Mediasize: 4000787030016 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0

Geom name: da7
modified: false  state: OK  fwheads: 255  fwsectors: 63  last: 7814037134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: da7p1  Mediasize: 2147483648 (2.0G)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b4a3f07b-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 2147483648  offset: 65536  type: freebsd-swap  index: 1  end: 4194431  start: 128
2. Name: da7p2  Mediasize: 3998639460352 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0
   rawuuid: b4b64cf7-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 3998639460352  offset: 2147549184  type: freebsd-zfs  index: 2  end: 7814037127  start: 4194432
Consumers:
1. Name: da7  Mediasize: 4000787030016 (3.7T)  Sectorsize: 512  Stripesize: 4096  Stripeoffset: 0  Mode: r0w0e0

Geom name: ada0
modified: false  state: OK  fwheads: 16  fwsectors: 63  last: 78165326  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: ada0p1  Mediasize: 40020578304 (37G)  Sectorsize: 512  Stripesize: 0  Stripeoffset: 65536  Mode: r0w0e0
   rawuuid: b9b31c96-338a-11e4-89f0-0cc47a3001fa  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: (null)
   length: 40020578304  offset: 65536  type: freebsd-zfs  index: 1  end: 78165319  start: 128
Consumers:
1. Name: ada0  Mediasize: 40020664320 (37G)  Sectorsize: 512  Mode: r0w0e0
This also confirmed that I had a 2GB freebsd-swap partition and 3.7TB freebsd-zfs partition on each of my 8 spinning disks.
After some more discussion on IRC, I tried digging a little deeper and used `zdb` to show ZFS labels for one of the partitions:
Code:
# zdb -l /dev/da1p2
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
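For anyone following along: ZFS keeps four copies of the vdev label on every member device, labels 0 and 1 in the first 512 KiB of the partition and labels 2 and 3 in the last 512 KiB, each 256 KiB. All four failing to unpack means both the front and the back of the partition look bad, which is why this was so worrying. Here's a rough sketch of where `zdb` looks (my own illustration, not zdb's code; it only checks whether each region holds any non-zero bytes, which is a far weaker test than actually unpacking the nvlist, and it works on an image file rather than a raw device node):

```python
import os

LABEL_SIZE = 256 * 1024  # each ZFS vdev label is 256 KiB


def label_offsets(dev_size):
    """Byte offsets of the four ZFS vdev labels: two at the
    front of the device, two at the very end."""
    return [0, LABEL_SIZE, dev_size - 2 * LABEL_SIZE, dev_size - LABEL_SIZE]


def check_labels(path):
    """For each label slot, report whether the region contains any
    non-zero bytes at all. Works on an ordinary image file; a raw
    device node would need its size determined by seeking to the end."""
    size = os.path.getsize(path)
    results = []
    with open(path, "rb") as f:
        for i, off in enumerate(label_offsets(size)):
            f.seek(off)
            data = f.read(LABEL_SIZE)
            results.append((i, any(data)))
    return results
```

If even this kind of check finds nothing but zeros at all four offsets, the labels are truly gone rather than merely corrupt.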
At this point I was starting to get worried. I couldn't stay at the DC much longer, so I decided to at least get a new thumb drive boot disk working so that I could get in remotely. I installed 9.3 onto a new 16GB drive, made sure it booted up, got networking configured, and took the old thumb drive home with me so that I could try to recover the config from it if it wasn't too badly damaged.
When I got back home, I was able to SSH in and poke around a little more. I was able to talk to the admin who had set it up, and he's looking to see if by any chance he has a backup config somewhere, but he was surprised it wasn't working. He said that it wasn't using encryption for the drive and that it was raidz, but he couldn't remember the level.
One thing I noticed when I first logged into the web console is a warning: "Firmware version 19 does not match driver version 16 for /dev/mps0".
I tried one more thing to see if I could get more info: I ran `strings -a` against the USB drive's ZFS slice and saw recognizable strings like "version" and "whole_disk", but running the same command against one of the spinning disks' ZFS slices produced only gibberish.
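To explain what that `strings` comparison was probing for: an intact vdev label stores its metadata as an XDR-encoded nvlist, so key names like "version" and "whole_disk" show up as plain ASCII when the label is healthy. Here's a hand-rolled sketch of the idea (my own approximation, not zdb's parser, and the key list is just a few common names, not exhaustive):

```python
import re


def printable_strings(data, minlen=4):
    """Rough analogue of `strings -a`: extract runs of printable
    ASCII at least `minlen` bytes long from a byte buffer."""
    pattern = rb"[\x20-\x7e]{%d,}" % minlen
    return [m.decode("ascii") for m in re.findall(pattern, data)]


# A few nvlist key names that appear in an intact ZFS vdev label
# (illustrative subset, not the full set).
LABEL_KEYS = {"version", "name", "pool_guid", "whole_disk", "vdev_tree"}


def looks_like_zfs_label(data):
    """True if the buffer contains any of the expected label keys."""
    return bool(set(printable_strings(data)) & LABEL_KEYS)
```

Seeing these keys on the USB slice but pure gibberish on the data disks is what suggested the data-disk labels themselves were damaged rather than the pool merely failing to import.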
At this point I'm using ddrescue to try to make an image of the failed USB thumb drive, in the hope that I'll be able to recover the config from it, but otherwise I'm not sure what my next steps should be. Any help, ideas, etc. would be greatly appreciated.
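For reference, the imaging step is conceptually simple: read the device block by block, retry failed reads, and zero-fill anything unreadable while recording where the holes are, which is the core of what ddrescue automates (along with a mapfile so runs can resume). A toy Python analogue of that loop, purely to illustrate the idea (not a substitute for ddrescue; the actual run is along the lines of `ddrescue -r3 /dev/da8 usb.img usb.map`, where the device name is whatever the failed stick enumerates as on your system):

```python
import os


def rescue_image(src, dst, block=64 * 1024, retries=3):
    """Toy analogue of ddrescue: copy `src` to `dst` block by block,
    retrying failed reads and zero-filling blocks that never succeed.
    Returns a list of (offset, length) regions that could not be read."""
    bad_regions = []
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        offset = 0
        while offset < size:
            want = min(block, size - offset)
            data = None
            for _ in range(retries):
                try:
                    fin.seek(offset)
                    data = fin.read(want)
                    break
                except OSError:
                    continue  # media error: retry this block
            if data is None:
                # Block never read successfully: fill with zeros
                # and remember the hole.
                data = b"\x00" * want
                bad_regions.append((offset, want))
            fout.write(data)
            offset += want
    return bad_regions
```

The important property, which ddrescue shares, is that a bad region never aborts the whole copy: everything readable lands in the image at the right offset, so the config database can potentially be carved out even from a partially failed stick.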