Hi all:
I've been trying to recover from this for days now and have read lots of forum posts, but so far to no avail...
I have two Supermicro 36-bay servers with Xeon CPUs, ECC RAM, and NAS HDDs (WD Red / Seagate Ironwolf). The systems have been in production for >5 years. One system was recently replaced as preventative maintenance since funds were available. The systems run in an active/backup configuration: the main system (the newest) serves all shares via SMB, takes snapshots on a schedule, and replicates those snapshots to the backup. The backup is the system I'm focusing on here.
It has 24GB of RAM, 16 cores, and an LSI SAS card in IT mode. It has about 140TB of storage online (a mix of 12TB and 8TB disks), arranged as 4 raidz2 vdevs of 8 disks each. As I said, it had been running smoothly for over 5 years; the hardware was originally obtained used, so I know there's some age here.
On Sunday it finished a regularly scheduled scrub and e-mailed me that all was good. During the week, snapshots are replicated to this device from the primary one every few hours. I believe the first one went OK. The second one caused the master to generate a "lost connectivity" e-mail. I IPMI'ed into the console and found it had kernel panicked. I rebooted, and it panicked again.
I've spent a lot of time troubleshooting and researching this. I found some forum posts that talk about booting with the vfs.zfs.recover tunable set and such, and I've done that. Trying zpool import with -F didn't work (still panicked). Another post suggested -FX, and that panicked as well. However, importing the pool read-only worked with no problems, and zpool status shows everything healthy in that case. Of course, a scrub isn't possible while the pool is read-only.
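For reference, this is roughly what I've been running (the pool name "tank" and the /mnt mountpoint are placeholders, not my actual names):

Code:
# in /boot/loader.conf (or at the loader prompt) before booting:
vfs.zfs.recover=1
vfs.zfs.debug=1

# rewind imports -- both of these panicked:
zpool import -f -F -R /mnt tank
zpool import -f -FX -R /mnt tank

# read-only import -- this works and the pool reports healthy:
zpool import -f -o readonly=on -R /mnt tank
zpool status tank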
I've also tried unplugging the boot drives and doing a clean install of the latest version (I was one or two versions behind on the original boot disks), but that failed the same way (importing through the GUI or via the command line, with or without the vfs.zfs flags set).
On this system there is very little data that inherently needs to be preserved (it's a backup system, after all). The only things that lived only on it are a Linux VM (bhyve) with its disk in a zvol (it's a UniFi Video server -- a camera NVR -- and honestly the only thing I need out of it is a config backup), and my amanda jail, which I'd like to recover. I copied the amanda jail off to another system, but the resulting disk usage was 2x what it is on the FreeNAS system. With the zvol, I haven't figured out how to get at the volume and the disk inside it to grab the file. I tried zfs sending it to the master, but after running for about 18 hours the receive end was already over twice the size of the original (10.3TB) and still going.
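In case it helps, this is the direction I've been poking at on the read-only import (the dataset name, disk image name, and partition number are placeholders, and I'm not sure the ext2fs approach is even right for this guest):

Code:
# the zvol should show up as a GEOM device once the pool is imported (read-only is fine)
ls /dev/zvol/tank/
gpart show /dev/zvol/tank/unifi-video-disk0

# the guest is Linux, so the filesystem inside is presumably ext4;
# FreeBSD's ext2fs driver can mount that read-only
mkdir -p /tmp/unifi
mount -t ext2fs -o ro /dev/zvol/tank/unifi-video-disk0p2 /tmp/unifi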
So, currently my questions are:
1) Why can I import the pool read-only with no problems and everything showing good health, but as soon as I try to import it read-write, the kernel panics?
2) Is there a way to get at the bhyve disk image inside the zvol? And why is the zfs send over 2x the size of the original?
3) Any suggestions on recovery? Recopying the snapshots is doable but will be a pain... even at 10Gbps networking (which it has), it typically takes about a week (rough math below).
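Rough math on that, assuming the replication ends up disk-limited at roughly 250 MB/s rather than anywhere near line rate: 140 TB / 250 MB/s is about 560,000 seconds, or roughly 6.5 days; even at full 10GbE line rate (~1.25 GB/s) it would still be about 1.3 days.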
Thanks!