Unable to boot with data pool drives attached.

StephenW

Cadet
Joined
Feb 9, 2022
Messages
5
I have a repurposed PC that I am running TrueNAS Core 12 on as a media server.
Mobo - Pegatron IPMMB-FM with latest BIOS (2014)
CPU - Core i7 3770
RAM - 4 x 4 GB Mushkin Redline DDR3 non ECC
HBA - IBM M1015 9220-8i flashed with IT firmware
Boot pool - 2 x Kingston SSD 200GB connected to the Mobo SATA ports in a mirror.
Data pool - 6+1 x WD Red Plus CMR 4 TB HDD in two RAIDZ1 vdevs plus one hot spare all connected to the IBM HBA.

So I'll start out by saying I'm an idiot: I replaced a failing drive in one of my vdevs without following the proper procedure, because I was in a hurry that day and just plain forgot. When my box didn't reboot I knew immediately what I had done wrong, powered it off, and returned the failing drive to the machine. Feel free to point and giggle or send dunce-cap emojis.
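For the record, the procedure I skipped boils down to roughly this at the ZFS level (TrueNAS Core normally walks you through it from the pool's Status page in the web UI; the disk names below are placeholders, not my actual devices):

zpool status Gallifrey            # identify the failing member of the vdev
zpool offline Gallifrey da3       # take the failing disk offline before pulling it
# physically swap the disk, then point ZFS at the new one so it can resilver:
zpool replace Gallifrey da3 da8
zpool status Gallifrey            # watch the resilver until it completes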

With the original drive back in place it still wouldn't boot, so I plugged in a monitor and keyboard, and it gives me the same error every time.

good
not supported
ZFS can only boot from drive, mirror, raidz1, raidz2, raidz3
better
not supported
done
ZFS found the following pools: Gallifrey boot-pool
UFS found no partition
Consoles: EFI consoles

ZFS can only boot from drive, mirror, raidz1, raidz2, raidz3


Some of those lines appear only once; the others repeat several times. I have attached a photo of the screen with the errors below.

If I disconnect the cables from the IBM HBA the system boots. If I plug them back in it won't.

I have gone into the BIOS and removed every device except the SSDs from the boot order. I have also entered the setup menu and used the boot menu to boot from one or the other of the SSDs. Nothing helps: if the data pool drives are attached, the same errors come up.

It seems as if the IBM HBA is somehow able to override the boot order but I'm not sure how or why.

I was prepared to blow away my data pool and start over, as I have backups of everything that matters and can re-rip the video collection if need be, but I can't even get the system to boot while the data pool drives are connected.

Short of pulling every drive, putting it in my HDD dock, and formatting it, I'm not sure what else to try. I'm not even convinced that would work.

I would greatly appreciate any insights you may have.

Thank you for taking the time to read my post.
 

Attachments

  • Error message.jpg (211.2 KB)

StephenW

Cadet
Joined
Feb 9, 2022
Messages
5
So with nothing left to lose I tried a very sketchy solution. I unplugged the data pool drives from the HBA and booted, logged into the web interface, and then plugged the SATA cables back into the HBA. My drives appeared. I then had to delete the pool and add it back, and it reappeared intact. It is busy resilvering now. Once that has completed I will offline the defective drive and add the replacement. Hopefully once that is done I'll have a working NAS again.
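In case it helps anyone else, I assume the delete-and-add dance amounts to an export followed by an import at the ZFS level (the web UI's pool import is the supported route on TrueNAS; this is only a sketch):

zpool import                      # list pools found on the newly attached disks
zpool import Gallifrey            # import the data pool by name
zpool status Gallifrey            # confirm the vdevs and check the resilver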
 

StephenW

Cadet
Joined
Feb 9, 2022
Messages
5
So the pool is fully restored to 2 RaidZ1 vdevs and a hot spare. No data was lost. Next is to see what happens when I have to reboot.
 

AJCxZ0

Dabbler
Joined
Mar 11, 2020
Messages
13
So the pool is fully restored to 2 RaidZ1 vdevs and a hot spare. No data was lost. Next is to see what happens when I have to reboot.
That's great news. Thank you for sharing it.

While an update to my thread is long overdue, I was able to get my datasets and services back online with no data loss by booting with all the data drives removed, leaving only the OS drives, then inserting the data drives, importing the zpool, mounting the ZFS datasets and starting the services. Since I strongly suspect that the system remains unbootable (with uncertainty only due to a ZFS clean-up step in the process clearing an error), the effort to migrate all data off the system and rebuild it (this time with a Hybrid pool) was underway when I had to step away for a long while.
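Roughly, those steps look like this at the command line (the pool name "tank" and the /mnt mount point are placeholders for my setup, and on TrueNAS the middleware normally does all of this for you):

zpool import -R /mnt tank         # import with an alternate root so the datasets land under /mnt
zfs mount -a                      # mount all datasets in the imported pool
zpool status -x                   # quick health check: "all pools are healthy" or a fault summary
zpool clear tank                  # if the clean-up step was a zpool clear, this is what would have wiped the logged error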
 

StephenW

Cadet
Joined
Feb 9, 2022
Messages
5
So after running fine for 72 hours I performed an upgrade of the TrueNAS software, and it rebooted fine except that another 4 TB WD Red seems to have failed. It is no longer visible in the list of drives, but it was there prior to the reboot and seemed to be working fine. One of my two vdevs is degraded and currently resilvering to the spare drive. It seems a bit like progress; at least it reboots properly now. I have RMA'd the first drive to fail, and once I get the replacement back I will replace the one that seems to have just failed. I'll do it properly this time. If that works, then the only other thing to fix is that TrueNAS sends me alerts that the first drive to fail is not connected.
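Once the replacement arrives, the plan is the textbook version this time, which I understand looks roughly like this at the ZFS level (disk names are placeholders):

zpool status -v Gallifrey         # the degraded vdev should show the failed disk with the spare as INUSE
zpool replace Gallifrey da5 da9   # resilver the new disk in place of the failed one
# when that resilver finishes, the hot spare should return to AVAIL on its own;
# alternatively, detaching the failed disk promotes the spare to a permanent member:
zpool detach Gallifrey da5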
 