Boot fails with "ZFS: can only boot from disks, mirror, raidz1, raidz2 and raidz3 vdevs" after BOOTX64.EFI

AJCxZ0

Dabbler
Joined
Mar 11, 2020
Messages
13
24 identical spinning disks. TrueNAS 12.0-U5.1 (updated from a previous 12.0 update) is installed on da0 and da1. Two 10-disk RAIDZ1 vdevs with two hot spares; one data pool, Pacific, across both vdevs. The system has booted several times without incident.
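For context, a pool with that layout corresponds roughly to the following. This is only a sketch with placeholder whole-disk names; TrueNAS actually builds the vdevs from GPT partitions addressed by gptid via its middleware, and the exact vdev membership of each da device is not shown here.

Code:
# Hypothetical equivalent of the layout: two 10-disk RAIDZ1 vdevs plus
# two hot spares. Device names are placeholders, not the real members.
zpool create Pacific \
    raidz1 da2  da3  da4  da5  da6  da7  da8  da9  da10 da11 \
    raidz1 da13 da14 da15 da16 da17 da18 da19 da20 da21 da22 \
    spare  da12 da23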

After a power outage, the server fails to boot, with the console showing:

Code:
not supported
ZFS: can only boot from disks, mirror, raidz1, raidz2 and raidz3 vdevs
not supported
(repeated)
 done
    ZFS found the following pools: boot-pool Pacific
    UFS found no partitions
Consoles: EFI console
ZFS: can only boot from disks, mirror, raidz1, raidz2 and raidz3 vdevs
(repeated 10 times)
\


The final "\" is sometimes a "-" and is not spinning. Screenshots of the console and video are below (though some messages are only captured in a single frame).

This issue appears to be very similar to the unresolved posts "Unable to boot with data pool drives attached" and
"TrueNAS Won't boot with Pool - UFS found no partitions".

Trying to boot from the TrueNAS 12.0-U8 "ISO" on a USB flash drive* and moving all disks to another server, which boots from an ada0+ada1 mirror, results in exactly the same boot, even when the "BIOS" is limited to booting only from the USB flash drive or the mirror. TrueNAS boots fine from the USB flash drive to the menu when the disks are removed. Note that this happens before the TrueNAS splash screen and kernel, during BOOTX64.EFI.

Since the previous boot, ZFS errors caused da19 to be removed from the pool and replaced with hot spare da12 weeks ago. The resilver finished, and a long S.M.A.R.T. test of da19 showed no errors, but the pool remained marked DEGRADED. I did not try clearing the errors.

Code:
# zpool status
  pool: Pacific
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 6.71T in 2 days 22:36:43 with 0 errors on Fri Feb  4 05:56:32 2022
config:

    NAME                                              STATE     READ WRITE CKSUM
    Pacific                                           DEGRADED     0     0     0
      raidz1-0                                        ONLINE       0     0     0
        gptid/e6e10598-d846-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/e6fb145a-d846-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/e74196fb-d846-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/e7b73861-d846-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/e6c726b0-d846-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/e80cb298-d846-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/e84ef3f3-d846-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/e8c5ce48-d846-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/e8b83d4c-d846-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/e8fe9a5f-d846-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
      raidz1-1                                        DEGRADED     0     0     0
        spare-0                                       DEGRADED     0     0   292
          gptid/16a5238d-d847-11eb-ae3e-ac1f6b0a390a  DEGRADED     0     0   243  too many errors
          gptid/f805643f-d847-11eb-ae3e-ac1f6b0a390a  ONLINE       0     0     0
        gptid/1627b212-d847-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/16b8b495-d847-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/1737a1d1-d847-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/17cf6bb6-d847-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/17f006fa-d847-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/185ea22f-d847-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/18b2fcdb-d847-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/19340582-d847-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
        gptid/1901ad03-d847-11eb-ae3e-ac1f6b0a390a    ONLINE       0     0     0
    spares
      gptid/f805643f-d847-11eb-ae3e-ac1f6b0a390a      INUSE     currently in use
      gptid/f820d5f6-d847-11eb-ae3e-ac1f6b0a390a      AVAIL

errors: No known data errors


Long S.M.A.R.T test result for da19:

Code:
ID  Description       Status   Remaining  Lifetime (hours)  Error
1   Extended offline  SUCCESS  0          26682             N/A


Firmware diagnostics show no problems with any hardware.
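For completeness, the error-clearing step mentioned above (the one I did not try, and the action suggested in the zpool status output) would look roughly like the following. This is only a sketch, not something that was run; detaching the in-use hot spare assumes da19 is healthy and should remain the active member of raidz1-1.

Code:
# Clear the recorded read/write/checksum errors on the pool.
zpool clear Pacific

# If da19 is to stay in the pool, detach the in-use hot spare so it
# returns to the spares list (gptid taken from the status output above).
zpool detach Pacific gptid/f805643f-d847-11eb-ae3e-ac1f6b0a390a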

Assistance getting this back up and running would be most appreciated. I'm happy to provide more details and run tests; however, changes to the hardware require a site visit.



Attached screenshots: vlcsnap-2022-02-20-14h41m41s106.png, vlcsnap-2022-02-20-14h42m17s825.png, vlcsnap-2022-02-20-14h44m12s517.png, vlcsnap-2022-02-20-14h45m37s172.png, vlcsnap-2022-02-20-14h46m59s005.png, vlcsnap-2022-02-20-14h50m17s800.png

 

AJCxZ0

Dabbler
Joined
Mar 11, 2020
Messages
13
This may be obvious, but did you double-check your boot device in the BIOS?

On some old systems with early UEFI support, a device might be listed twice in the BIOS: once for legacy boot and a second time for [UEFI] boot. Be certain you have selected the correct one. If you don't know what to select, I would remove all the data disks and then check the BIOS to see which device is identified as the boot device.
Copying my reply from Musetech2021's thread to the question quoted above:

I did. The BIOS is fairly up to date and was set to UEFI-only mode. I tested the UEFI boot entries, both the generic "UEFI OS" entries and the "UEFI OS" entries naming a specific device, and also tried disabling every boot device except the TrueNAS 12.0-U8 "ISO" image on a flash drive. I also briefly tested in Legacy (BIOS) mode. The perplexing part was that while I could boot the TrueNAS 12.0-U8 "ISO" image on the flash drive when there were no data drives in the server, with the drives installed the boot always went straight to the failure, as if every loader immediately chained to the one on the installed OS.
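For anyone checking the same thing from a booted FreeBSD/TrueNAS CORE shell instead of the BIOS setup screens, the firmware's view of the boot entries can be listed directly. This is a generic sketch, not output from the affected system, and da0/da1 are only example device names:

Code:
# List the UEFI Boot#### variables, the BootOrder and the active entry.
efibootmgr -v

# Show the partition tables, including any "efi" partitions that a
# loader could be chained from.
gpart show da0 da1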
 

AJCxZ0

Dabbler
Joined
Mar 11, 2020
Messages
13
I removed the mirrored OS devices, da0 and da1, replaced them with new drives, and installed TrueNAS 12.0-U8 from the USB "ISO", mirrored on the new da0 and da1. With the network configured and the password set, a fresh TrueNAS install was ready. With the server off, I re-inserted all the data drives (da2 - da23) and booted... with the same result. It seems that the problem is with the boot-pool being booted by whatever precedes it, and that step failing.
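One sanity check that can be made from the freshly installed system is that the loader is actually present on the new boot mirror. A sketch, assuming the default TrueNAS 12 UEFI layout where partition 1 on each boot device is the EFI system partition:

Code:
# Confirm the partition layout first; adjust the partition number if needed.
gpart show da0

# Mount the EFI system partition and look for the loader.
mount -t msdosfs /dev/da0p1 /mnt
ls /mnt/efi/boot          # should list BOOTX64.EFI
umount /mnt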
 

bondif

Cadet
Joined
Apr 14, 2022
Messages
9
I had the same issue after restarting my TrueNAS, and I don't have backups!!!
 

AJCxZ0

Dabbler
Joined
Mar 11, 2020
Messages
13
Copying from my earlier post in "Unable to boot with data pool drives attached":
I was able to get my datasets and services back online with no data loss by booting with all data drives removed, leaving only the OS drives, then inserting the data drives, importing the zpool, mounting the ZFS datasets and starting the services. Since I strongly suspect that the system remains unbootable (the only uncertainty being whether the zpool clear step in that process cleared the error), the effort to migrate all data off the system and rebuild it (this time with a hybrid pool) was underway when I had to step away for a long while.
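For anyone in the same position, that recovery amounts to roughly the following once the data drives are back in the running system. The web UI's pool import wizard is the supported route; from a root shell the equivalent is approximately:

Code:
zpool import              # with no arguments: list pools visible for import
zpool import Pacific      # import the data pool by name (add -f if it
                          # complains about the pool's last host)
zfs mount -a              # mount all datasets in the imported pool
zpool status Pacific      # review the pool state before clearing anything
zpool clear Pacific       # the error-clearing step mentioned above
# Sharing and jail services can then be restarted from the web UI.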
 

bondif

Cadet
Joined
Apr 14, 2022
Messages
9
Thank you for the reply, but unfortunately my server doesn't support hot-plugging (it's very old), so I can't plug the data drives back in.
 

AJCxZ0

Dabbler
Joined
Mar 11, 2020
Messages
13
Due to an external event, the filer was powered off. When powered back on, it booted!
While I can only speculate about what made the difference, I suspect that clearing the degraded state from the pool may have been instrumental.

my server doesn't support hot-plugging (it's very old), so I can't plug the data drives back in.
Maybe you could temporarily connect the data drives to another host with sufficient ZFS feature support and (repair and) clear your pool, then return them to the server.
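Concretely, on that other host the idea would be something like the following (a sketch only; it assumes the host's OpenZFS supports all of the pool's feature flags, and the pool name is the one from this thread):

Code:
zpool import -f Pacific   # -f because the pool was last imported elsewhere
zpool scrub Pacific       # let ZFS repair what it can
zpool status Pacific      # wait for the scrub to finish, then review
zpool clear Pacific       # clear the recorded errors, as zpool status suggests
zpool export Pacific      # export cleanly before moving the drives back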
 

bondif

Cadet
Joined
Apr 14, 2022
Messages
9
Maybe you could temporarily connect the data drives to another host with sufficient ZFS feature support and (repair and) clear your pool, then return them to the server.
That wasn't possible either! What I ended up doing was removing all the data drives and booting with only the boot drives. TrueNAS booted successfully, but the pool showed its data as not available. Then I kept adding one drive at a time and rebooting: if TrueNAS booted, I kept the drive, added another one and rebooted; if it didn't, I removed that drive and tried another one.
I kept doing that until my pool had enough drives to be in the DEGRADED state, and I finally recovered my data!
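For reference, once TrueNAS is up it is possible to check whether the currently attached set of drives is sufficient before importing: zpool import with no arguments lists the pools it can detect and flags which member devices are missing. A sketch (the pool name is a placeholder):

Code:
zpool import              # list detectable pools; missing members are flagged
zpool import tank         # import once the pool shows as ONLINE or DEGRADED
                          # ("tank" is a placeholder; substitute your pool)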
 