TrueNAS SCALE 22.02.0.1 does not boot after mirrored boot-pool disk failure

malignosama

Cadet
Joined
Oct 28, 2018
Messages
6
One of the two mirrored boot-pool disks failed. The system kept running in degraded mode, and the faulty drive was then removed.

After a controlled restart, the system no longer boots and stays at the GRUB prompt (grub rescue).

I would appreciate any help bringing the system back to life without reinstalling SCALE.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Depending on how your BIOS is set, you may need to adjust it to boot properly. You may need to modify the GRUB partition reference to boot from the different disk.
 

malignosama

sretalla said: Depending on how your BIOS is set, you may need to adjust it to boot properly. You may need to modify the GRUB partition reference to boot from the different disk.
I have checked that, and GRUB is in fact starting from the new disk. It just stays at its command line and does not show the TrueNAS boot menu.
 

sretalla

sretalla said: You may need to modify the GRUB partition reference to boot from the different disk.
I'm no GRUB expert, but I think it's these lines from /boot/grub/grub.cfg (repeated for each boot menu item) that aren't pointing in the right direction:
Code:
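        # Fallback root: GPT partition 3 on the second BIOS disk (hd0 is the first)
        set root='hd1,gpt3'
        if [ x$feature_platform_search_hint = xy ]; then
          # GRUB supports search hints: try the hinted device first, then find the pool by UUID
          search --no-floppy --fs-uuid --set=root --hint-bios=hd1,gpt3 --hint-efi=hd1,gpt3 --hint-baremetal=ahci1,gpt3  f420d7472648c25c
        else
          # No hint support: scan all devices for the UUID
          search --no-floppy --fs-uuid --set=root f420d7472648c25c
        fi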

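To see which disk each menu entry actually points at, the `set root=` values and `--hint-*=` arguments can be pulled out of grub.cfg per `menuentry`. This is a minimal Python sketch, not a TrueNAS tool: `root_references` and `EntryRoots` are hypothetical names, the patterns assume the grub-mkconfig layout quoted above, and every line up to the next `menuentry` is attributed to the current one, which is a simplification.

```python
import re
from dataclasses import dataclass, field

# Patterns for grub-mkconfig style lines like the ones quoted above (assumed layout).
MENUENTRY_RE = re.compile(r"""^\s*menuentry\s+['"]([^'"]*)['"]""")
SET_ROOT_RE = re.compile(r"""^\s*set\s+root=['"]?([^'"\s]+)""")
HINT_RE = re.compile(r"--hint(?:-[a-z]+)?=(\S+)")


@dataclass
class EntryRoots:
    """Root-device references found in one menuentry (hypothetical type)."""
    title: str
    set_root: list[str] = field(default_factory=list)
    hints: list[str] = field(default_factory=list)


def root_references(cfg_text: str) -> list[EntryRoots]:
    """Collect `set root=` values and `--hint*=` arguments per menuentry.

    Simplification: every line after a menuentry header, up to the next
    header, is attributed to that entry.
    """
    entries: list[EntryRoots] = []
    for line in cfg_text.splitlines():
        header = MENUENTRY_RE.match(line)
        if header:
            entries.append(EntryRoots(title=header.group(1)))
            continue
        if not entries:
            continue  # header section of grub.cfg, before any menu entry
        root = SET_ROOT_RE.match(line)
        if root:
            entries[-1].set_root.append(root.group(1))
        entries[-1].hints.extend(HINT_RE.findall(line))
    return entries
```

For example, `root_references(open("/boot/grub/grub.cfg").read())` returns one record per menu item, so entries still referencing a disk number that no longer exists are easy to spot.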

It may be quicker to reinstall over the top to correct the boot configuration. Be careful to either have a saved config already or retrieve it from /data/freenas-v1.db first, and if you can, get the installer to "upgrade" rather than do a fresh install.
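The config file mentioned above is a SQLite database, so one careful way to copy it by hand is SQLite's online backup API, which takes a consistent snapshot even if another process has the file open. A minimal sketch, assuming Python 3 is available wherever the file is accessible; `backup_config_db` and its default destination name are hypothetical, and the destination should be storage that survives the reinstall (not the boot disks). It copies only this one file.

```python
import sqlite3
from pathlib import Path


def backup_config_db(src: str = "/data/freenas-v1.db",
                     dest: str = "freenas-v1-backup.db") -> Path:
    """Copy a SQLite database with SQLite's online backup API.

    `dest` is a hypothetical default; point it somewhere that survives
    a reinstall of the boot disks.
    """
    dest_path = Path(dest)
    # Open the source read-only (URI mode) so the copy cannot alter it.
    src_conn = sqlite3.connect(f"file:{src}?mode=ro", uri=True)
    dst_conn = sqlite3.connect(dest_path)
    try:
        # Default pages=-1 copies everything in one step, as one consistent snapshot.
        src_conn.backup(dst_conn)
        # SQLite returns a single row "ok" when the copy is structurally sound.
        (status,) = dst_conn.execute("PRAGMA integrity_check").fetchone()
        if status != "ok":
            raise RuntimeError(f"integrity_check reported: {status}")
    finally:
        src_conn.close()
        dst_conn.close()
    return dest_path
```

For example, `backup_config_db(dest="/mnt/tank/backups/freenas-v1.db")`, where the pool and directory are hypothetical.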
 

malignosama

Well, after trying everything I could think of at the GRUB command line to find and/or access the right boot location, I decided to use the nuclear option: I reinstalled SCALE on the remaining functional disk and on a new disk that replaced the broken one. Recovering the system was very straightforward, as I had a backup of the configuration file.

Once everything was running again, I tested the setup by removing one boot-pool disk at a time and restarting the system. The results were interesting.
  • Right after the clean install, I removed the first boot-pool disk and restarted the server. All went well: GRUB loaded at boot with its menu, and the system booted correctly in degraded mode. Then I reinserted the first disk and let it resilver.
  • Next, I removed the second boot drive and restarted. The server didn’t boot, and the message “this is not a boot drive but a zfs data disk” was displayed. It seems that no GRUB bootloader was installed on the remaining (first) disk during the clean SCALE installation.
  • I reinserted the second boot drive, and the server booted correctly again. I then wiped the first boot drive and added it back to the boot-pool. Once resilvering had completed, I removed the second boot drive and restarted again. This time it worked.
Long story short: it seems that the TrueNAS SCALE installer does not install GRUB on both drives at installation time, but it does install it on disks added to the boot-pool afterwards.
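Before pulling a boot-pool disk to test redundancy, it may help to check which disks actually carry GRUB. For legacy BIOS booting, GRUB writes its `boot.img` into the disk's first sector, and that code contains the ASCII text "GRUB" among the short messages it can print, so searching the 440-byte boot-code area is a rough heuristic. A minimal sketch under that assumption: `mbr_has_grub` and `report` are hypothetical names, you pass the whole-disk devices of the boot-pool members (as listed by `zpool status boot-pool`, not their partitions), reading raw disks needs root, and a negative result says nothing about UEFI booting, which loads GRUB from the EFI system partition instead.

```python
SECTOR = 512
BOOT_CODE_LEN = 440  # MBR bytes before the disk signature and partition table


def mbr_has_grub(device: str) -> bool:
    """Heuristic: does sector 0 of `device` look like GRUB's BIOS boot.img?

    boot.img embeds the ASCII string "GRUB" in its message text, so its
    presence in the boot-code area suggests a BIOS-mode GRUB install.
    """
    with open(device, "rb") as fh:
        sector0 = fh.read(SECTOR)
    if len(sector0) < SECTOR:
        return False  # too short to be a real disk or disk image
    return b"GRUB" in sector0[:BOOT_CODE_LEN]


def report(devices: list[str]) -> dict[str, bool]:
    """Map each device path you pass in to the heuristic result."""
    return {dev: mbr_has_grub(dev) for dev in devices}
```

For example, `report(["/dev/sda", "/dev/sdb"])` (device names hypothetical) would flag a mirror member without BIOS boot code before it is the only disk left.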
 