SOLVED Bootloader Went Bye-Bye !?

MyVizDrake

Dabbler
Joined
Jan 27, 2016
Messages
21
HI!

Was running CORE-12-U8 (I think) ...

Had a failed CORE --> SCALE upgrade a month or two ago; it has been running fine since. I think I even rebooted since then, but I'm not sure.

Manually updated my Let's Encrypt cert and the web UI didn't refresh, which is fine, so I rebooted. No errors in the messages. Failed to start.

My IPMI connection shows ...

1660774277641.png


..which clearly isn't good. I know not to panic as the pool is most likely safe.

How do I recover? Reinstall 12-U8?

Thoughts?

Thanks in advance!

Scott
 

homer27081990

Patron
Joined
Aug 9, 2022
Messages
321
Manually updated my Let's Encrypt cert and the web UI didn't refresh, which is fine, so I rebooted
... It is not fine. Grab a TrueNAS install USB and check whether it sees your boot drive. If your boot loader is missing (which I really doubt was caused by anything Let's Encrypt did), it has most likely gone along with the entire boot pool.
Verify somehow that your boot pool exists and that you didn't lose your BIOS settings (sometimes a bad battery can wipe the boot order), and come back to us.
 

MyVizDrake

Dabbler
Joined
Jan 27, 2016
Messages
21
I booted and got to the BIOS and it sees the boot drive (a Transcend 16GB SLC SSD) and when manually selecting the Transcend boot SSD, it gives the missing bootloader error.

I think I have the saved config file from the more recent of either a) the failed SCALE upgrade (there is a JIRA bug logged that I think was fixed in 22.02.1 or Bluefin) or b) the last time I did an update to the 12.0 installation.

So it appears the boot order is fine (the Transcend SSD remains boot priority 1) and I don't get the failed CMOS Checksum error of a bad battery (although the battery is probably in need of replacement, but that can be dealt with later).

I will grab a 12.0-U8.1 installer, boot via USB (via IPMI) and see what I see.

If it doesn't see the boot pool I should be able to just reinstall and then restore my last config file, right?

Thanks again!
 

homer27081990

Patron
Joined
Aug 9, 2022
Messages
321
Did you try enabling/disabling CSM, or switching between AHCI and Legacy (IDE)? Also, just instinct: can you connect the SSD to a laptop (or whatever) and verify the storage is there? The BIOS may still see the drive even though the flash is gone.
 

MyVizDrake

Dabbler
Joined
Jan 27, 2016
Messages
21
I have not done anything via the BIOS .... I cannot do anything with the SSD as I currently don't have physical access to the HW.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Assuming the SSD is still good (which could be an invalid assumption, as it appears you've suffered a failure of that SSD), you could try reinstalling to that SSD, and then importing the saved config.
 

MyVizDrake

Dabbler
Joined
Jan 27, 2016
Messages
21
I think I confirmed the SSD is bad ... smartctl -i -a shows OLD AGE in all attributes and 100 100 100 in all values.
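A note for anyone reading the same kind of output: in the `smartctl -A` attribute table, Old_age in the TYPE column is the attribute's category (the other one is Pre-fail), not a verdict on the drive. The usual warning signs are a normalized VALUE at or below THRESH, or something other than "-" in WHEN_FAILED. Below is a minimal Python sketch of that check, assuming the standard ATA table layout. The sample rows are made up for illustration and are not from this SSD, and the function names are hypothetical.

```python
import re

# One row of the ATA attribute table printed by `smartctl -A`:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>Pre-fail|Old_age)\s+(?P<updated>\S+)\s+"
    r"(?P<when_failed>\S+)\s+(?P<raw>.+)$"
)


def parse_attributes(text):
    """Parse attribute rows from smartctl text output into dicts."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            for key in ("id", "value", "worst", "thresh"):
                row[key] = int(row[key])
            rows.append(row)
    return rows


def suspicious(rows):
    """Return rows flagged as failed, or whose value has reached the threshold."""
    return [
        r for r in rows
        if r["when_failed"] != "-"
        or (r["thresh"] > 0 and r["value"] <= r["thresh"])
    ]


# Illustrative sample only (hypothetical values, not real output from this thread).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12345
  5 Reallocated_Sector_Ct   0x0033   005   005   010    Pre-fail  Always   FAILING_NOW 900
"""

if __name__ == "__main__":
    for row in suspicious(parse_attributes(SAMPLE)):
        print(row["id"], row["name"], row["value"], row["thresh"], row["when_failed"])
```

A table that shows 100 100 100 everywhere with Old_age types would come back empty from this check, so on its own it does not prove the drive is bad. The install-then-vanish behaviour described in the next paragraph is the stronger evidence.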

Installer can see it. Installer can install and preserve database, but subsequent reboots won't see the drive.

Full disclosure: I took the opportunity to disable CSM mode at this time and move to UEFI / GPT, but that shouldn't impact anything.

Downloaded 12.0-U8.1 and installed on mSATA via USB 3.0 adapter as a test (connected to external USB 3.0 port AND internal USB 3.0 port).

Booted successfully ... uploaded my config file from Jun 22 ...get an error ...
Error: [EFAULT] Failed to upload config, version newer than the current installed.

Reinstalled 12.0-U8 ... same error.
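With several saved configs lying around, it can help to look inside each one before uploading it. The sketch below assumes the saved config is either a plain SQLite database file or a tar archive containing one (I believe that is what the config export produces, but treat it as an assumption). It only lists the table names, read-only. Which table records the schema or version state is not something I have verified, so the sketch does not guess at it. The helper names are hypothetical.

```python
import sqlite3
import tarfile
from contextlib import closing
from pathlib import Path


def open_config_db(path, workdir):
    """Return a path to the SQLite file, unpacking it first if `path` is a tar."""
    path = Path(path)
    if not tarfile.is_tarfile(str(path)):
        return path
    with tarfile.open(str(path)) as tar:
        for member in tar.getmembers():
            # Assumption: the database is the first regular member ending in .db
            if member.isfile() and member.name.endswith(".db"):
                target = Path(workdir) / Path(member.name).name
                target.write_bytes(tar.extractfile(member).read())
                return target
    raise ValueError("no .db member found in archive")


def list_tables(db_path):
    """List table names in a SQLite file, opened read-only so nothing is modified."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    import sys
    import tempfile

    # Usage: python inspect_config.py <saved-config-file>
    with tempfile.TemporaryDirectory() as tmp:
        for table in list_tables(open_config_db(sys.argv[1], tmp)):
            print(table)
```

Comparing the table list of a config that uploads cleanly against one that is rejected is one way to spot which file came from the newer install. Work on a copy of the file either way.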

I know I was on 12.0-U8.1 at the time I attempted the SCALE upgrade in June ... I never upgraded to 13.0

SATA was and remains AHCI

I am going to move to SCALE anyway but want to get 12.0 --> 13.0 in place before doing so ...

Do I need to revert to CSM mode first? CSM vs UEFI should only impact the boot process, unless I completely misunderstand BIOS / Legacy vs UEFI ... which is possible. :smile:

Thoughts?

Appreciate your help, as always!

Scott
 

MyVizDrake

Dabbler
Joined
Jan 27, 2016
Messages
21
Looks like this is resolved ... bad boot SSD (MLC instead of SLC) ...

Eventually found a config from May of this year that loaded. Had to "reinstall" my acme.sh and deploy-config setup but it appears my 3x jails and 1x VM are fully functioning again.

Once everything was updated to 12.0-U8.1, I resaved the config.

Will get two SSDs and set up the boot SSDs as a mirror, and then worry about 12.0-U8.1 --> 13.0-Ux / SCALE.
Appreciate the help!
 