System can't fully start, boot loops

Parsivan

Cadet
Joined
Aug 26, 2023
Messages
9
I have not touched any settings or changed the system itself. It was turned off because electricity is expensive where I live. A few days later I tried turning it on and the UI wasn't connecting, so I hooked up a monitor:

These are the screens it gets stuck on (video):

I would try to solve this myself, but I don't even know the cause or the problem at hand. I did some searching but couldn't find anything showing how to fix it.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
you have a kernel panic. that usually means bad hardware.
you have some of your info in your sig, which is good; however, it's VERY sparse. your gaming system doesn't really help us any.
at the least: HDDs? raidz1/2/3/mirror? storage controller? boot disk(s)?
the board you list is generally not a great choice for TrueNAS, but it shouldn't cause a kernel panic on its own unless it's failing.

my guess would be RAM, storage controller, cables, or data pool disks, in that order.
 

Parsivan

Cadet
Joined
Aug 26, 2023
Messages
9
Well, it was running flawlessly for almost a year.
I have 3x 4TB IronWolf drives in RAIDZ1 and 2x M.2 SSDs in a mirror for boot. There's a single 2.5" SATA SSD for my Minecraft server. RAM is Corsair 2x16GB DDR4 3200MHz CL16. The drives are connected straight to the motherboard.
I'm not experienced with kernel panics! What do I have to do?
All of the drives show up in the BIOS.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Well, it was running flawlessly for almost a year.
hardware dies.
I'm not experienced with kernel panic! what is it that I have to do?
you need to figure out which hardware component is causing the crash via regular computer troubleshooting steps (i.e. remove a part, see if it still crashes). i gave a list of things to look at. there isn't really much more I can do.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Run Memtest86+ for 4 passes. If that passes, run a CPU stress test for 30 minutes to an hour. If that passes too, then you have tested most of your hardware.

How much RAM do you have? With 16GB you are good to bootstrap the machine, and even with 8GB it should bootstrap fully to a GUI.

The next thing: boot from the second boot device and see if it boots all the way. That is the only good thing about a mirror. Actually, this could be the first thing you try: if it works, then you know your other boot device is corrupt. Hopefully the second one isn't.
 

Parsivan

Cadet
Joined
Aug 26, 2023
Messages
9
The problem is the SSD I had for Minecraft and my applications.
The server is booting up with no problems now.

Is there a way I can still access the data and transfer it to another SSD or something? My applications are on that SSD.
 

RetroG

Dabbler
Joined
Dec 2, 2023
Messages
16
your kernel panic is ZFS related. it happens late enough in the boot to suggest that one of your data pools is failing to import, rather than your boot-pool.
take out all your disks (aside from those that belong to the boot-pool) and see if you can boot it up.


if you are successful, you may be able to import the other pools read only and migrate the data.

the panic itself suggests that there is an issue with the metaslab/range tree (how ZFS stores its free-space map). I've seen this caused by an improper shutdown during certain destroy operations (say... deleting a large dataset), which the ZIL replay is not able to cleanly recover from.
 

Parsivan

Cadet
Joined
Aug 26, 2023
Messages
9
The pool on the SSD was just for applications, of which only the Minecraft worlds were important. All of the pools are offline. The pool with the 3 HDDs seems fine (just checksum errors on one of the drives).


Alexius is the HDD pool, and MC_Damien is the pool I intended for Minecraft, but I accidentally made it my application pool when setting it up.
 

RetroG

Dabbler
Joined
Dec 2, 2023
Messages
16

you will have to play with it... it depends on which options work for your exact circumstances. of course, shut down anything else you care about before importing the problematic pool, just in case it panics again. then connect the disks of the problematic pool to the system after it's booted and try importing with some advanced options from the shell.



the above smells like this particular issue, I would recommend reading through some of that. hopefully you can at least get the pool imported enough to copy the MC worlds away to another pool.
 

Parsivan

Cadet
Joined
Aug 26, 2023
Messages
9

Where can I find instructions on how to import a pool read-only? Do I just disconnect the pool and then restart the server with the SSD in the machine?
Also, how do I shut down other pools as you suggested? I don't see any obvious way of doing so.
Sorry I couldn't find any resources myself. All I can find are other posts about problems with importing!
 

RetroG

Dabbler
Joined
Dec 2, 2023
Messages
16
Code:
zpool import -o readonly=on -f poolname


again, you may want to try other options (like -F) if this still doesn't work.
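
As a rough sketch of how those attempts could be ordered (not a definitive procedure): try the plain read-only import first, then ask ZFS whether a rewind would help before actually using -F. The pool name passed to each helper is a placeholder, the `run` wrapper and the function names are made up for illustration, and the read-only property is spelled `readonly=on`. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default) each command is printed, not run.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Forced read-only import; nothing is written to the pool.
import_ro() {
    run zpool import -o readonly=on -f "$1"
}

# Ask whether a rewind (-F) could make the pool importable.
# -n only reports the outcome; it does not perform the rewind.
check_rewind() {
    run zpool import -F -n -f "$1"
}

# Read-only import that also rewinds to an earlier transaction group.
# The last few seconds of writes before the crash may be discarded.
import_ro_rewind() {
    run zpool import -o readonly=on -F -f "$1"
}
```

Call one helper at a time, e.g. `import_ro poolname`, and only move on to the next one if the previous attempt fails. Set DRY_RUN=0 once the printed command looks right.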
 

Parsivan

Cadet
Joined
Aug 26, 2023
Messages
9
OK, it imported. I found another post that you also replied to about moving ix-applications to another pool. Since I can't use the migrate option, would it be possible to do it with rsync? I did a dry run with
Code:
rsync -rvnP MC_Damien/ix-applications /Alexius_repository/ix-applications

The only thing I'm concerned about is that it gave this error:


Well, I went ahead (ignoring the previous error) and synced, and when I tried to choose the new pool I got this error:
 

RetroG

Dabbler
Joined
Dec 2, 2023
Messages
16
you can use zfs send/zfs receive on the CLI... but you probably can't make a snapshot on a read-only pool.

but no, you cannot migrate the ix-applications datasets with rsync (emphasis on the plural: there are sub-datasets that are hidden in the web UI, along with attributes that rsync is not capable of replicating).

honestly, you are better off recreating the containers and copying the config/data out separately.

the only gotcha is going to be if you used PVC/ix-volumes, which hide the data pretty deep within ix-applications.
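
For the "copy the config/data out separately" route, a minimal sketch with rsync might look like this. It copies plain files only (for example a Minecraft world directory), not the ix-applications datasets themselves. Both paths in the usage line are hypothetical and need to be replaced with the real locations on your pools; the `run` wrapper and `copy_out` are illustrative names. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default) each command is printed, not run.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Copy one directory tree to another pool.
# -a preserves permissions, ownership and timestamps; --partial keeps
# half-transferred files so an interrupted copy can be resumed.
# The trailing slash on the source copies its contents, not the directory itself.
copy_out() {
    src="$1"
    dst="$2"
    run mkdir -p "$dst"
    run rsync -a --partial --progress "$src/" "$dst/"
}

# Usage (hypothetical paths):
#   copy_out /mnt/MC_Damien/path/to/world /mnt/Alexius/minecraft-rescue
```

Since the source pool is imported read-only, the copy cannot modify it; only the destination is written.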
 

Parsivan

Cadet
Joined
Aug 26, 2023
Messages
9
Well, I made sure my Minecraft files were secure and I reinstalled the other applications I had. Thank you for the help! I was petrified when I heard the words kernel panic.
Is there a way for me to check what is wrong with the SSD? I know some programs, but they are for HDDs.

but this is resolved.
 

RetroG

Dabbler
Joined
Dec 2, 2023
Messages
16
well... the only way this could happen and be the SSD's fault is if the SSD corrupted the metaslab and also corrupted the related checksums in a way that they still match. simply put, you would have to be extremely unlucky. I doubt there is anything wrong with the SSD itself, and I suspect there was an improper shutdown or crash while ZFS was freeing space (especially since I've actually seen this behavior before, caused by exactly that).

ZFS, while resilient, isn't immune to metadata corruption; sometimes it happens. however, unlike most filesystems (which would silently "correct" it to whatever they think is correct), ZFS is designed with the assumption that something is very wrong, and that guaranteeing data integrity requires manual intervention (be that importing read-only, throwing out the ZIL replay, etc.).

you can run a SMART test: on the storage page, in the top left there is a "disks" button. find your SSDs and run a long test.
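
If the shell is more convenient than the web UI, roughly the same check can be sketched with smartctl. The device name in the usage lines is a placeholder (find the real one first), and the `run` wrapper and helper names are made up for illustration. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default) each command is printed, not run.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start the extended (long) self-test; it runs in the background on the drive.
start_long_test() {
    run smartctl -t long "$1"
}

# Show the self-test log once the test has had time to finish.
show_selftest_log() {
    run smartctl -l selftest "$1"
}

# Overall health verdict plus the vendor attribute table.
show_health() {
    run smartctl -H -A "$1"
}

# Usage (placeholder device name):
#   start_long_test /dev/sdX
#   show_selftest_log /dev/sdX
```

SMART works on SSDs as well as HDDs, although which attributes are reported varies by drive.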

aside from that, if it wasn't an improper shutdown and was indeed a crash, find out why, so you don't end up corrupting your pools this way again. once you are confident that everything is resolved, just create a new pool on the SSD.
 