How to reboot gracefully with only access to IPMI (not SSH)?

mikael

Dabbler
Joined
Feb 9, 2018
Messages
18
Hi!

I have an X11SSM-F and I've been having some trouble with uptime. After a few days TrueNAS hangs. I'm a few versions behind and it very much looks like this problem.

Anyway, the machine hangs and I want to shut down in an orderly way before upgrading TrueNAS. I have not enabled SSH yet, but I can connect via IPMI. "Power Off Server - Orderly Shutdown" via IPMI resulted in the "some processes would not die; ps axl advised" message in the iKVM/HTML5 console (seen below). It seems I cannot type anything into this terminal. Is there a way to issue a reboot via iKVM/HTML5 with some sort of command or key combination on the virtual keyboard?

[Attached screenshot: orderly.png]
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
So you did not tell us your configuration so if I assume you are running TrueNAS on bare metal then you should be able to use a monitor and keyboard to get to the GUI on the machine, not remotely. If the GUI is not responsive and briefly pressing the power button does not force a shutdown, then you need to force it by holding the power button down until the machine turns off.

If I assume you are running in a VM, then all you can do is shut down the VM and force the issue.

Forcing the power off is never ideal but if you disconnect the Ethernet cable, wait a few seconds, then force the machine off, odds are it will be safe.

Since you are having issues let me ask, did you burn in your system to ensure stability? Do you have a proper hardware configured system?
 

mikael

So you did not tell us your configuration so if I assume you are running TrueNAS on bare metal then you should be able to use a monitor and keyboard to get to the GUI on the machine, not remotely. If the GUI is not responsive and briefly pressing the power button does not force a shutdown, then you need to force it by holding the power button down until the machine turns off.
Thanks! Bare metal, yes. I have no VGA monitor at hand. But, having connected a keyboard, how would I go about rebooting the machine, assuming I'm at the same place as in my OP screenshot?
Forcing the power off is never ideal but if you disconnect the Ethernet cable, wait a few seconds, then force the machine off, odds are it will be safe.
I'd rather not, but if I have to I have to.
Since you are having issues let me ask, did you burn in your system to ensure stability?
I did not burn in. I will Google for instructions.
Do you have a proper hardware configured system?
I hope so. This is what I'm using. I've added two mirrored WD Gold 12TB drives too.
 

joeschmuck

Since you have a keyboard connected, look at your IPMI screen; hopefully when you hit the Enter/Return key the IPMI screen will respond. If it does not, then I think you are looking at a hard power off. The good thing is you have data redundancy, and if any data becomes corrupt, the system will very likely repair it. If you cannot read/write to the NAS as it is, the thing could be crashed anyway, in which case there is no additional risk in a hard power off.
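If you would rather do this from another machine than hold the button, here is a rough sketch, assuming ipmitool is installed there and IPMI over LAN is enabled on the BMC. The host, user name, password file, and timeouts are placeholders, not values from this thread. It asks for an ACPI soft shutdown first and only falls back to a hard power off if the chassis never reports itself off. The soft request is probably the same signal the web UI's orderly shutdown sends, so on a hung OS expect it to stall the same way.

```python
import subprocess
import time


def ipmi_cmd(host, user, password_file, *args):
    """Build an ipmitool command line for a remote BMC (IPMI over LAN)."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
            "-f", password_file, *args]


def power_is_off(status_output):
    """True if `chassis power status` output reports the chassis as off."""
    return status_output.strip().lower().endswith("is off")


def soft_then_hard_off(host, user, password_file, wait_s=300, poll_s=10):
    """Request a soft shutdown; fall back to a hard power off on timeout."""
    subprocess.run(
        ipmi_cmd(host, user, password_file, "chassis", "power", "soft"),
        check=True)
    deadline = time.monotonic() + wait_s
    while time.monotonic() < deadline:
        status = subprocess.run(
            ipmi_cmd(host, user, password_file, "chassis", "power", "status"),
            check=True, capture_output=True, text=True).stdout
        if power_is_off(status):
            return "soft"
        time.sleep(poll_s)
    # The OS never powered down on its own: this is the remote equivalent
    # of holding the power button.
    subprocess.run(
        ipmi_cmd(host, user, password_file, "chassis", "power", "off"),
        check=True)
    return "hard"
```

Something like soft_then_hard_off("192.0.2.10", "ADMIN", "/root/ipmi.pw") would run it, with all three arguments being made-up examples.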

After you power the thing off, before really using it again I would recommend that you do some burn-in testing on the motherboard/CPU/RAM. Don't worry about the hard drives, those are already in use. Something like Prime95 for the CPU testing, MemTest86 for the RAM testing. I personally like to run the CPU test for no more than 2 hours. The RAM testing I like to run for at least 24 hours. If you have a failure then the problem could also be a power supply or bad motherboard and heatsink installation, it doesn't have to be the CPU or RAM.

I too have that motherboard and like it a lot. It has had a few upgrades to the BIOS and BMC firmware. While these are likely not giving you the issues, you could flash the current firmware on the board if you like, but I'd do that after the burn-in testing.

You also didn't mention what your boot device is nor the version of FreeNAS/TrueNAS you are running. If your burn-in testing is okay, I'd reinstall the version of FreeNAS/TrueNAS from a clean download then load your backup configuration file or reconfigure from scratch. Your system should not be crashing every few months.

Good luck.
 

mikael

Sorry for the late reply and thank you for the input, @joeschmuck!
Since you have a keyboard connected, look at your IPMI screen; hopefully when you hit the Enter/Return key the IPMI screen will respond. If it does not, then I think you are looking at a hard power off. The good thing is you have data redundancy, and if any data becomes corrupt, the system will very likely repair it. If you cannot read/write to the NAS as it is, the thing could be crashed anyway, in which case there is no additional risk in a hard power off.
Data survived the hard power off.
After you power the thing off, before really using it again I would recommend that you do some burn-in testing on the motherboard/CPU/RAM. Don't worry about the hard drives, those are already in use. Something like Prime95 for the CPU testing, MemTest86 for the RAM testing. I personally like to run the CPU test for no more than 2 hours. The RAM testing I like to run for at least 24 hours. If you have a failure then the problem could also be a power supply or bad motherboard and heatsink installation, it doesn't have to be the CPU or RAM.
Built this server a few years ago but sadly I have not used it a lot. I looked back at my build notes and I actually did pretty extensive burn-in tests of the CPU, RAM (memtest86) and the system drive.
I too have that motherboard and like it a lot. It has had a few upgrades to the BIOS and BMC firmware. While these are likely not giving you the issues, you could flash the current firmware on the board if you like, but I'd do that after the burn-in testing. […] You also didn't mention what your boot device is nor the version of FreeNAS/TrueNAS you are running. If your burn-in testing is okay, I'd reinstall the version of FreeNAS/TrueNAS from a clean download then load your backup configuration file or reconfigure from scratch. Your system should not be crashing every few months.
It did not crash every few months, more like once a week. However, I ran an update and it is now behaving well. Have not had a single crash since. What I'm noticing though is that the ZFS cache is slowly but steadily increasing and free RAM is decreasing. It seems to happen over two weeks or so. A week or so ago I had 10 GB free memory and now I'm at 1.3 GiB free memory and 11.5 GiB ZFS Cache. I fear that loss of RAM over time causes the crashing.

Is this normal? Do I need more than my 16 GB RAM for 2*8 TB and 2*12 TB (both mirrors)?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Is this normal?
It is indeed. ZFS will use (nearly) all available RAM for caching, but will release it as other software needs it. If all you're doing on your server is storing and sharing files, 16 GB is likely fine. More RAM is always good, but what you have should be enough.
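To put numbers on that, here is a small sketch that splits RAM into ARC, free, and everything else. The sysctl name is what I would expect on a FreeBSD-based TrueNAS install, so treat it as an assumption and check it on your own box. The "other" figure is just what is left over after subtracting, not something measured.

```python
import subprocess


def read_arc_size_bytes():
    """Read the current ARC size in bytes.

    Assumes a FreeBSD-based TrueNAS where this sysctl exists.
    """
    out = subprocess.run(
        ["sysctl", "-n", "kstat.zfs.misc.arcstats.size"],
        check=True, capture_output=True, text=True).stdout
    return int(out.strip())


def describe_memory(total_gib, free_gib, arc_gib):
    """Split RAM into ARC, free, and the remainder (all values in GiB)."""
    other = total_gib - free_gib - arc_gib
    return {
        "arc": arc_gib,
        "free": free_gib,
        "other": round(other, 1),                    # leftover, not measured
        "arc_share": round(arc_gib / total_gib, 2),  # fraction of RAM in ARC
    }
```

With the figures posted above, describe_memory(16, 1.3, 11.5) puts roughly 72% of RAM in the cache and about 3.2 GiB in everything else, which is the picture you would expect from a box that mostly serves files.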
 

mikael

It is indeed. ZFS will use (nearly) all available RAM for caching, but will release it as other software needs it. If all you're doing on your server is storing and sharing files, 16 GB is likely fine. More RAM is always good, but what you have should be enough.
Thanks! Hopefully it will continue to sail gracefully.
 

joeschmuck

Is this normal?
My two cents... Everything @danb35 said was true. My addition: to know if you need more RAM, just look at your Swap Space. If it often goes above 0% used and you see it using something like 500K or more, I myself would add more RAM. Using the Swap Space means that you ran out of RAM and the computer pushed some piece of the operating system/application to the hard drive to free up some RAM. This will cause issues in the long run, mainly slow operations, but it could lead to an unstable system as well, even though the goal is for things to keep working fine despite the RAM shortage. So look at the Swap Space and see if there is any real use of it. If it's staying at zero or you have a few rare blips of 200K or less, I wouldn't worry about it. If you have frequent blips of 200K or an infrequent blip of 500K, I think you are short.
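If you want that rule of thumb in a form you can run against logged numbers, here is a minimal sketch. It takes a list of swap-used samples in KiB, however you collect them. The 200K and 500K thresholds are the ones above; the cutoff for "frequent" (more than 10% of samples) is purely my assumption, so adjust it to taste.

```python
def swap_verdict(samples_kib, frequent_fraction=0.1):
    """Apply the swap rule of thumb to a list of swap-used samples (KiB).

    frequent_fraction is an assumed cutoff for what counts as "frequent".
    """
    if not samples_kib:
        return "no data"
    big_blips = sum(1 for s in samples_kib if s >= 500)
    medium_blips = sum(1 for s in samples_kib if s >= 200)
    # Any blip of 500K or more, or frequent blips of 200K or more,
    # suggests the box is short on RAM.
    if big_blips > 0 or medium_blips / len(samples_kib) > frequent_fraction:
        return "probably short on RAM"
    return "fine"
```

For example, a day of samples that are all zero apart from one 150K blip comes back as "fine", while a single 500K blip is enough to flag the system.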

BTW, I give my system 16GB RAM (in a VM) and have never had an issue, but I only have Plex installed and otherwise, it's just a NAS.

Glad you have the system running again.
 

mikael

I see now that I didn't reply to your last post, joeschmuck. Thanks anyway.

Just wanted to add to this thread. I continued having problems with TrueNAS hanging until I replaced the power and SATA cables for a drive that had suddenly gone offline. No problems at all since then.
 