SOLVED Mysterious periodic failures - no log messages, no video output

xale

Dabbler
Joined
Mar 20, 2021
Messages
12
Hi all,

First time poster, very new to TrueNAS, and should probably get a couple things out of the way:
  1. I'm familiar with FreeBSD/*nixes and the command line in general, but I'm not too experienced with more complex system administration.
  2. My hardware (below) is definitely not a recommended configuration, because a) I was trying to put something together using (mostly) parts I already had handy and b) the silicon shortage seems to be making all the "better" equipment rather expensive.
My setup:
  • TrueNAS-12.0-U2.1 (Core)
  • AMD Ryzen 5 1600
  • ASRock B450 PRO4
  • 16GB Crucial DDR4 2666 (non-ECC)
  • Intel EXPI9301CTBLK PCI-E Gigabit Ethernet (mobo NIC is Realtek; not used)
  • GIGABYTE HD Experience Series GeForce 210 (no integrated graphics)
  • 180GB SATA SSD (boot disk)
  • 3TB 7200 RPM HDD x 2
  • 4TB 5400 RPM HDD x 2
My problem:

Everything works great! ...for around one to three days. The web portal is accessible, SSH connections work, my sync and snapshot jobs run correctly, my NFS shares are available, and the MySQL DB I'm running in a jail does its job perfectly.

Then, at some point I can't determine exactly, the system becomes completely unresponsive. The web portal and SSH connections time out, none of the scheduled tasks happen, and the machine won't even respond to the hardware power button (which usually triggers a "graceful" shutdown). My only option is the reboot button on the case.

I've reviewed /var/log/messages each time this happens, and I see no evidence of errors. I see messages from the boot process, a couple of "configuration reload" notices (one for each night the machine was up), and then... another boot message.
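One way to put a rough timestamp on each hang from the log alone is to pull out the last line written before every boot marker. The following is a minimal Python sketch, not a tested tool: the function name is made up for illustration, and the default marker `---<<BOOT>>---` is an assumption about what the kernel writes at the start of each boot, so adjust it to whatever your own log shows.

```python
def last_lines_before_boot(lines, boot_marker="---<<BOOT>>---"):
    """Pair each boot marker line with the log line written just before it.

    `boot_marker` is an assumption; change it to match the first line
    your system logs on every boot.
    """
    pairs = []
    previous = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if boot_marker in line and previous is not None:
            # `previous` is the last thing logged before this boot,
            # so the hang happened some time after its timestamp.
            pairs.append((previous, line))
        previous = line
    return pairs


def report(path="/var/log/messages"):
    """Print the last entry before each boot found in the given log file."""
    with open(path, errors="replace") as handle:
        for before, boot in last_lines_before_boot(handle):
            print("last entry:", before)
            print("next boot: ", boot)
            print()
```

Calling `report()` on the box prints one pair per reboot. The gap between the two timestamps is an upper bound on how long the machine sat hung before it was reset.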

After the last failure, I threw a graphics card in the machine and hooked up a monitor to see if that would provide anything useful, but there seems to be no video signal when the machine is in this state.

So, my question: what are my next steps for debugging? As I said, I'm inexperienced as a sysadmin, and besides looking for log messages, I don't really know where to start.

I'm guessing the response here will be "sounds like a hardware issue", and that's fair, but I'd really like to see if I can pinpoint the problem more exactly before I give up on this setup entirely.

Thank you,
- Alex
 

xale

Quick bump here. If possible, I'd at least like some debugging/data recording steps, so that I might be able to provide more info.
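As one possible data-recording step, a heartbeat file can bracket the moment of the hang more tightly than the nightly log entries do. This is only a sketch under stated assumptions: the function names, the log path, and the 60-second interval are all hypothetical choices, and it would need to be started from a shell or a startup task on the NAS.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path):
    """Append the current local time to `path` and force it to disk."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(path, "a") as handle:
        handle.write(stamp + "\n")
        handle.flush()
        # fsync so the line survives a hard reset shortly afterwards
        os.fsync(handle.fileno())
    return stamp


def run(path, interval_seconds=60):
    """Write a heartbeat forever; stop it with Ctrl-C or kill."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_seconds)
```

After the next forced reboot, the last line in the file shows when the system was last alive, to within one interval. Something like `run("/mnt/tank/heartbeat.log")` would start it, where the path is a placeholder for any writable dataset.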
 

ThreeDee

Guru
Joined
Jun 13, 2013
Messages
700
What BIOS revision are you running on that motherboard? You might want to look into updating for better memory compatibility and whatnot. The memory controller on 1st-gen Ryzen wasn't that great.
Are you running 2 sticks of RAM or 1?
If 2, are the sticks in slots A2/B2? If just 1, is it in slot A2?
Have you tried running the RAM at 2133 to see if it still locks up?
I'd disable audio and the Realtek NIC in the BIOS if it will let you.
Disabling C-states in the BIOS was a common stability fix in earlier versions of TrueNAS/FreeNAS. I don't know if it's still applicable in the latest TrueNAS, but it might be worth a try if the things above don't help. It's not something I had to do with the setup in my sig, which has been running rock solid since I set it up.

You can install ECC UDIMMs on your setup and they will work.
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
I'm using a 1600X on an X370 motherboard and I had to disable C-States and AMD Cool & Quiet for the system to become stable.
I was experiencing random freezes after 3 days of uptime.
 

xale

Thank you for all the suggestions! Let me look into these/give them a try and I'll let you know what I find.
 

xale

My RAM configuration is a single stick, in slot A2.

The motherboard's BIOS version was P3.30, which definitely seems to be an older version (although ASRock's BIOS page is pretty damn confusing). I'll try updating before moving to any other settings.

(edit: I also went ahead and disabled the onboard audio and NIC, and disabled "Cool & Quiet", since those should be pretty innocuous changes)

ThreeDee said: "You can install ECC UDIMMs on your setup and it will work"

Awesome! What type(s) of ECC are supported? Unbuffered, etc.? I'm blind, you already said UDIMMs.
 

ThreeDee

ASRock BIOS's ..and other AM4 setups
 

xale

ThreeDee said: "ASRock BIOS's ..and other AM4 setups"

It looks like that table just lists the latest BIOS versions, correct? I'm somewhat wary of using the latest version: the ASRock site says not to install any version newer than 5.80 on machines using Summit Ridge CPUs, and the download links for all versions newer than 5.00 have a JS pop-up alert (who does this?!) recommending against installing if an older version works. I'm testing 5.00 now.
 

ThreeDee

You install whatever BIOS revision you are comfortable with .. :wink:
 

xale

Just saw the same failure mode again. I'll try disabling C-states next, and see how that goes.
 

xale

The machine's been up and stable for over a week now, so it looks like everything's probably ship-shape.

For those playing along at home, here's the complete list of changes I made:
  1. Update the BIOS to version 5.00
    1. (If your current version is older than 3.40, you'll need to upgrade to that version before installing any other updates.)
  2. Change the following BIOS config settings:
Code:
Advanced > CPU Configuration > Cool'n'Quiet: [Disabled]
Advanced > South Bridge Configuration > Onboard HD Audio: [Disabled]
Advanced > AMD CBS > Zen Common Options > Global C-State Control: [Disabled]
Advanced > AMD PBS > LAN Power Enable: [Disabled]
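
To double-check from the OS side that the C-state change took effect, FreeBSD exposes a `dev.cpu.N.cx_supported` sysctl whose value is a list of tokens such as `C1/1/1`. The Python sketch below parses that value. The function names are invented for illustration, and what this particular board reports once Global C-State Control is disabled is an assumption that has not been verified here.

```python
import subprocess


def parse_cx_supported(value):
    """Return the C-state names from a cx_supported string like 'C1/1/1 C2/2/400'."""
    return [token.split("/")[0] for token in value.split()]


def deeper_states(value):
    """Return every advertised C-state other than C1."""
    return [name for name in parse_cx_supported(value) if name != "C1"]


def read_cx_supported(cpu=0):
    """Read the sysctl on a FreeBSD/TrueNAS Core host (needs the sysctl binary)."""
    result = subprocess.run(
        ["sysctl", "-n", f"dev.cpu.{cpu}.cx_supported"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

On the NAS, `deeper_states(read_cx_supported())` returning an empty list would suggest the firmware no longer advertises anything beyond C1 for that core.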
 

LarsR

Told you Cool'n'Quiet and C-States are the sons of the devil :D
Glad to hear it's stable now.
 