SOLVED Random crashes - need help troubleshooting

1337Hacker

Dabbler
Joined
Oct 22, 2017
Messages
27
Finally committed to a much-needed upgrade (old system was running a Phenom 1 and 8GB of memory). Ran 6 passes of MemTest (no errors) and installed TrueNAS CORE 12.0-U8.

The system ran stable 24/7 for about 3 weeks. Last night, I experienced the first random crash. Nothing in /var/log.

I noticed some UPS errors every day: libusb_get_interrupt: Unknown error
So I changed the polling interval to 10
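For anyone wanting to check how often that UPS message really shows up, here is a minimal Python sketch that counts matching lines per day. It assumes the messages land in a syslog-style file whose lines start with a "Mon DD" date (on my box I would point it at /var/log/messages, but that path is an assumption), and the function name `count_errors_per_day` is just something I made up for illustration.

```python
from collections import Counter

# Error text copied from the UPS message quoted above.
ERROR_TEXT = "libusb_get_interrupt: Unknown error"


def count_errors_per_day(lines, needle=ERROR_TEXT):
    """Count lines containing `needle`, keyed by the leading 'Mon DD' date.

    Assumes syslog-style lines such as 'Mar  1 03:00:01 host daemon: text'.
    """
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            # First two whitespace-separated fields are month and day.
            counts[" ".join(parts[:2])] += 1
    return counts


# Usage sketch (the log path is an assumption, adjust as needed):
#   with open("/var/log/messages", errors="replace") as fh:
#       for day, n in count_errors_per_day(fh).items():
#           print(day, n)
```

If the count per day stays the same after changing the polling interval, the interval probably isn't what triggers the message.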

This afternoon, the system went down again. I decided to disconnect the USB cable and disable the UPS service to determine whether that was the cause. The old system used the same UPS without any issues. There were no power spikes, so it's unlikely to be the UPS itself, but it could be USB related.

I noticed a new error on reboot showing: Error opening file for writing; filename='/dev/console', error='Device not configured (6)'
Not running headless, and have a monitor plugged into HDMI, but I set up the script according to this post: https://www.truenas.com/community/threads/truenas-12-on-shuttle-ds77u.89342/#post-618819

The errors no longer show on reboot. I pulled up /data/crash and noticed the most recent crash. While I was attempting to fetch the report, the system crashed for the third time in 2 days.

I'm currently running another pass of MemTest, but want to look for other potential problems when the system is back on. The system has booted after each crash. Reports show nothing out of the ordinary, but I'm also learning this as I go. I'm also at a point where I'm not sure how many of the issues in old posts are still relevant. For example, could running SMART on an SSD cause crashes? https://www.truenas.com/community/threads/had-an-unscheduled-system-reboot.82582?

If anyone has any suggestions or can help troubleshoot logs, I'd really appreciate it.

System build:
W480M Vision W
Intel Xeon W-1370
Nemix 32GB ECC
Corsair SF450 - used prior to the system rebuild
2X WD120GB SSD (wds250g2b0a) - mirrored boot used prior to the system rebuild
6X WD Reds - pool used prior to the system rebuild
CyberPower AVRG750U - UPS used prior to the system rebuild
 

Krautmaster

Explorer
Joined
Apr 10, 2017
Messages
81
I think it's a FreeBSD-related issue that was merged into 12 and 13 somewhere between 12.0-U7 and now. Maybe unmapped I/O, if something like that or a ZFS update was merged back into TrueNAS 12.
 

1337Hacker

Dabbler
Joined
Oct 22, 2017
Messages
27
For any future readers: the PSU was the source of the problem. It delivered just enough juice to keep the system running, but not enough to power everything during heavy utilization. Power output had most likely diminished with age, since the rated wattage was (theoretically) enough for everything. Tough to diagnose - luckily I had a spare PSU.
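To show the kind of arithmetic behind "rated enough in theory", here is a rough Python sketch of a headroom check. The 450 W figure is the SF450's rating; every load number and the 70% capacity figure are placeholders I picked for illustration, not measurements from this build, and `psu_headroom` is a hypothetical helper name.

```python
def psu_headroom(rated_watts, loads_watts, capacity_percent=100):
    """Return spare watts: usable PSU capacity minus the summed peak loads.

    capacity_percent models an aged PSU that no longer delivers its full
    rating (100 means it still delivers everything on the label).
    """
    usable = rated_watts * capacity_percent // 100
    return usable - sum(loads_watts.values())


# Placeholder peak-draw figures in watts, NOT measured on this system.
loads = {
    "cpu": 125,
    "board_ram_ssds": 50,
    "six_hdds_spinup": 6 * 25,
    "fans_misc": 20,
}

print(psu_headroom(450, loads))      # headroom at the full rating
print(psu_headroom(450, loads, 70))  # headroom if only 70% is delivered
```

With these made-up numbers the build fits comfortably at the full rating but goes negative once the PSU only delivers 70% of it, which matches the pattern of idling fine and crashing under load.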
 