SOLVED Troubleshooting server with ZFS (but not TrueNAS)

Arwen

Panic on my media server!

This is off topic, but it does show some relevant troubleshooting.

For about a month, my miniature media server had been rebooting at random, even while I was logged in. Afterwards I was unable to find a cause. It was certainly not a power outage, as the server is on a UPS. After every reboot I checked both ZFS pools, and they always reported that they were fine. The last scrub was fine too.
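For anyone who wants to automate the "check the pools after every reboot" step, here is a minimal sketch. It assumes the `zpool` command is on the PATH and relies on `zpool status -x` printing "all pools are healthy" when nothing is wrong. The function names are my own, purely for illustration.

```python
import subprocess


def pools_healthy(status_output: str) -> bool:
    """Return True if the text from `zpool status -x` is the all-clear message."""
    return "all pools are healthy" in status_output


def check_pools() -> bool:
    """Run `zpool status -x` and report whether every pool is healthy.

    With -x, zpool only lists pools that have problems; otherwise it
    prints a single all-clear line.
    """
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and pools_healthy(result.stdout)


if __name__ == "__main__":
    if check_pools():
        print("pools OK")
    else:
        print("pool problem reported, run `zpool status -v` for details")
```

Run it from a boot-time script or by hand after an unexpected reboot. As this story shows, healthy pools only rule out one suspect.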

Here is the hardware configuration:
  • Bought about May 2015
  • Model fit-PC fitlet H, quad core low power AMD embedded processor
  • Memory, 2 x 8GB SO-DIMMs, non-ECC
  • Network, Intel I211 Gigabit NIC
  • Storage:
    • mSATA with 1TB SSD
    • 2.5" 2TB SATA hard drive
  • Console:
    • BIOS is via HDMI & USB
    • Grub & Linux via serial port
  • Serial terminal server accessed via SSH
  • Power supply is a 12v 3amp wall wart
  • UPS is a 12v 3-5amp lithium battery unit, connected between the wall wart & chassis

Software:
  • Gentoo Linux
  • Root FS uses 28GB from each drive as a ZFS mirror
  • Rest of storage is striped with ZFS for media
  • Twice a month ZFS pool scrub

The first thing I did was monitor a reboot via the serial console; it was clean. I checked the logs and considered activating a crash dump, but still could not find anything.

But still, random reboots, even with me logged in investigating.

I started panicking. Backups are fine, but I like that little thing. Very low power, small, and yet it still has 700GB free for more media.

I wanted to try memtest86+, but that needed to be run from HDMI. (Or I'd have to learn to configure it for serial access.) I worried that I had undetected memory faults and that ZFS was tripping over them and causing the crash. Without an easy memory test, that would have to wait.

So, my first step was to clean the mSATA, memory connectors and the disk connector. They had probably not been touched in 7 years. While removing the server from its shelf, I found that the UPS was blinking like it was re-charging. (It's a pretty dumb UPS, but cheap and suitable for the task.)

After I re-assembled it, the UPS was still blinking, even though by then it should have been charging for 30 minutes or so. Maybe the UPS is bad? (It's younger, perhaps only 6 years old.) Removing the miniature UPS and running straight from the power supply failed. The server would not power up.

Ah, ha!

Because I had another identical power supply, I used that instead. Lo and behold, my miniature media server is back to rock stable. Even with the 12v UPS put back in, it charged and quit blinking. Testing the old power supply with a simple voltmeter shows it still puts out 12v.



Thinking about the symptoms, it appears that the old power supply can no longer deliver enough current during certain tasks. If the UPS is charged, it can supplement the supply for 10 to 30 minutes, but then it runs out of juice and the computer crashes. The computer stays off until the UPS has charged enough, then auto-boots. But if I immediately attempt something like a software update, it will crash again.
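To put rough numbers on that reasoning, here is a small back-of-the-envelope sketch. Every wattage and the battery capacity below are made-up illustrative values, not measurements from my hardware, and the model ignores conversion losses. The point is only that a weak supply plus a small battery gives a crash delay that depends on load, while a voltmeter check (presumably with no load attached) can still read a healthy 12v.

```python
def ups_runtime_minutes(load_w: float, supply_w: float, battery_wh: float) -> float:
    """Minutes the battery can cover the gap between load and supply.

    load_w     -- power the computer draws, in watts
    supply_w   -- power the wall supply can actually deliver, in watts
    battery_wh -- usable energy stored in the UPS battery, in watt-hours

    Returns float('inf') when the supply alone covers the load.
    """
    deficit_w = load_w - supply_w
    if deficit_w <= 0:
        return float("inf")
    return battery_wh / deficit_w * 60.0


# All values below are hypothetical, chosen only to illustrate the effect.
NAMEPLATE_SUPPLY_W = 12.0 * 3.0  # a healthy 12v 3amp supply
WEAK_SUPPLY_W = 10.0             # assumed output of an aged, failing supply
BATTERY_WH = 5.0                 # assumed usable UPS capacity

if __name__ == "__main__":
    for label, load_w in [("idle", 8.0), ("software update", 20.0)]:
        for name, supply_w in [("good", NAMEPLATE_SUPPLY_W), ("weak", WEAK_SUPPLY_W)]:
            minutes = ups_runtime_minutes(load_w, supply_w, BATTERY_WH)
            print(f"{label:16s} {name} supply: {minutes} min until battery is empty")
```

With numbers like these, the weak supply keeps up at idle and only falls behind under heavier load, which would look exactly like "random" reboots.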

Now I can quit trying to find a replacement with ECC memory & low power, (plus some of the other things like serial console or remote console).



So, after about 7 years, the first serious problem I had with this computer was not the CPU, main board, storage, memory or software, but a wall wart power supply.
 