giftedhamster
tl;dr - System freezes when basically nothing is running. HW tests and logs don't indicate errors. System specs at bottom. Long history of troubleshooting follows.
I've been having some trouble since late December/early January with my FreeNAS system freezing, typically at night. I can't access the web console (the machine has disappeared from the network), and the monitor attached to the system still shows an image, but the system doesn't respond to keyboard input. The Reset button doesn't work, and I have to hold the Power button to get it to shut down. When the system reboots, I get an email indicating an unauthorized system reboot. If I look back at the graphs in Reporting, they abruptly end at whatever time the system froze, but otherwise don't show anything suspicious. Only a couple of hundred MB of RAM are free, with around 13GB of RAM in Wired. CPU temperatures are in the 20-30C range, and the CPU is not under heavy load. Sometimes the disks are under load (depending on the night and what is scheduled), but rarely. The error logs don't indicate any problems either. They just abruptly stop.
The only thing I originally had running on this machine was a single bhyve Ubuntu VM (2 CPUs and 4GB of RAM allocated), which itself only runs Plex (I tried the jail for a while, but found it frustrating).
Originally, I was running 11.0-U4, to my knowledge without issue. I upgraded to 11.1, and my trouble started not long afterward (possibly a coincidence). After reading about the memory leak in that version, I rolled back to 11.0-U4, but the problem persisted. I didn't have time to diagnose it then, and knowing that 11.1-U1 was coming, I decided to wait. When 11.1-U1 was released, I upgraded to that version, and the problem remained.
With 11.1-U1, the "Check system health" bug message let me easily tell that the system was routinely locking up at 2am. Recognizing that this was when my Plex server did its routine tasks, I disabled the VM and ran a scrub on my Primary Data volume. The system froze during the scrub, which led me to believe it was a hardware failure.
I ran SMART tests on my drives, none of which reported errors. I ran memtest86 on my RAM sticks, which also didn't show errors. When I went to reboot into FreeNAS after the memtest, my System USB stick wasn't recognized by the BIOS. So I replaced the drive with a new USB stick, loaded my config, and everything ran fine (Plex included) for one day and night. I even did a scrub of my data volume to make sure. On the second night, the system froze again.
Not having a full backup of my data volume, I disabled the Plex VM and set up a jail whose only job is to run rclone and sync everything to Google Drive. Given Google's per-day bandwidth limit, that's been taking a while. This ran fine for a week, but two nights ago, my system froze again. The only thing running on my system when it froze was that single jail running rclone. It froze around 22:00, and all of my scheduled tasks are set to start after midnight.
So now I'm at a loss for what to do or test. I'm obviously going to keep doing my backup as long as it will run, but I have no idea what to do next for troubleshooting. Any ideas or suggestions are useful, either of things to try or of data points I have yet to collect.
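One data point that might still be worth collecting is a heartbeat log, so that the exact minute of the next freeze is recorded even when the regular logs just stop. Below is a minimal sketch in Python, not something I have run on this system: the function names and the one-minute interval are my own choices for illustration. Each line is flushed and fsynced so it should survive the hard power-off.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path):
    """Append one timestamped line with the load averages and force it to disk."""
    load1, load5, load15 = os.getloadavg()
    line = "{} load={:.2f},{:.2f},{:.2f}\n".format(
        datetime.now().isoformat(timespec="seconds"), load1, load5, load15
    )
    with open(path, "a") as log:
        log.write(line)
        log.flush()
        os.fsync(log.fileno())  # push the line to disk before the next sleep
    return line


def run(path, interval_seconds=60, count=None):
    """Write a heartbeat every interval_seconds; count=None means run until killed."""
    written = 0
    while count is None or written < count:
        write_heartbeat(path)
        written += 1
        time.sleep(interval_seconds)
```

Calling `run()` with a log file path on the data volume (any path works, the choice is up to you) and leaving it running would make the last line in the file show roughly when the system stopped responding, along with the load just before that.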
System Specs (built 04/2017):
Build FreeNAS-11.1-U1
Platform Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
Memory 16047MB (2x 8GB sticks, non-ECC)
MOBO ASRock Fatal1ty H270M Performance
System Disk SanDisk UltraFit 32GB USB 3.0 SDCZ43-032G-GAM46
Primary Volume
RaidZ2 with the following disks (less than ideal, I know)
- 1x - 6TB WD Red
- 4x - 4TB WD Red
- Single Disk - SAMSUNG 850 EVO 250GB SSD (stores my VMs and jails)
- Single Disk - External 500GB HDD - Replication target for the previous disk