SOLVED Must reboot EVERY morning

Gr8Britton

Dabbler
Joined
Aug 13, 2020
Messages
12
Running FreeNAS 11.3, the most current release.

Starting last Friday morning, we are finding that the NAS suddenly is inaccessible via the web interface and we cannot bring up the menu on a monitor. We have to do a hard reset. After that, it works fine all day. The next morning, we have to do this again.

I just upgraded to 16GB RAM yesterday, but it happened again this morning. Any suggestions? Has anyone else had this happen?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Per the Forum Rules, please provide the details of your hardware. You haven't provided enough information to help us troubleshoot.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
we cannot bring up the menu on a monitor
If the local console is inaccessible, that suggests a hardware fault in the system itself. As @Samuel Tai suggested, please post your hardware specifications. I would also check any IPMI/out-of-band monitoring for logs. If you have no hardware-level monitoring and can schedule downtime, start by running a bootable testing ISO such as memtest86.
 

Gr8Britton

Per the Forum Rules, please provide the details of your hardware. You haven't provided enough information to help us troubleshoot.
Thank you. I had to get into the office to get the hardware info. Then, had to get it running.

Motherboard is a Gigabyte GA-AB350M-DS3H. I'm checking on firmware updates now.
FreeNAS is installed on an SSD drive
The server has an older LSI Controller Card (Trying to get the info but might have to reboot the system after hours) with 16 600GB SATA drives connected in a hardware RAID-5 (8.15 TB)
Have an external USB hard drive (4.55 TB)
 

Samuel Tai

Ah, a Ryzen board. These are known to have stability issues in 11.x. The typical work-around is to disable Cool'n Quiet and C6 power states in the BIOS.

Also, hardware RAID isn't recommended, as ZFS really wants direct access to the drives, without any RAID in the way. See https://www.truenas.com/community/t...s-and-why-cant-i-use-a-raid-controller.81931/. If the hardware RAID fails, then there's no way to recover.
 

Gr8Britton

I will look into the power states in the BIOS. I also looked at my CM log and the problem popped up after upgrading to 11.3. Question...would moving to TrueNAS Core 12 be better? Or the same issue?

I did see the article you referenced about the hardware RAID as I was researching yesterday. I would have to research the RAID card a bit to see how to set it up as individual drives and let ZFS handle the RAID. So, I'll take a look at the power states first. Thank you!

I'll look to move to an Intel board and move the RAID control to ZFS at a later point.
 

Samuel Tai

TrueNAS Core has better support for Ryzen, as it's based on a more recent release of FreeBSD. However, 11.3-U5 is currently the most stable release, and I would stay there unless you have a pressing desire to be more on the cutting edge.
 

Gr8Britton

I looked in the BIOS and there were no settings like you mentioned. Any special power settings were already disabled. High Precision Timers was enabled and I tried disabling that, just in case. So, I'll see how it goes tomorrow.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
in a hardware RAID-5 (8.15 TB)
I would say that the most likely source of the problem is this hardware RAID. With that, you only have ONE disk presented to ZFS, so there is no redundancy in the FreeNAS configuration. If there is data in this storage pool, the first priority is getting that data out to safe storage. It is not safe in this configuration. FreeNAS is not able to monitor drive health or reconstruct any damaged data as ZFS has NO redundancy. Multi-disk redundancy is how ZFS protects data from corruption. The hardware RAID controller is not the answer.
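
As a rough illustration of why a single "disk" leaves ZFS unable to repair anything, here is a toy model in Python. It is not how ZFS is implemented internally; the function names and the choice of SHA-256 are assumptions made purely for the example. The idea it shows: a checksum can detect a bad block, but repairing it requires a second copy that still matches the checksum.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Hash of a data block (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(block).hexdigest()


def read_block(copies: list, expected: str) -> bytes:
    """Return the first copy whose checksum matches the stored one.

    Raises IOError when every available copy is damaged, which is the
    situation a pool with no redundancy ends up in.
    """
    for data in copies:
        if checksum(data) == expected:
            return data
    raise IOError("checksum mismatch on every copy: detected, not repairable")


original = b"important data"
expected = checksum(original)
corrupted = b"important dat\x00"  # simulated silent corruption

# Two copies (as in a mirror): the good copy is found and returned.
recovered = read_block([corrupted, original], expected)

# One copy (a single device handed to the filesystem): detection only.
try:
    read_block([corrupted], expected)
    single_copy_ok = True
except IOError:
    single_copy_ok = False
```

With two copies the read succeeds; with one copy the corruption is noticed but there is nothing to rebuild from.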

Please refer to these guides when selecting hardware, paying particular attention to disk controllers as ZFS needs direct access to multiple drives to compute checksums on the data:



Despite the age of these guides, the information is still applicable. FreeNAS does NOT need the latest hardware. It needs hardware that can be relied upon to deliver correct data.

You should also take time to review these guides:



The storage pool (made up of vdevs) in ZFS is where redundancy is created. Never rely on a single disk, not even for the boot drive.
 

Gr8Britton

Things went well last night / this AM so I'll continue to watch and update this thread if there are new revelations.
 

Gr8Britton

Since this was an inherited box, I can take all of this into account when it is rebuilt. However, it has been plugging along as is for a long time, and I read these articles when I had issues a few days ago. At this time, wiping it all out and rebuilding it is not an option unless all else fails. However, I will take this all into account when I do have a chance to rebuild the NAS.

One thing I'll clarify: the HW RAID is not set up as a pool. It is set up as an iSCSI LUN with the disk assigned to the LUN.


 

Chris Moore

FreeNAS also uses the storage array disks for swap space. It sets aside a 2GB partition per disk and makes mirrors from the space provided by up to ten disks. With only one “disk” supplied to FreeNAS it only has one partition for swap and was not able to create a mirror for swap space. This may have your system in a situation where it needs swap and has none, which could be the cause of the crash.
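
One way to check whether swap is actually present and being used is to look at the output of `swapinfo -k` from the shell. Below is a minimal Python sketch that parses that output, assuming the usual five-column layout (Device, 1K-blocks, Used, Avail, Capacity). The sample text and the device name in it are made up for illustration and are not taken from this system.

```python
def parse_swapinfo(text):
    """Parse `swapinfo -k` style output into a list of per-device dicts."""
    devices = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row, any "Total" row, and anything malformed.
        if len(fields) != 5 or fields[0] in ("Device", "Total"):
            continue
        name, blocks, used, avail, _capacity = fields
        devices.append({
            "device": name,
            "total_kib": int(blocks),
            "used_kib": int(used),
            "avail_kib": int(avail),
        })
    return devices


def swap_summary(devices):
    """Summarise device count, total size in GiB, and percent used."""
    total = sum(d["total_kib"] for d in devices)
    used = sum(d["used_kib"] for d in devices)
    return {
        "devices": len(devices),
        "total_gib": total / 1024 ** 2,
        "used_pct": (100 * used / total) if total else 0.0,
    }


# Hypothetical sample: one 2 GiB swap device, a quarter of it in use.
SAMPLE = """\
Device          1K-blocks     Used    Avail Capacity
/dev/da0p1        2097152   524288  1572864    25%
"""

summary = swap_summary(parse_swapinfo(SAMPLE))
```

An empty device list, or a single device that is nearly full, would be a hint that the swap theory is worth chasing.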

Just because it looks like it is working, that doesn’t mean it is working right.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,909
@Gr8Britton, may I suggest you search the forums a bit on the subject of hardware RAID. You will find plenty of people who have been in exactly the same situation as you. And while of course not all of them have had data losses, this is not due to this setup being reliable. It is just that none of the many possible problems that can occur on the various hardware and software levels has hit them so far. If you are willing to live with that risk, everything is fine. Otherwise, you are waiting for a disaster to happen.
 

Gr8Britton

"...I will take this all into account when I do have a chance to rebuild the NAS. "
 

Gr8Britton

Update: We came in today and the NAS was down. This time, we were able to see "re0: Watchdog timeout" errors on the screen. A quick search found that this is a common error with RealTek NICs, which is what is onboard. So, I'll try some Intel NICs when we rebuild the NAS this week with an Intel MB.
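
For anyone who wants to see how often this happens before swapping hardware, here is a small Python sketch that counts watchdog timeout messages per interface in a list of log lines. The sample lines, hostname, and timestamps are invented for illustration. On a real system the lines would presumably come from the system log (for example /var/log/messages), which is an assumption here rather than something checked on this box.

```python
import re
from collections import Counter

# Matches e.g. "re0: watchdog timeout" or "re0: Watchdog timeout".
WATCHDOG_RE = re.compile(r"\b([a-z]+\d+): watchdog timeout", re.IGNORECASE)


def count_watchdog_timeouts(lines):
    """Count watchdog timeout messages, keyed by interface name."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log excerpt.
SAMPLE_LOG = [
    "Nov  2 03:14:07 nas kernel: re0: watchdog timeout",
    "Nov  2 03:14:09 nas kernel: re0: link state changed to DOWN",
    "Nov  2 03:15:30 nas kernel: re0: Watchdog timeout",
]

counts = count_watchdog_timeouts(SAMPLE_LOG)
```

Comparing the timestamps of the matching lines against the time the NAS drops off the network would help confirm whether the NIC is the trigger.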
 

Samuel Tai

Actually, you may be able to keep the system limping along until you replace the motherboard. Is there a BIOS setting to disable the watchdog?
 

Gr8Britton

Actually, you may be able to keep the system limping along until you replace the motherboard. Is there a BIOS setting to disable the watchdog?

I rebooted and checked the BIOS but could not find anything for the watchdog. It listed the RealTek info, driver version, etc., but it was all read-only. Thanks for the suggestion, though!
 

Gr8Britton

Installed a GA-Z97-HD3 (rev. 2.1) with a quad-core i5 CPU. Added 16GB of DDR3 RAM (will try to upgrade to 32GB later).
Set up the disks on the RAID controller so they show up as 16 individual disks to TrueNAS.
Currently using the onboard RealTek NIC because FreeNAS/TrueNAS (tried both) would not see the Intel NICs (Intel Gigabit CT Desktop Adapters).
Settled on TrueNAS 12.1 since I was starting over from scratch.

Set up a RAIDZ3 with all 16 HDDs and made it available to my servers via iSCSI. All is working, and we'll see how it goes as I copy back my data from an offsite NAS.
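
For a rough idea of what the change costs in space, here is a back-of-the-envelope Python sketch comparing the old single-parity layout with the new triple-parity one. It assumes 600 GB means 600 x 10^9 bytes per disk and ignores filesystem metadata, padding, and swap partitions, so real numbers will come out somewhat lower. The function name is made up for the example.

```python
def usable_capacity_tib(disks: int, disk_gb: int, parity: int) -> float:
    """Approximate usable space in TiB for a parity-based array.

    disks   -- number of disks in the array
    disk_gb -- size of each disk in decimal gigabytes (10**9 bytes)
    parity  -- disks' worth of parity (1 for RAID-5, 3 for RAIDZ3)
    """
    if parity >= disks:
        raise ValueError("parity must be smaller than the number of disks")
    usable_bytes = (disks - parity) * disk_gb * 10 ** 9
    return usable_bytes / 2 ** 40


raid5_tib = usable_capacity_tib(16, 600, parity=1)   # old hardware RAID-5
raidz3_tib = usable_capacity_tib(16, 600, parity=3)  # new RAIDZ3 pool

print(f"RAID-5: {raid5_tib:.2f} TiB, RAIDZ3: {raidz3_tib:.2f} TiB")
```

The RAID-5 estimate lands close to the 8.15 TB the old array reported, and RAIDZ3 gives up roughly two more disks' worth of space in exchange for surviving three disk failures.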

Any suggestions re: the NICs? They light up for power and link but never show up in TrueNAS.
 

Gr8Britton

Just to update:
I upgraded the RAM to 32GB with the manufacturer's preferred RAM for the motherboard. I added an Intel 4-port card recommended in the HW compatibility list. It sees all 4 ports and the RAM. Found a drive that has some bad sectors, so I'm working on repairing that. Otherwise, all recovered. Thank you!
 