TrueNAS crashes and reboots every few hours

Kienaba

Explorer
Joined
May 24, 2022
Messages
52
Hello,

I added a screenshot where you can see that my TrueNAS crashes every few hours. In the past weeks it was about every 3-6 days. Now it is every few hours. Before it changed to every few hours, TrueNAS wasn't able to reboot and load all services: I saw that the computer was running, but I was not able to connect to TrueNAS. That problem is gone now, but TrueNAS still reboots every few hours. After a few minutes I am able to connect to TrueNAS again. But this is a big problem, because I always need to enter some encryption keys afterwards. I don't want to do this every few hours.

I also checked the logs, but I can't find any error. This is crazy!

Then I checked the router log, but I only see that TrueNAS disconnected and connected again:
Code:
Sep  5 22:29:43 kernel: eth3 (Ext switch port: 2) (Logical Port: 10) (phyId: a) Link DOWN.
Sep  5 22:30:06 kernel: eth3 (Ext switch port: 2) (Logical Port: 10) (phyId: a) Link UP at 1000 mbps full duplex


So internet and routing are working fine.
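
As a side note, the length of the link outage can be read off the two router lines above. The following is only a minimal sketch: the year is an assumption (syslog stamps do not carry one; 2023 is inferred from the debug file name), and the function names are made up for illustration. It measures only how long the network port was down, not how long the services took to come back.

```python
from datetime import datetime

# The two router log lines quoted above, copied verbatim.
LINK_DOWN = "Sep  5 22:29:43 kernel: eth3 (Ext switch port: 2) (Logical Port: 10) (phyId: a) Link DOWN."
LINK_UP = "Sep  5 22:30:06 kernel: eth3 (Ext switch port: 2) (Logical Port: 10) (phyId: a) Link UP at 1000 mbps full duplex"

# Assumption: syslog timestamps have no year, so one is supplied here.
# 2023 is inferred from the debug file name, not from the log itself.
ASSUMED_YEAR = 2023


def parse_syslog_time(line: str, year: int = ASSUMED_YEAR) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def link_outage_seconds(down_line: str, up_line: str) -> float:
    """Seconds between a 'Link DOWN' line and the following 'Link UP' line."""
    delta = parse_syslog_time(up_line) - parse_syslog_time(down_line)
    return delta.total_seconds()


if __name__ == "__main__":
    print(link_outage_seconds(LINK_DOWN, LINK_UP))  # 23.0
```

For the excerpt above this gives 23 seconds between link down and link up.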

The only thing I see is this in the error log:

Code:
Sep  6 12:40:34 truenas kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.GPP4.WLAN], AE_NOT_FOUND (20210730/dswload2-162)
Sep  6 12:40:34 truenas kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20210730/psobject-220)
Sep  6 12:40:34 truenas blkmapd[904]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
Sep  6 12:40:35 truenas kernel: Error: Driver 'pcspkr' is already registered, aborting...
Sep  6 12:40:53 truenas systemd-modules-load[3605]: Failed to find module 'nvidia-drm'
Sep  6 12:41:03 truenas systemd-tmpfiles[4108]: "/var/log" already exists and is not a directory.
Sep  6 12:41:03 truenas smartd[4182]: Device: /dev/nvme0n1, number of Error Log entries increased from 30 to 31
Sep  6 12:41:03 truenas systemd[1]: Failed to start nslcd.service - LSB: LDAP connection daemon.
Sep  6 12:41:03 truenas smartd[4182]: Device: /dev/nvme0n1, number of Error Log entries increased from 30 to 31
Sep  6 12:41:03 truenas smartd[4182]: Device: /dev/nvme1n1, number of Error Log entries increased from 30 to 31
Sep  6 12:41:04 truenas smartd[4182]: Device: /dev/nvme1n1, number of Error Log entries increased from 30 to 31
Sep  6 12:41:04 truenas smartd[4182]: Device: /dev/nvme2n1, number of Error Log entries increased from 30 to 31
Sep  6 12:41:04 truenas smartd[4182]: Device: /dev/nvme2n1, number of Error Log entries increased from 30 to 31
Sep  6 12:41:15 truenas libvirtd[4873]: invalid argument: cannot find architecture arm
Sep  6 12:41:16 truenas haproxy[6473]: backend be_14 has no server available!
Sep  6 12:41:17 truenas haproxy[6473]: backend be_26 has no server available!
Sep  6 12:41:19 truenas haproxy[7203]: backend be_26 has no server available!
Sep  6 12:41:19 truenas ntpd[4278]: bind(25) AF_INET6 fe80::2a0:98ff:fe1c:1f22%3#123 flags 0x11 failed: Cannot assign requested address
Sep  6 12:41:19 truenas ntpd[4278]: unable to create socket on macvtap0 (6) for fe80::2a0:98ff:fe1c:1f22%3#123
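
The smartd lines in this excerpt report the same counter change (30 to 31) for all three NVMe devices. Whether that points to a real problem is not clear from one boot alone. To compare the counters across several reboots, the messages could be pulled out of the log with something like the sketch below. It assumes the message format is exactly as shown above; the function name and the sample list are only for illustration.

```python
import re

# Matches the smartd message format seen in the log excerpt above.
SMARTD_RE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>[^,\s]+), number of Error Log entries "
    r"increased from (?P<old>\d+) to (?P<new>\d+)"
)

# Three of the lines from the excerpt, copied verbatim.
SAMPLE = [
    "Sep  6 12:41:03 truenas smartd[4182]: Device: /dev/nvme0n1, number of Error Log entries increased from 30 to 31",
    "Sep  6 12:41:03 truenas smartd[4182]: Device: /dev/nvme1n1, number of Error Log entries increased from 30 to 31",
    "Sep  6 12:41:04 truenas smartd[4182]: Device: /dev/nvme2n1, number of Error Log entries increased from 30 to 31",
]


def error_log_counters(lines):
    """Return {device: (old_count, new_count)} from the last smartd message per device."""
    counters = {}
    for line in lines:
        match = SMARTD_RE.search(line)
        if match:
            counters[match["dev"]] = (int(match["old"]), int(match["new"]))
    return counters


if __name__ == "__main__":
    for device, (old, new) in error_log_counters(SAMPLE).items():
        print(f"{device}: {old} -> {new}")
```

Running the same function over the log after each reboot would show whether the counters keep climbing by one per boot or jump between crashes.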


I hope someone can help me... this is so bad. I don't want to check my TrueNAS every day. I just want it to run for years without any problems. With the QNAP NAS I had no problems, but it was just too slow. And I only run 2 VMs on the TrueNAS server: one for Pi-hole (unencrypted) and one for Nextcloud (encrypted).

I added all debug files. I really hope someone can help me. I just want to have my Nextcloud :(
 

Attachments

  • 1694001575788.png
  • debug-truenas-20230906140223.tgz

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Please list all your hardware, and include how it is wired up. Also, what version of TrueNAS are you running? (It may be in the debug file, but I am not downloading it...)


Next, some hardware is simply not suitable for a server. For example, over-clocked CPUs or memory are a no-no for a 24x7 server. Some gamer boards may default to over-clocking.

Further, some people try enabling power save, which may or may not work well for some system boards & CPU combinations. So, leaving it at default of no power save, (or even enabling performance mode), may be the solution.

Last, some BIOS settings don't work well with TrueNAS. So you may try resetting your BIOS settings to defaults, and then only enabling the ones that make sense. (For example, if you use an NVMe for boot, but by default the PCIe lanes are assigned to a PCIe slot, you would want to reassign those lanes to the NVMe slot.)
 

Kienaba

Explorer
Joined
May 24, 2022
Messages
52
My Specs:

Server: https://geizhals.de/gigabyte-brix-extreme-gb-ber7hs-5700-a2832407.html
Storage: 1x PCIe 3.0 M.2 SSD 1TB (System) and 2x PCIe 3.0 M.2 SSD 2TB (Data, Raid)
RAM: 2x 32GB DDR4 RAM https://geizhals.de/kingston-fury-i...kf432s20ibk2-64-a2599275.html?hloc=at&hloc=de
Connection: LAN cable 1GBit to my router, power supply and one 4 TB SSD in the USB port for backup
Version: TrueNAS-SCALE-22.12.3.3

BIOS settings are default. I don't think they are the reason, because I have not changed anything in the past 2 months. It started with a crash where the server was still running, but the VMs and TrueNAS itself were not reachable. Now it crashes every day, but it's just a reboot: the VMs start and everything is reachable again. But I always need to enter the encryption key after a reboot. I will check the BIOS settings when I have time.

But it makes no sense that there is no log entry about the error that causes this reboot and crash...
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Defective hardware may cause such a situation. One common culprit for such a defect is RAM. One thing you could do is power down your server and boot it with memtest. Let the test run for hours and complete each test at least a few times. See if it detects anything wrong with your memory.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I second the memory test. Crashes based on memory faults, (without ECC memory), will likely not generate any log entries.

Miniature computers may have heat problems running as a server 24x7. This can cause stress on various parts, like solder joints, connectors and such, leading to crashes.

A TrueNAS server should have both ZFS scrubs enabled and SMART tests enabled to improve data reliability. However, a ZFS scrub can heat up the storage devices since it reads every used block on all media. Even on NVMe SSDs.

There appear to be "noise" options based on how much performance you need. (Fan speed?) If they relate to fan speed, you might try the fastest fan speed for a few weeks and see if the problem goes away. Or you might try an external fan pointing over the computer. Or even both.

Crashes caused by heat can also occur without log entries. Most modern CPUs will throttle speed to avoid over-heating, so those throttling events may show up in logs. But they would not cause a crash, (except in a faulty CPU).
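
To check the heat theory, temperatures can be read from the Linux hwmon interface under `/sys/class/hwmon`, where values are exposed in millidegrees Celsius. This is only a rough sketch: which sensors actually show up depends on the board and the loaded drivers, so it is an assumption that this mini PC exposes anything useful there. The function name is made up for illustration.

```python
from pathlib import Path


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {"hwmonN/chip_name/tempM_input": degrees_celsius} for readable sensors."""
    temps = {}
    for chip in sorted(Path(base).glob("hwmon*")):
        try:
            chip_name = (chip / "name").read_text().strip()
        except OSError:
            chip_name = "unknown"  # no readable name file for this chip
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                millidegrees = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor not readable right now; skip it
            temps[f"{chip.name}/{chip_name}/{sensor.name}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for name, celsius in read_hwmon_temps().items():
        print(f"{name}: {celsius:.1f} C")
```

If this is run every minute and the output is appended to a file, the last readings before an unexpected reboot would still be there afterwards, which could at least confirm or rule out overheating.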


To sum up, consumer computers that are used as servers can crash without any indication of why they crashed. Data center servers have more extensive monitoring through an integrated management device, (aka IPMI / BMC / SP / SC). Thus, you might get an answer from them, where you may not get an answer from consumer hardware.
 

Kienaba

Explorer
Joined
May 24, 2022
Messages
52
Thank you. I will try everything in a few days. But tbh I don't really want to do this, because I just want to have my cloud. So maybe I will just install Proxmox and hopefully this problem will go away. I don't want to debug... I ran my cloud on the QNAP system for years without any problems. But QNAP is too slow... And I don't really need RAID. If I do backups every week, that's fine too: when a drive fails, I will buy a new one and restore the backup. But there is a problem with encryption on Proxmox. It's not integrated... :( Also, I am not sure if it's that easy to create snapshots of a VM on an external drive... but these are questions for the Proxmox community.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Kienaba said: "I just install Proxmox and hopefully this problem will go away. I don't want to debug.."
If the hardware is defective, it will be defective for Proxmox too. If the hardware is fine, the test will confirm it. So Proxmox or TrueNAS, solid and reliable hardware is a must.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Kienaba said: "But there is a problem with the encryption on Proxmox. Its not integrated..."
Careful here. For most people, such encryption turns into self-inflicted ransomware more than anything else...
 

Kienaba

Explorer
Joined
May 24, 2022
Messages
52
Heracles said: "If it is the hardware that is defective, it will be defective for ProxMox too. If the hardware is fine, the test will confirm it. So Proxmox or TrueNAS, solid and reliable hardware is a must."
You are correct, yes. But I can hardly believe it, because I never had defective hardware before, and the hardware in this server is new. But anyway, I will check this.
 

Kienaba

Explorer
Joined
May 24, 2022
Messages
52
Heracles said: "Careful here. For most people, such encryption turns to a self-inflicted ransomware more than anything else..."
Yes, thanks. I always use encryption, so it won't be a big deal for me. But it would be nicer if it were integrated into the UI.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Kienaba said: "2x PCIe 3.0 M.2 SSD 2TB (Data, Raid)"
What do you mean by RAID?
 

Kienaba

Explorer
Joined
May 24, 2022
Messages
52
Heracles said: "One thing you could do is power down your server and reboot it with memtest."
Okay... Memtest is running and so far it looks like this. I think it would be normal to see no errors, right? So is my RAM defective?

Maybe I picked the wrong RAM and brand. I use this: https://geizhals.de/kingston-fury-i...kf432s20ibk2-64-a2599275.html?hloc=at&hloc=en I bought this one because it has the best timings. But maybe Kingston is not the best brand?

Which RAM would you choose? These are possible: https://geizhals.de/?cat=ramddr3&v=...768~15903_DDR4~15903_SO-DIMM~256_2x~5015_3200 Maybe Crucial is good?
 

Attachments

  • 1694272995341.png

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Quote: "they are running in a raid system"
That's a BIG no, and potentially the cause of your issues. You cannot use hardware RAID with software RAID (ZFS).
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Please read the resources that are linked to you and use proper terminology.
 

Kienaba

Explorer
Joined
May 24, 2022
Messages
52
I don't know what you mean. It's pretty normal to have a RAID system with TrueNAS. :grin: I don't get it...

But I think the RAM is the problem. Can someone confirm? And which RAM manufacturer do you suggest? I have linked some possible RAM sticks. Thank you!
 

Attachments

  • 1694275445412.png

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Kienaba said: "I don't know what you mean. Its pretty normal to have a raid system with Truenas."
It's not, because TrueNAS does not implement RAID. It uses ZFS which provides mirroring and RAIDZ at the vdev level. RAIDZ is not RAID. Not at all.
 

Kienaba

Explorer
Joined
May 24, 2022
Messages
52
Patrick M. Hausen said: "It's not, because TrueNAS does not implement RAID. It uses ZFS which provides mirroring and RAIDZ at the vdev level. RAIDZ is not RAID. Not at all."
Ok, whatever. My bad. But I am a noob and this is what I meant. But this is not the problem.
 