TrueNAS SCALE 21.08 unexpected reboot

majinjing3

Dabbler
Joined
Sep 2, 2021
Messages
19
Alerts like:
  • truenas.home had an unscheduled system reboot. The operating system successfully came back online at Mon Sep 20 17:51:36 2021.
  • truenas.home had an unscheduled system reboot. The operating system successfully came back online at Wed Sep 22 15:06:36 2021.
  • truenas.home had an unscheduled system reboot. The operating system successfully came back online at Wed Sep 22 20:04:35 2021.
SCALE reboots many times every day.

I found only one panic, which I reported here: https://jira.ixsystems.com/browse/NAS-112439

The other reboots are unexplained. I cannot find any useful information about them.

How can I debug these unexpected reboots?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Always start with your hardware description. Was everything operating normally before? When did the rebooting start?
 

majinjing3


Motherboard: ASUS Z590M Plus
CPU: i7 11700k
GPU: Radeon RX 560
NVME: 2* WD SN550, 1* WD SN850, 1* ADATA XPG SX8200 Pro
 

morganL

Is it a new system?
When did it start rebooting, and what was the likely trigger?
 

majinjing3

Sep 23 02:57:23 truenas kernel: amdgpu 0000:06:00.0: [drm] Cannot find any crtc or sizes
Sep 23 02:57:41 truenas env[41452]: E0923 02:57:41.486800 41452 network_services_controller.go:151] Failed to replace route to service VIP 192.168.0.20 configured on kube-dummy-if. Error: exit status 2, Output: RTNETLINK answers: File exists
Sep 23 02:57:41 truenas env[41452]: E0923 02:57:41.488009 41452 network_services_controller.go:151] Failed to replace route to service VIP 192.168.0.20 configured on kube-dummy-if. Error: exit status 2, Output: RTNETLINK answers: File exists
Sep 23 02:57:41 truenas env[41452]: E0923 02:57:41.488997 41452 network_services_controller.go:151] Failed to replace route to service VIP 192.168.0.20 configured on kube-dummy-if. Error: exit status 2, Output: RTNETLINK answers: File exists
Sep 23 02:57:53 truenas kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 23 02:57:53 truenas kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 23 02:57:53 truenas kernel: [drm] VCE initialized successfully.
Sep 23 02:57:53 truenas kernel: amdgpu 0000:06:00.0: [drm] Cannot find any crtc or sizes
Sep 23 02:57:59 truenas systemd[1]: Starting Certbot...
Sep 23 02:58:00 truenas systemd[1]: certbot.service: Succeeded.
Sep 23 02:58:00 truenas systemd[1]: Finished Certbot.
Sep 23 02:58:23 truenas kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 23 02:58:23 truenas kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 23 02:58:23 truenas kernel: [drm] VCE initialized successfully.
Sep 23 02:58:23 truenas kernel: amdgpu 0000:06:00.0: [drm] Cannot find any crtc or sizes
(Note by jim: the new boot starts here.)
Sep 23 03:00:36 truenas syslog-ng[8633]: syslog-ng starting up; version='3.28.1'
Sep 23 02:59:51 truenas kernel: microcode: microcode updated early to revision 0x40, date = 2021-04-11
Sep 23 02:59:51 truenas kernel: Linux version 5.10.42+truenas (root@tnsbuilds01.tn.ixsystems.net) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Mon Aug 30 21:54:59 UTC 2021
Sep 23 02:59:51 truenas kernel: Command line: BOOT_IMAGE=/ROOT/21.08-BETA.1-2021-09-12@/boot/vmlinuz-....
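
Finding the boot boundary by hand gets tedious when the box reboots several times a day. Below is a rough Python sketch, not anything TrueNAS ships, that scans syslog-style lines for the kernel boot banner and returns the last few lines logged before each boot. The marker strings are assumptions taken only from the excerpt above (the "Linux version" banner, plus the syslog-ng and microcode lines that precede it), and the function name and `SAMPLE` list are made up for illustration.

```python
# Sketch: show what was logged right before each boot in a syslog-style file.
# Marker strings are assumptions based on the log excerpt in this thread.

BANNER = "kernel: Linux version "
# Lines that belong to the new boot but are logged before the banner.
EARLY_BOOT_HINTS = ("syslog-ng starting up", "kernel: microcode:")


def tails_before_boots(lines, context=3):
    """Return, for each boot banner, the `context` lines logged before that boot."""
    tails = []
    for i, line in enumerate(lines):
        if BANNER not in line:
            continue
        start = i
        # Step back over early-boot lines so they are not mistaken for pre-crash output.
        while start > 0 and any(h in lines[start - 1] for h in EARLY_BOOT_HINTS):
            start -= 1
        tails.append(lines[max(0, start - context):start])
    return tails


# Shortened sample taken from the excerpt above.
SAMPLE = [
    "Sep 23 02:58:23 truenas kernel: [drm] VCE initialized successfully.",
    "Sep 23 02:58:23 truenas kernel: amdgpu 0000:06:00.0: [drm] Cannot find any crtc or sizes",
    "Sep 23 03:00:36 truenas syslog-ng[8633]: syslog-ng starting up; version='3.28.1'",
    "Sep 23 02:59:51 truenas kernel: microcode: microcode updated early to revision 0x40, date = 2021-04-11",
    "Sep 23 02:59:51 truenas kernel: Linux version 5.10.42+truenas",
]

for tail in tails_before_boots(SAMPLE, context=2):
    print("--- last lines before boot ---")
    print("\n".join(tail))
```

If the tail shows only routine messages and no panic or shutdown sequence, as it does here, the log probably cannot explain the reset on its own, and hardware causes (power, heat) become more plausible.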
 

Attachments

  • IMG_20210923_133208.jpg

morganL

Can you go back to 21.06, make sure it's not rebooting, and then upgrade again?
If the rebooting resumes, then "report a bug" seems like the right course of action.
 

majinjing3

Thanks.
I suspect the sudden reboots are caused by the PCH chip overheating. I'm verifying that now.
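
One way to test an overheating theory is to log temperatures continuously and look at the last sample written before each reset. The sketch below is only an illustration: it reads the standard Linux hwmon sysfs files (`temp*_input`, in millidegrees Celsius) and appends each sample to a file with an fsync. Whether the PCH shows up as a sensor there depends on the board and the loaded kernel drivers, which is an assumption to check on the actual system. The function names and the log path in the usage note are hypothetical.

```python
# Sketch: log hwmon temperatures so the last reading survives a sudden reset.
import os
import time
from pathlib import Path


def read_temps(root="/sys/class/hwmon"):
    """Return {"chip/label": degrees_C} for every temp*_input found under root."""
    temps = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            base = sensor.name[: -len("_input")]          # e.g. "temp1"
            label_file = chip / f"{base}_label"
            label = label_file.read_text().strip() if label_file.exists() else base
            try:
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors fail to read; skip them
            temps[f"{chip_name}/{label}"] = millideg / 1000.0
    return temps


def log_temps(log_path, interval_s=10, samples=None, root="/sys/class/hwmon"):
    """Append one timestamped line per sample; run forever if samples is None."""
    count = 0
    with open(log_path, "a") as log:
        while samples is None or count < samples:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            reading = " ".join(f"{k}={v:.1f}" for k, v in sorted(read_temps(root).items()))
            log.write(f"{stamp} {reading}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before the next possible reset
            count += 1
            if samples is None or count < samples:
                time.sleep(interval_s)
```

For example, `log_temps("/tmp/temps.log", interval_s=5)` could run in a shell session; any path that persists across reboots would be a better choice than `/tmp`. If the last lines before a reset show a steadily climbing sensor, that supports the overheating theory. Flat readings point elsewhere.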
 

Jcem

Cadet
Joined
Oct 16, 2021
Messages
3

I also have a frequent reboot problem. I haven't looked at the logs, but the fact that it restarts even without any activity, not even a scheduled (cron) job, is surprising. Temperature is not the problem: I checked with an infrared thermometer, and the CPU is at ~50°C and the chipset at ~38°C.

CPU: Xeon E5-2660v3
Motherboard: Generic Chinese X79
RAM: 32 GB, 2x 16 GB Athermiter
HD: 3x Seagate IronWolf


It started to reboot just after the early October update.
 

morganL

If you roll back or load a previous release of the software, you can confirm whether the software update is the culprit.
It's not a common problem, so it's hard to work out what the cause is.
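
To compare reboot frequency before and after a rollback, the alert text itself can be tallied. Here is a small Python sketch that assumes the alert wording quoted at the top of this thread ("came back online at ...") and an English-locale date format. The function name and the `ALERTS` sample are for illustration only.

```python
# Sketch: count "unscheduled system reboot" alerts per calendar day.
import re
from collections import Counter
from datetime import datetime

# Timestamp format assumed from the alerts quoted in this thread,
# e.g. "Mon Sep 20 17:51:36 2021" (English day and month names).
ALERT_TIME = re.compile(
    r"came back online at (\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
)


def reboots_per_day(alert_text):
    """Return {ISO date: number of reboot alerts}, sorted by date."""
    counts = Counter()
    for stamp in ALERT_TIME.findall(alert_text):
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        counts[when.date().isoformat()] += 1
    return dict(sorted(counts.items()))


ALERTS = """
truenas.home had an unscheduled system reboot. The operating system successfully came back online at Mon Sep 20 17:51:36 2021.
truenas.home had an unscheduled system reboot. The operating system successfully came back online at Wed Sep 22 15:06:36 2021.
truenas.home had an unscheduled system reboot. The operating system successfully came back online at Wed Sep 22 20:04:35 2021.
"""

print(reboots_per_day(ALERTS))  # {'2021-09-20': 1, '2021-09-22': 2}
```

Running the same tally on alerts collected for a few days on the older release and then on the newer one gives a simple before/after comparison.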
 