TrueNAS Core Unscheduled Reboot

NugentS · Jul 21, 2023

The QNAS in my sig started rebooting last night and I don't know why. This NAS is a replication target for my main NAS and is part of my backup strategy. There is no ECC (it's an old repurposed QNAP).

Step 1: Run memtest on the box for a few passes - no issues detected
Step 2: Revert to older version of TN (I have recently upgraded to latest version) - NAS rebooting.

OK - so it's either hardware (it is old, although the disks are mostly new) or summat else going on. So what is going on?

On reflection - the NAS seems stable until I kick off a replication task.
So I kick off a small, no-real-changes task - it works
I kick off some others - they work
I kick off the big task - the whole dataset (including child datasets) is 25TB and there are regular, fairly significant, changes going on. NAS reboots after 5-10 seconds - this is repeatable

So I zfs rename the old target dataset, create a new one and then kick off the replication again - and now it seems to be working. Given it's a 1Gb NIC it's gonna take a while to finish a complete replication again, and whilst I do have 25TB of spare space (just), that will bring me to 98-99% full. So I will have to delete as I go along.
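For a rough feel of "a while": a minimal sketch of the arithmetic, assuming decimal terabytes and a link that sustains some fraction of its line rate. The `efficiency` knob and its example values are assumptions for illustration, not measurements from this NAS.

```python
def transfer_days(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate the days needed to copy size_tb terabytes over a network link.

    size_tb    -- data size in decimal terabytes (1 TB = 1e12 bytes)
    link_gbps  -- nominal link speed in gigabits per second
    efficiency -- assumed fraction of line rate actually sustained (0 < efficiency <= 1)
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bits = size_tb * 1e12 * 8                      # bytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400                         # seconds -> days


if __name__ == "__main__":
    # 25 TB over a 1 Gb NIC at a few assumed efficiency levels
    for eff in (1.0, 0.5, 0.3):
        print(f"efficiency {eff:.0%}: {transfer_days(25, 1, eff):.1f} days")
```

Even at full line rate, 25TB over 1Gb is a bit over two days, and a slow destination box will sit well below line rate.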

I have even kicked off all the replication jobs and they are running (slowly) and the NAS is staying up.

A scrub showed no issues with the pool and I am scrubbing the source pool as well (currently says 38 years - but I am hoping that will shrink rapidly)

I is confused - and not sure what to make of this - looking for ideas

JohnDigital · Jul 21, 2023

I'm not saying that it's your issue, but I have had horrible luck with Crucial SSDs. I got both of mine from Amazon and they failed in seriously short order (like ~3 months) under high load in my Plex rig, with no warning other than the type of gremlins you describe: random freezing/reboots/no boot. That wasn't in a QNAP though, I have never touched one. Just BOLO. Good luck.

samarium · Jul 21, 2023

Replication of encrypted snapshots? Another instance of the existing forum thread where deleting a replicated encrypted snapshot causes a crash?

NugentS · Jul 22, 2023

Well - it took an hour or so, but the NAS rebooted itself several times. I guess that wasn't it.
As for the Crucial - I used that to replace an NVMe to USB bridge that I thought might be the issue.

NugentS · Jul 23, 2023

I guess replacing the PSU is my only option now - which will be a nuisance, assuming I can even get one

NugentS · Jul 24, 2023

samarium said:
Replication of encrypted snapshots? Another instance of the existing forum thread where deleting a replicated encrypted snapshot causes a crash?

~~One of the child datasets is encrypted - but the replication task hasn't got to that one yet. It's also been replicating just fine for a year or so~~

Actually, on consideration, there might be something to this. I will need to run some more tests

NugentS · Jul 24, 2023

@samarium it looks like you may be on to something. I cleared out all the snapshots (may have been a bit enthusiastic) and redid the replication. It's been running all day with several replications at the same time. I am waiting for the current set to finish before I retry the encrypted dataset.

However the NAS has been stable - so it looks like software, not the hardware I thought initially

samarium · Jul 24, 2023

Interesting. From what I recall people saying, the server with received encrypted incremental snapshots was crashing when trying to delete snapshots, either as part of the replication or manually. So I don't know if this is exactly the same bug. It may be related, and you were lucky enough not to get a crash when you were cleaning out. There is a reference to a JIRA ticket and to a GitHub PR from iX somewhere around here, which might be worth tracking down and reading. I suggest you use a temporary pool for testing the encrypted dataset, so you can destroy the pool if it becomes infected and you are unable to delete snapshots without a crash. Even building a zvol and then building a temporary pool on the zvol for testing would seem to be safer than allowing potentially undeletable datasets onto the main pool. I would also create a small temporary dataset to replicate for testing, rather than your main data, since you now have an idea of what might need testing. You could even do it in a VM for further isolation, which is what I would be doing in this case.
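The zvol-backed scratch pool idea could be sketched roughly as follows. This only builds and prints the command lines, it does not run them. The names `tank`, `scratchvol` and `scratchpool` and the `20G` size are hypothetical placeholders, and the `/dev/zvol/...` device path is an assumption about where the zvol appears. Whether the platform allows a pool on top of a zvol is also an assumption: FreeBSD-based systems may need the `vfs.zfs.vol.recursive` sysctl enabled, and TrueNAS normally expects pools to be managed through its UI, so treat this strictly as a throwaway experiment.

```python
def scratch_pool_commands(parent="tank", zvol="scratchvol",
                          pool="scratchpool", size="20G"):
    """Return (setup, teardown) command lists for a disposable test pool.

    All names and the size are hypothetical placeholders.
    Nothing is executed here; the caller decides what to run.
    """
    volume = f"{parent}/{zvol}"
    device = f"/dev/zvol/{volume}"          # assumed zvol device path
    setup = [
        ["zfs", "create", "-V", size, volume],   # create the backing zvol
        ["zpool", "create", pool, device],       # build the scratch pool on it
    ]
    teardown = [
        ["zpool", "destroy", pool],              # drop the scratch pool
        ["zfs", "destroy", volume],              # then remove the zvol
    ]
    return setup, teardown


if __name__ == "__main__":
    setup, teardown = scratch_pool_commands()
    for cmd in setup + teardown:
        print(" ".join(cmd))
```

Keeping setup and teardown together makes it harder to forget the cleanup step if the test pool does end up with snapshots that cannot be deleted.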

NugentS · Jul 24, 2023

Well - that was underwhelming.
All (ignoring the largest that will take a week+) the non-encrypted datasets replicate just fine - destination NAS stays up
So I add the encrypted dataset - which replicates just fine as well.

I did delete everything at the destination - so it was starting afresh and I deleted all the snapshots at the source end. I guess it's now wait and see what happens when snapshots start being deleted (2 months at the destination in this case). Might change that for testing purposes once the big one has completed
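To see in advance which destination snapshots are about to age out, something like the sketch below could help. It assumes the input is the text produced by `zfs list -H -p -t snapshot -o name,creation` (tab-separated, creation as epoch seconds) and treats "2 months" as 60 days. The snapshot names in the demo are made up.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshots(zfs_list_output, now, keep_days=60):
    """Return names of snapshots older than keep_days.

    zfs_list_output -- text in the assumed format "name<TAB>epoch_seconds" per line
    now             -- timezone-aware datetime used as the reference point
    keep_days       -- retention window; 60 is an assumed stand-in for "2 months"
    """
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for line in zfs_list_output.splitlines():
        if not line.strip():
            continue                                   # skip blank lines
        name, creation = line.split("\t")
        created = datetime.fromtimestamp(int(creation), tz=timezone.utc)
        if created < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    # Hypothetical sample data standing in for real `zfs list` output
    now = datetime(2023, 7, 24, tzinfo=timezone.utc)
    old = int(datetime(2023, 5, 1, tzinfo=timezone.utc).timestamp())
    new = int(datetime(2023, 7, 20, tzinfo=timezone.utc).timestamp())
    sample = f"backup/data@auto-old\t{old}\nbackup/data@auto-new\t{new}\n"
    print(expired_snapshots(sample, now))
```

This only reports candidates. It does not destroy anything, which seems the safer default while the crash-on-delete question is still open.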

[I need a faster destination NAS]

ikarlo · Jul 24, 2023

Hi,
the following threads deal with the above-mentioned issue:

TrueNAS SCALE 22.12.3.2 Restarting by itself every night

So this has been driving me absolutely crazy. My TrueNAS Scale 22.12.3.2 box has been restarting itself every night around 3 AM without any indication as to why it is happening. This is not an "unscheduled system reboot" either, as I don't get a notification like that once it starts back up...

www.truenas.com

Kernel Panic when trying to destroy dataset

Hi, I have 2 truenas scale's with the "truenas1" replicating 2 datasets to the "truenas2". This has worked without any issue for months and survived all the upgrades. Currently on 22.12.3.2. Yesterday I woke up to "truenas2" being in a reboot loop. Long story short I narrowed it down to of all...

www.truenas.com
