The QNAS in my sig started rebooting last night and I don't know why. This NAS is a replication target from my main NAS and is part of my backup strategy. There is no ECC (it's an old repurposed QNAP).
Step 1: Run memtest on the box for a few passes - no issues detected
Step 2: Revert to older version of TN (I have recently upgraded to latest version) - NAS still rebooting.
OK - so it's either hardware (it is old, although the disks are mostly new) or summat else going on. So what is going on?
On reflection - the NAS seems stable, until I kick off a replication task.
So I kick off a small, no real changes task - it works
I kick off some others - they work
I kick off the big task - the whole dataset (including child datasets) is 25TB and there are regular, fairly significant, changes going on. NAS reboots after 5-10 seconds - this is repeatable
So I zfs rename the old target dataset, create a new one and then kick off the replication again - and now it seems to be working. Given it's a 1Gb NIC, it's gonna take a while to finish a complete replication again, and whilst I do have 25TB of spare space (just), that will bring me to 98-99% full. So I will have to delete as I go along.
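For a rough idea of how long "a while" is, here is a back-of-envelope sketch. It assumes 25TB means 25 x 10^12 bytes and that the 1Gb NIC is the only bottleneck. The function name and the 80% efficiency figure are made up for illustration, and real replication throughput will likely be lower.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move size_tb terabytes (decimal, 10**12 bytes) over a link_gbps link.

    efficiency is an assumed fraction of line rate actually achieved (hypothetical).
    """
    bits = size_tb * 1e12 * 8                      # total bits to send
    seconds = bits / (link_gbps * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 3600


print(f"{transfer_hours(25, 1):.1f} h at line rate")                      # 55.6 h
print(f"{transfer_hours(25, 1, efficiency=0.8):.1f} h at assumed 80%")   # 69.4 h
```

So even in the best case a full re-replication is somewhere over two days of continuous transfer.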
I have even kicked off all the replication jobs and they are running (slowly) and the NAS is staying up.
A scrub showed no issues with the pool and I am scrubbing the source pool as well (currently says 38 years - but I am hoping that will shrink rapidly)
I is confused - and not sure what to make of this - looking for ideas