Local snapshot retention task fails when remote host is unavailable

CJRoss · Jul 27, 2023

I have a number of periodic snapshot and replication tasks configured with different naming schemes and retention periods. Today I noticed that none of my daily snapshots are being cleaned up: new ones are created, but nothing is being deleted.

Looking in the zettarepl log, I can see that a few days ago a retention task called zettarepl.zettarepl ran and deleted the various daily snapshots. Since then, every zettarepl.zettarepl retention run reports "Local retention failed: error listing snapshots" for one remote host. The log then shows zettarepl connecting to the other remote hosts, but no snapshots are destroyed.

It's expected that zettarepl can't list snapshots on that particular remote host, since it's down for maintenance. What's unusual is that this appears to block all snapshot cleanup, even for snapshots that aren't associated with that server or with any replication task. I've taken other hosts down for maintenance without this happening.

I believe this has something to do with the fact that the replication task for that host is a pull from the down host, while all of my other tasks are push replications. I'm temporarily bringing the down host back up to see whether snapshots are properly deleted tonight.

Can anyone else replicate this issue or is it a bug unique to my setup?

winnielinnie · Jul 27, 2023

It could be a safety measure to prevent the destruction of base snapshots needed for an incremental replication to your server.

If zettarepl cannot confirm with both sides, then perhaps it defaults to skipping any destructions. Not sure why this isn't the case for your other (push) replication tasks.

CJRoss · Jul 27, 2023

winnielinnie said:
It could be a safety measure to prevent the destruction of base snapshots needed for an incremental replication to your server.

If zettarepl cannot confirm with both sides, then perhaps it defaults to skipping any destructions. Not sure why this isn't the case for your other (push) replication tasks.

That's probably the case, and I have no problem with that. I expect it to error out when dealing with snapshots specific to the down host.

What looks like a bug to me is that the error from the down host causes all snapshot deletions to fail.
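To make the distinction concrete, here is a minimal Python sketch of the two retention policies being contrasted. It is not zettarepl's code: `plan_destroy`, the `isolate_failures` flag, the `list_remote` callback (assumed to return the local snapshots a remote host still needs as incremental bases, and to raise `ConnectionError` when the host is unreachable), and all host and snapshot names are hypothetical. With `isolate_failures=False` it mimics the reported behavior, where one unreachable host blocks every deletion; with `isolate_failures=True` it holds back only the snapshots tied to the unreachable host.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical model: each expired local snapshot maps to the remote host whose
# replication task depends on it, or None if no replication task uses it.
Expired = Dict[str, Optional[str]]
# Hypothetical callback: returns the snapshots a host still needs as incremental
# bases, or raises ConnectionError if the host cannot be reached.
ListRemote = Callable[[str], Set[str]]


def plan_destroy(expired: Expired, list_remote: ListRemote,
                 isolate_failures: bool) -> List[str]:
    """Return the expired snapshots that are safe to destroy."""
    hosts = sorted({h for h in expired.values() if h is not None})
    needed: Dict[str, Set[str]] = {}
    down: Set[str] = set()
    for host in hosts:
        try:
            needed[host] = list_remote(host)
        except ConnectionError:
            down.add(host)

    if down and not isolate_failures:
        # Reported behavior: one failed listing blocks all cleanup.
        return []

    to_destroy: List[str] = []
    for snap, host in sorted(expired.items()):
        if host is None:
            to_destroy.append(snap)  # not replicated anywhere: always safe
        elif host in down:
            continue  # cannot confirm with the remote: keep it as a safety measure
        elif snap not in needed[host]:
            to_destroy.append(snap)  # remote no longer needs it as a base
    return to_destroy


# Hypothetical example data: one push target is up, the pull source is down.
def fake_list(host: str) -> Set[str]:
    if host == "pull-source":
        raise ConnectionError("host down for maintenance")
    return {"tank/b@auto-2023-07-27"}


EXAMPLE_EXPIRED: Expired = {
    "tank/a@daily-2023-07-20": None,
    "tank/b@auto-2023-07-20": "push-target",
    "tank/b@auto-2023-07-27": "push-target",
    "tank/c@auto-2023-07-20": "pull-source",
}

print(plan_destroy(EXAMPLE_EXPIRED, fake_list, isolate_failures=False))
print(plan_destroy(EXAMPLE_EXPIRED, fake_list, isolate_failures=True))
```

In the isolated version, the purely local daily snapshot and the push target's old base are destroyed, while the snapshot tied to the down pull source is kept until that host can be checked again.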

CJRoss · Jul 28, 2023

I can confirm that having the machine down was causing the problem. Last night all of the snapshots were deleted as expected. Interestingly, the local snapshots were deleted first, and then all of the remote ones. After that, it created the new snapshots and pushed them.

Can anyone else replicate this bug? Not sure if it's a zettarepl or TrueNAS issue.
