Replication problem, and deleting a dataset

Status
Not open for further replies.

praecorloth

Contributor
Joined
Jun 2, 2011
Messages
159
Hey everyone. I ran into a problem similar to https://forums.freenas.org/index.php?threads/replication-problem-delete-a-cloned-snapshot.6625/ and I found a solution that worked for me. I didn't want to necro the thread, though.

In my case, a replication task failed, and the dataset seemed to become corrupt. I couldn't list snapshots anymore: when I specified that dataset, it said that the operation was not valid for that type of dataset. By this point a few errors had crept in, and we wanted to nuke and pave this particular dataset. But when we tried to destroy the dataset, in the GUI or at the CLI, it said it couldn't destroy the dataset because the "dataset already exists."

A few things I ran across in troubleshooting this. First, the link above states we can use,

Code:
zdb -d <poolname> | grep % 

to find a cloned snapshot. I had trouble with this on every FreeNAS box I've encountered. The pools created via the FreeNAS GUI do not show up with the zdb command. But you can do,

Code:
zdb -d -e <poolname> 

and it will give you the information you're looking for.
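For anyone who wants the two commands combined, here is a rough sketch that prints only the dataset names containing a %. It is not from the linked thread: the pool name `tank` and the helper `temp_clone_names` are made up for illustration, and the awk filter assumes zdb prints one `Dataset <name> [...]` line per dataset, which may vary between versions.

```shell
#!/bin/sh
# Hypothetical pool name; substitute your own.
POOL="tank"

# Assumption: zdb -d prints lines such as
#   Dataset tank/data/%recv [ZPL], ID 123, ...
# This helper prints only the dataset names that contain a '%'.
temp_clone_names() {
    awk '$1 == "Dataset" && $2 ~ /%/ { print $2 }'
}

# Same invocation as above (-d -e), with the output filtered.
# Skipped quietly if zdb is not installed on this machine.
if command -v zdb >/dev/null 2>&1; then
    zdb -d -e "$POOL" | temp_clone_names
fi
```

If nothing is printed, there are no datasets with a % in their name, which is what I ended up seeing.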

In my case, this didn't help, because there were no snapshots on the dataset anymore. For some reason, the sending FreeNAS box decided that all of the snapshots on the receiving side were out of date, so it destroyed them all (which is weird because we had 4 days left on the newest hourly snapshots before they became out of date).

In the end, the problem was relatively simple. Even though the replication had failed and I had disabled the replication task on the sending box, it was still trying to replicate while reporting that it wasn't running. I only know this because all of my sending FreeNAS boxen replicate via a dedicated user, so in top I saw that this particular replication user still had ssh, zfs, and lz4 processes running under its name. I sent a standard kill signal to the ssh process ID, and all three processes went away.
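The same check can be done without top. This is only a sketch under assumptions: the account name `repluser` and the helper `ssh_pids_for_user` are hypothetical, and it assumes the leftover process shows up with the command name `ssh`. It prints the kill commands rather than running them, so you can look before you signal anything.

```shell
#!/bin/sh
# Hypothetical replication account name; substitute your own.
REPL_USER="repluser"

# Reads "user pid command" lines (the layout of: ps -axo user,pid,comm)
# and prints the PID of every ssh process owned by the given user.
ssh_pids_for_user() {
    awk -v u="$1" '$1 == u && $3 == "ssh" { print $2 }'
}

# List the candidate PIDs and show the command that would be run.
# Plain kill sends SIGTERM, the "standard" signal mentioned above.
for pid in $(ps -axo user,pid,comm | ssh_pids_for_user "$REPL_USER"); do
    echo "would run: kill $pid"
done
```

In my case killing the ssh process was enough; the zfs and lz4 processes exited with it.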

At this point I was so set on destroying the dataset that attempting to destroy it was how I checked whether I had control over the dataset again. If I had thought about it for just a moment, I would have gone back to see if the snapshots showed up. Six different kinds of my bad. I will pay for it by replicating all of that data across the internet again, and by posting my story here in the hopes that it helps someone.
 