SOLVED Replication SSD -> HDD is failing

Status
Not open for further replies.

IceBoosteR

Guru
Joined
Sep 27, 2016
Messages
503
Morning guys,

I need your help, as I have a very strange problem with replication and I have no idea what the root cause is or how to solve it.
Starting with the setup: I have various jails on an SSD pool, which I back up to my main HDD-based pool with ZFS send/receive, set up via the GUI. Both pools are in one system. At night a snapshot is taken and later replicated to the other pool, but the replication is failing.
When this started, I tried to solve it by deleting ALL snapshots on the SSD belonging to the jails (except the @clean snapshot, which I could not delete). I also deleted all the destination datasets with their snapshots, the replication tasks and the snapshot task.
This procedure was working for about 1 month without any problem.
Then I set this all up anew. And it worked: @clean was transferred, and the next two days' snapshots as well. But then it broke again.
How can I fix this? What can I do, and which logs do I have to check? I do not see any kind of related error message except:
Code:
Jan. 12, 2018, 1:18 a.m. - Replication SSD/jails -> 192.168.178.100:RED/Backup/SSD failed: Failed: SSD/jails/emby_1 (auto-20180108.0100-2w->auto-20180109.0100-2w)

I did not change anything on my system. I only changed the router in my house, nothing more.

Any help would be awesome. I am on 11.0-U4.
Cheers,
Ice
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
I did not change anything on my system. I only changed the router in my house, nothing more.

Are there any hard-coded IP addresses involved in the local replications? Are those still correct?

I'm using "127.0.0.1" in the "Remote hostname" and "Remote hostkey" fields of my local replication jobs.
 

IceBoosteR

Hi,

I am using the 192.168.178.xxx address from the local network. I could change it to the loopback address, but it worked with the official NAS IP in the network, so I did not touch it :)
Yes, the addresses are correct, and initially it worked for 2 days. DHCP is configured to always give FreeNAS the same IPv4 address.
 

IceBoosteR

Anyone? :/
Where can I find replication-related logs?
 

IceBoosteR

The error happened again.
But the error messages make no sense to me...
The one disabled replication task is not the error; that is normal and was done by me.
I got this from /var/log/debug.log.
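
For anyone else looking for the replication entries in that log, here is a minimal sh sketch that filters them out. The two search patterns are simply the message prefixes visible in the errors.txt excerpts quoted in this thread, so it is an assumption that other FreeNAS versions word them the same way. The function name replication_log_lines is made up for illustration.

```shell
# Print only the replication-related lines of a log file.
# Patterns are the message prefixes quoted in this thread (assumption:
# other versions may word these messages differently).
replication_log_lines() {
    logfile="${1:-/var/log/debug.log}"   # default: the log named in this post
    grep -E 'Sending zfs snapshot:|Replication result:' "$logfile"
}
```

Called without an argument it reads /var/log/debug.log; pass another path to inspect a saved copy such as errors.txt.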
 

Attachments

  • replication.png (6.6 KB)
  • errors.txt (36.8 KB)

droeders

Contributor
Joined
Mar 21, 2016
Messages
179
I just took a quick glance at your errors.txt, and it appears that a number of the snapshots you're attempting to replicate already exist on the destination. Here's one of the entries:
Code:
Sending zfs snapshot: /sbin/zfs send -V -p -i SSD/jails/MiniDLNA@auto-20180108.0100-2w SSD/jails/MiniDLNA@auto-20180109.0100-2w | /usr/local/bin/lz4c | /usr/local/bin/pipewatcher $$ | /usr/local/bin/ssh -i /data/ssh/replication -o BatchMode=yes -o StrictHostKeyChecking=yes -o ConnectTimeout=7 -p 22 192.168.178.100 "/usr/bin/env lz4c -d | /sbin/zfs receive -F -d 'RED/Backup/SSD' && echo Succeeded"

Replication result: cannot restore to RED/Backup/SSD/jails/MiniDLNA@auto-20180109.0100-2w: destination already exists
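
To check this by hand, it helps to know which snapshot name the receive side will try to create. With `zfs receive -d`, the pool name is dropped from the sent snapshot's path and the remainder is appended to the target dataset, which matches the name in the error above. A minimal sh sketch of that mapping follows; the helper name dest_snapshot_name is hypothetical, and it assumes the source name contains at least one "/".

```shell
# Map a source snapshot name to the name `zfs receive -d TARGET` would create:
# strip the first path element (the pool name) and prepend the target dataset.
# Hypothetical helper; assumes the source name contains at least one '/'.
dest_snapshot_name() {
    src="$1"
    target="$2"
    printf '%s/%s\n' "$target" "${src#*/}"
}
```

The result can then be passed to `zfs list -t snapshot` on the destination to see whether a snapshot of that name is already there, for example `zfs list -t snapshot "$(dest_snapshot_name SSD/jails/MiniDLNA@auto-20180109.0100-2w RED/Backup/SSD)"`.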
 

IceBoosteR

Hi @droeders
Yes, I saw this error later on as well and could not explain to myself how the heck this could happen within the replication.
After a little more sleep I got the idea that I might have a dataset (RED/Backup) with a recursive snapshot task. After some days I could confirm that this was the issue. The recursive snapshot was created at the same time as the snapshot for the jails on the SSD, so both got the same name. When the replication then starts, a snapshot with that name already exists on the destination, so this error makes sense. But before that I had no idea.
Thanks :D
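
For anyone hitting the same thing, this kind of collision can be spotted by comparing snapshot GUIDs. A replicated snapshot keeps the GUID of its source, while a snapshot created locally on the destination under the same name gets a different one. Below is a minimal sh/awk sketch under those assumptions. The function name colliding_snapshots and the temp file paths are made up for illustration, and the input files are expected to be tab-separated name/guid lists as printed by `zfs list -H -t snapshot -o name,guid`.

```shell
# Report destination snapshots that share a name with a source snapshot
# (after the `zfs receive -d` path mapping) but have a different GUID.
# usage: colliding_snapshots SRC_LIST DST_LIST SRC_POOL DST_TARGET
# Hypothetical helper; the two list files must have different names.
colliding_snapshots() {
    awk -F '\t' -v pool="$3" -v prefix="$4" '
        FILENAME == ARGV[1] {
            # Source list: translate the name to the expected destination name.
            name = $1
            if (index(name, pool "/") == 1)
                name = prefix "/" substr(name, length(pool) + 2)
            guid[name] = $2
            next
        }
        # Destination list: same name, different GUID means a collision.
        # Appending "" forces a string comparison of the 64-bit GUIDs.
        ($1 in guid) && (guid[$1] "") != ($2 "") { print $1 }
    ' "$1" "$2"
}

# Example use on the system from this thread (not run here):
#   zfs list -H -t snapshot -o name,guid -r SSD/jails      > /tmp/src.tsv
#   zfs list -H -t snapshot -o name,guid -r RED/Backup/SSD > /tmp/dst.tsv
#   colliding_snapshots /tmp/src.tsv /tmp/dst.tsv SSD RED/Backup/SSD
```

Any name it prints is a snapshot on the destination that did not come from the replication, which is what a recursive snapshot task on the parent dataset would produce.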
 