All,
I'm looking for more experienced minds to take a look at these circumstances. After almost a year of ZFS replication offsite between two hosts, I'm seeing an unusual error from ssh during encrypted ZFS replication. It is:
Fssh_ssh_dispatch_run_fatal: Connection to IP_ADDRESS port PORT_NUMBER: message authentication code incorrect
Of course, I've redacted the IP address and port number. What I can report:
- Since January 2017, SSH keys have been unchanged on these two systems and I've observed no authentication problems between them.
- I upgraded to FreeNAS 11.0-U4 immediately upon its release, which appears to have been on September 25th
- SSH keys are 4096 bits, and CPUs are barely burdened during the transfer (1-4% total utilization across all cores)
- The last ZFS snapshot appearing on the destination is dated October 19th
- I have two snapshot tasks on the source: daily, retained for a week, and weekly, retained for 3 months.
- Daily snapshots average 500MB. Weekly snapshots average 15GB.
- Link speed of source: 80 megabits/sec symmetric; destination: 110 megabits/sec downstream, 30 megabits/sec upstream
- I can only observe this error when I run zfs send and pipe it to SSH manually
- Sometimes this error will appear within minutes of starting zfs send, other times it will take up to four hours to happen
- I have defined no additional options in the FreeNAS configuration for SSH
- I noticed this error after receiving a flurry of "replication failed" emails
- The problem occurs in both directions.
- Replication of smaller snapshots (kilobytes to single-digit megabytes) is uneventful.
- I have had problems in the past with connection persistence between the hosts, but this is new: until now I had never observed the SSH error, only a broken pipe, several restarts, and finally a successful replication.
- Turning off encryption on the replication task results in endless restarting of the replication task, presumably due to a broken connection. I'm not familiar with how to manually initiate a zfs send without piping it over SSH.
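For a sense of scale, here is a rough back-of-the-envelope estimate of how long each snapshot should take at the link speeds listed above. It is only a sketch: it assumes decimal units (1 MB = 8 megabits), that the slower end of each path is the bottleneck, and that there is no SSH or TCP overhead. The function and constant names are made up for illustration.

```python
def transfer_seconds(size_megabytes, link_megabits_per_sec):
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return size_megabytes * 8 / link_megabits_per_sec


# Assumed bottleneck per direction: the slower of sender upstream
# and receiver downstream.
SRC_TO_DST_MBIT = min(80, 110)  # source upstream vs. destination downstream
DST_TO_SRC_MBIT = min(30, 80)   # destination upstream vs. source downstream

SNAPSHOTS_MB = {"daily": 500, "weekly": 15_000}  # average sizes from the list above

for name, size_mb in SNAPSHOTS_MB.items():
    forward = transfer_seconds(size_mb, SRC_TO_DST_MBIT)
    reverse = transfer_seconds(size_mb, DST_TO_SRC_MBIT)
    print(f"{name}: {forward / 60:.1f} min source->destination, "
          f"{reverse / 60:.1f} min destination->source")
```

Under those assumptions, an average weekly snapshot needs about 25 minutes from source to destination and a little over an hour in the other direction; real transfers will be slower.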