Replication will not send corrupt data - resends over and over

Status
Not open for further replies.

amorton12

Dabbler
Joined
May 19, 2016
Messages
10
My pool had two disks (3T Barracudas, surprise:mad:) start throwing uncorrectable sector errors, so I replaced the worse of the two. I also noticed that replication hadn't been running since I updated the push machine from 9.3 to 9.10 a few weeks ago. During the resilver, I ended up with some minor data corruption, and I am trying to make sure I can "save" the rest in case the other drive craps out before the resilver completes, since its error count keeps going up.
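
For anyone following along, this is roughly how the error count on the remaining suspect drive can be watched from the shell; /dev/ada3 is only a placeholder for whichever device it actually is.

Code:
# A minimal sketch -- substitute the real device name for /dev/ada3.
# -A prints the SMART attribute table; these three attributes are the usual
# early-warning counters for failing sectors.
smartctl -A /dev/ada3 | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'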

I updated my replication pull target to 9.10 and moved the system dataset to make replication run again, but the one affected snapshot fails to send and replication just runs over and over.

This is the state of the pool. I appear to have the rest of the data replicated now, but even after I swap the disks and save the pool, I will still need to fix this somehow.

Code:
[root@freenas] ~# zpool status -v zfspool
  pool: zfspool
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu May 19 06:30:07 2016
        2.85T scanned out of 4.01T at 79.4M/s, 4h14m to go
        612G resilvered, 71.08% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        zfspool                                         ONLINE       0     0     8
          raidz1-0                                      ONLINE       0     0    16
            gptid/ae451901-b3d1-11e4-b68a-001e4fb0f51d  ONLINE       0     0     0
            ada1                                        ONLINE       0     0     0  (resilvering)
            gptid/1d7db6e9-add2-11e2-ab62-525400390d09  ONLINE       8     0     0
            gptid/1e1b3145-add2-11e2-ab62-525400390d09  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        zfspool/home@auto-20160516.1822-7d:/path/to/badfile.gz


I continually get emails with this:


Code:
Hello,
    The replication failed for the local ZFS zfspool/home while attempting to
    apply incremental send of snapshot auto-20160515.1822-7d -> auto-20160516.1822-7d to 10.0.0.52
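
For what it's worth, the failing step can be reproduced by hand to see exactly what error zfs prints. The command below is a sketch built from the names in that email; the pull-side dataset path "backup/home" is a guess, as is plain root ssh for the transport.

Code:
# Replay the failing incremental outside the replication task (snapshot names
# and target IP from the email above; the receive-side dataset is hypothetical).
zfs send -v -i auto-20160515.1822-7d zfspool/home@auto-20160516.1822-7d \
    | ssh root@10.0.0.52 zfs receive -F backup/home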



  • Is there something like the zfs_send_corrupt_data tunable from ZFS on Linux that will override this behavior and allow me to send the rest of the intact data?
  • Can I tell ZFS to ignore the corrupt file in the snapshot and carry on?
  • If the fix is to stop replication so that I can unlock and delete the affected snapshot and its children, how would I do that? (See the sketch after this list.)
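
To make that last question concrete, here is the rough sequence I have in mind, pieced together from the zfs(8) man page; TAG is whatever (if anything) "zfs holds" reports, and the sysctl grep is just a way to check whether FreeBSD exposes anything like the ZFS-on-Linux tunable at all.

Code:
# Question 1: zfs_send_corrupt_data is a ZFS-on-Linux module parameter; check
# whether FreeBSD/FreeNAS exposes anything similar (no output = no such knob).
sysctl -a | grep -i corrupt

# Question 3: look for holds that would block deletion, release any that are
# listed (replace TAG with the tag shown by "zfs holds"), then destroy the
# snapshot recursively so same-named snapshots on child datasets go with it.
zfs holds zfspool/home@auto-20160516.1822-7d
zfs release TAG zfspool/home@auto-20160516.1822-7d
zfs destroy -r zfspool/home@auto-20160516.1822-7d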
 

dlavigne

Guest
AFAIK the only way to fix this is to check the "Delete stale snapshots on remote system" box on the replication task. Note that this will delete all of the snapshots on the remote system, which will require a full replication.
 

amorton12

Dabbler
Joined
May 19, 2016
Messages
10
I checked, and that option is enabled for my replication job already. I seem to remember something about initializing the remote side when I set this up on 9.3, but I don't see that option anywhere since I switched to 9.10.
 

amorton12

Dabbler
Joined
May 19, 2016
Messages
10
Gotcha. Since that's enabled, is this a bug? I'm not entirely sure what the expected behavior is if that doesn't do it.
 

dlavigne

Guest
Me neither, so it probably wouldn't hurt to create a bug at bugs.freenas.org. If you do, post the issue number here.
 

amorton12

Dabbler
Joined
May 19, 2016
Messages
10
I just realized I posted the wrong version number. I am running 10.3, not 9.10. 9.10 is what I just upgraded from.
 

dlavigne

Guest
I assume you mean 9.3? If so, that is a downgrade from 9.10.
 

amorton12

Dabbler
Joined
May 19, 2016
Messages
10
No, I upgraded from 9.10 to 10.3, the newest version.

Code:
FreeBSD 10.3-RC3 (FreeNAS.amd64) #0 86b9b91(freebsd10): Mon Mar 21 17:43:20 PDT 2016
 

amorton12

Dabbler
Joined
May 19, 2016
Messages
10
Currently, the pool looks like this; presumably the affected snapshot rolled off due to age, since the error is now reported as an object ID rather than a file path.


Code:
[root@freenas] ~# zpool status -v zfspool
  pool: zfspool
state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 44K in 22h34m with 1 errors on Mon May 23 06:29:11 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        zfspool                                         ONLINE       0     0   289
          raidz1-0                                      ONLINE       0     0   578
            gptid/ae451901-b3d1-11e4-b68a-001e4fb0f51d  ONLINE       0     0     0
            ada1                                        ONLINE       0     0     0
            gptid/1d7db6e9-add2-11e2-ab62-525400390d09  ONLINE     289     0     0
            gptid/1e1b3145-add2-11e2-ab62-525400390d09  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0x230b>:<0x29927>
 

dlavigne

Guest
That's just the underlying FreeBSD version. What is the actual FreeNAS "Build" (from System -> Information)?
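
If a shell is handier than the GUI, the same build string should also be in /etc/version on FreeNAS (going from memory, so worth double-checking):

Code:
cat /etc/version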
 

amorton12

Dabbler
Joined
May 19, 2016
Messages
10
[Attached screenshot: upload_2016-5-24_10-35-52.png]
 

Sakuru

Guru
Joined
Nov 20, 2015
Messages
527
Do you have any updates available under System --> Update? If I recall correctly, 9.10-RELEASE is the first version of 9.10.
 

amorton12

Dabbler
Joined
May 19, 2016
Messages
10
Yes, it does offer me some updates, but I didn't see anything in the changelog emails it sends that appeared to reference this. I had planned to update it anyway, but it will be a while before I can schedule downtime to update this machine.

I have filed this issue in the bug tracker:
https://bugs.freenas.org/issues/15532
 

amorton12

Dabbler
Joined
May 19, 2016
Messages
10
That got quickly marked as "Behaves correctly", with the note that I need to remove the offending data. Since the snapshot expiring had already done that for me, I went ahead and started a scrub. As soon as the scrub started, the pool began reporting no errors.
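
For anyone who lands here later, the final cleanup amounted to roughly the following; the zpool clear at the end is only needed if the per-device error counters are still non-zero once the scrub finishes.

Code:
zpool scrub zfspool        # re-walk the pool now that the offending snapshot is gone
zpool status -v zfspool    # watch progress and confirm the error list is empty
zpool clear zfspool        # reset any leftover READ/WRITE/CKSUM counters if needed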
 