What happens if URE is encountered during mirrored vdev resilver?

Status
Not open for further replies.

Zach

Cadet
Joined
Sep 23, 2013
Messages
3
I have a pool composed of 8 1TB WD RE4 drives. The URE rate is <1 in 10^15. I am currently resilvering one of the mirrored vdevs. Since there is roughly a 0.8% chance that a URE will be encountered during the resilver, I'd like to know what happens.
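For reference, a rough back-of-the-envelope check of where that ~0.8% figure comes from, assuming the quoted <1 in 10^15 per-bit URE rate and a full read of the surviving 1TB drive (about 8×10^12 bits):

Code:
# expected UREs per full read of ~1 TB at a 1e-15 per-bit rate (rough estimate)
echo '8 * 10^12 / 10^15' | bc -l    # prints ~0.008, i.e. about a 0.8% chance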

With RAID 5, that typically means total failure of the array and restoring from backup.

What happens with ZFS when using mirrored vdevs?




Code:
cannot open '-v': name must begin with a letter
  pool: BIGPOOL
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Sep 23 09:21:33 2013
        1.31T scanned out of 1.99T at 189M/s, 1h2m to go
        331G resilvered, 65.94% done
config:
 
        NAME                                              STATE    READ WRITE CKSUM
        BIGPOOL                                          DEGRADED    0    0    0
          mirror-0                                        DEGRADED    0    0    0
            replacing-0                                  OFFLINE      0    0    0
              10298620699599008217                        OFFLINE      0    0    0  was /dev/gptid/52ce4171-1c89-11e3-9faa-002590c8d426
              gptid/10c05dfc-2453-11e3-af2a-002590c8d426  ONLINE      0    0    0  (resilvering)
            gptid/532208dc-1c89-11e3-9faa-002590c8d426    ONLINE      0    0    0
          mirror-1                                        ONLINE      0    0    0
            gptid/6ea02f36-1c89-11e3-9faa-002590c8d426    ONLINE      0    0    0
            gptid/6ef5b6b1-1c89-11e3-9faa-002590c8d426    ONLINE      0    0    0
          mirror-2                                        ONLINE      0    0    0
            gptid/9490bd8c-1c89-11e3-9faa-002590c8d426    ONLINE      0    0    0
            gptid/94e51f13-1c89-11e3-9faa-002590c8d426    ONLINE      0    0    0
          mirror-3                                        ONLINE      0    0    0
            gptid/b70b61f6-1c89-11e3-9faa-002590c8d426    ONLINE      0    0    0
            gptid/720a639f-2421-11e3-a39f-002590c8d426    ONLINE      0    0    0
 
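(Side note: the "cannot open '-v': name must begin with a letter" line at the top of that output suggests the -v flag ended up after the pool name; zpool expects options before the pool name, e.g.:)

Code:
zpool status -v BIGPOOL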

Zach

Cadet
Joined
Sep 23, 2013
Messages
3
I'm specifically asking about what happens if there is a URE on the one remaining drive in the degraded mirror. Would I just lose the affected block? Would the entire pool be kaput?
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
You will not lose the pool. The resilver will still finish, and zpool status -v will tell you which files are damaged. Being curious myself, I did this to quickly simulate it (a rough command sketch follows the list):
  1. Create a mirror of two small drives
  2. Create a bunch of files filled with random data
  3. Offline one drive
  4. Export the pool
  5. Overwrite one sector on the remaining drive with random data
  6. Import the pool
  7. Start the replace operation
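Roughly, the commands looked something like this (file-backed test vdevs shown here purely for illustration; the exact devices, sizes, and offsets will differ):

[PRE]# 1. Create a mirror of two small test vdevs (file-backed here; paths hypothetical)
truncate -s 1G /root/v1 /root/v2 /root/v3
zpool create t1 mirror /root/v1 /root/v2

# 2. Fill it with random data and record checksums for later comparison
dd if=/dev/random of=/mnt/t1/f1 bs=1M count=100
md5 /mnt/t1/f1 > /root/f1.md5

# 3.-5. Offline one side, export the pool, corrupt one sector on the remaining vdev
#       (the seek offset is arbitrary; it may take a few tries to hit an allocated block)
zpool offline t1 /root/v2
zpool export t1
dd if=/dev/random of=/root/v1 bs=512 count=1 seek=500000 conv=notrunc

# 6.-7. Import the pool again and replace the offlined vdev to trigger a resilver
zpool import -d /root t1
zpool replace t1 /root/v2 /root/v3[/PRE]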
Result of zpool status -v:

[PRE][root@freenas] ~# zpool status -v t1
  pool: t1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 5.04G in 0h0m with 1 errors on Mon Sep 23 23:49:52 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        t1                                              ONLINE       0     0     2
          mirror-0                                      ONLINE       0     0     4
            gptid/399dee07-2417-11e3-ac80-000c2913533e  ONLINE       0     0     4
            gptid/3c6c99aa-24e5-11e3-9f3c-000c2913533e  ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        /mnt/t1/f1[/PRE]
There are some CKSUM errors reported (I guess you would see READ errors in the case of a URE), but the resilver still finished. The corrupted sector resulted in one error in the file /mnt/t1/f1.
Trying to access the file (I had created md5 sums beforehand, so I wanted to compare the md5 of the resilvered file):

[PRE][root@freenas] ~# md5 /mnt/t1/f1
md5: /mnt/t1/f1: Input/output error[/PRE]
So it seems ZFS will also return an input/output error when you access the bad block, to prevent you from accidentally using the corrupted data (the CKSUM count in zpool status increases every time I access the file).
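For completeness, one possible way to get back to a clean state from there, assuming a good copy of the damaged file exists somewhere else (the backup path here is hypothetical):

[PRE]cp /backup/f1 /mnt/t1/f1    # restore the damaged file from a known-good copy
zpool clear t1              # reset the READ/WRITE/CKSUM error counters
zpool scrub t1              # re-verify the pool; the permanent-error list
                            # should clear once a subsequent scrub completes[/PRE]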
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It ranges from single-file corruption to corruption of the ZFS metadata. You really want to be careful, because ZFS is designed with the expectation that you won't have corruption. As many people here have found, corruption that exceeds your redundancy is bad because the ZFS code often just won't mount your pool when things go wrong. And suddenly you are locked out of your data forever.

So keep backups. :P
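As one minimal illustration of keeping such a backup with ZFS itself (the dataset, pool, and host names below are made up), snapshots can be replicated with zfs send/receive:

Code:
# snapshot a dataset and replicate it to another pool on another machine
zfs snapshot BIGPOOL/data@backup-2013-09-23
zfs send BIGPOOL/data@backup-2013-09-23 | ssh backuphost zfs receive -u backuppool/data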
 