How is this still alive?! Possible miracles of using Spare drives

JaimieV · Mar 28, 2020

I'm running a burn-in period on a new set of disks. The layout is three RAIDZ1 vdevs of four 4TB disks each; I was experimenting with throughput for 10GbE use, and this layout runs at about 900 MB/s read/write. I also have a spare drive allocated to the pool.
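For reference, a pool of the shape described (three four-wide RAIDZ1 vdevs plus one hot spare) could be created from the command line roughly as sketched below. This is a dry-run sketch that only prints the command. The device names da0 to da12 and the function name are hypothetical placeholders; FreeNAS normally builds pools from the GUI, partitions the disks and refers to them by gptid labels.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) a `zpool create` command that groups
# disks into RAIDZ1 vdevs of a given width and makes the last disk a hot spare.
# Pool and device names are hypothetical placeholders.
build_create_cmd() {
    pool=$1
    width=$2
    shift 2
    cmd="zpool create $pool"
    i=0
    while [ $# -gt 1 ]; do
        # Start a new raidz1 vdev every $width disks.
        if [ $((i % width)) -eq 0 ]; then
            cmd="$cmd raidz1"
        fi
        cmd="$cmd $1"
        i=$((i + 1))
        shift
    done
    # The one disk left over becomes the pool's hot spare.
    echo "$cmd spare $1"
}

build_create_cmd DataPool 4 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 da12
```

Hot spares belong to the pool as a whole rather than to one vdev, which is why the spare is listed once at the end instead of inside any raidz1 group.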

Yesterday one of the disks popped, recovered and was resilvered onto the spare; once that completed, it popped again and died, and the spare stepped in again. The pool was resilvering when I came to look this morning. But some worrying additional read/write errors are showing on other HDDs in the same RAIDZ1:

Code:

root@Sisyphus:~ # zpool status
  pool: DataPool
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Mar 28 10:37:08 2020
    25.0T scanned at 3.82G/s, 19.2T issued at 2.93G/s, 25.0T total
    2.62G resilvered, 76.73% done, 0 days 00:33:48 to go
config:

    NAME                                              STATE     READ WRITE CKSUM
    DataPool                                          DEGRADED     2     0 5.48M
      raidz1-0                                        ONLINE       0     0     0
        gptid/7abf8085-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8d7ccf17-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/94977190-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/9348eedb-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
      raidz1-1                                        DEGRADED     2     0 11.0M
        gptid/82f438bf-6a02-11ea-a922-a0369f4e18bc    DEGRADED     0     0     0  too many errors
        spare-1                                       DEGRADED     0     0 4.03K
          12077469904772203790                        REMOVED      0     0     0  was /dev/gptid/8d6b27be-6a02-11ea-a922-a0369f4e18bc
          gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc  ONLINE       0     0     0
        gptid/8f66452c-6a02-11ea-a922-a0369f4e18bc    FAULTED    114   693     0  too many errors
        gptid/8f771d80-6a02-11ea-a922-a0369f4e18bc    ONLINE     104   112     0
      raidz1-2                                        ONLINE       0     0     0
        gptid/8ad40550-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8f548563-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/44ad6394-6aec-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8ac32412-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
    spares
      4973883019692991099                             INUSE     was /dev/gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc

errors: 5744771 data errors, use '-v' for a list

I didn't want to swap out the spare with those other errors also in play, so I'm letting the resilver run to completion. I also didn't want to pull a second disk from a RAIDZ1; of course that would be suicidal.

Instead I added another spare to take on that FAULTED drive:

Code:

root@Sisyphus:~ # zpool status
  pool: DataPool
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Mar 28 12:30:51 2020
    14.6T scanned at 471M/s, 9.82T issued at 1.20G/s, 25.0T total
    0 resilvered, 39.34% done, 0 days 03:36:12 to go
config:

    NAME                                              STATE     READ WRITE CKSUM
    DataPool                                          DEGRADED     2     0 5.48M
      raidz1-0                                        ONLINE       0     0     0
        gptid/7abf8085-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8d7ccf17-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/94977190-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/9348eedb-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
      raidz1-1                                        DEGRADED     2     0 11.0M
        spare-0                                       DEGRADED     0     0     0
          gptid/82f438bf-6a02-11ea-a922-a0369f4e18bc  DEGRADED     0     0     0  too many errors
          gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        spare-1                                       DEGRADED     0     0 4.03K
          12077469904772203790                        REMOVED      0     0     0  was /dev/gptid/8d6b27be-6a02-11ea-a922-a0369f4e18bc
          gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc  ONLINE       0     0     0
        gptid/8f66452c-6a02-11ea-a922-a0369f4e18bc    FAULTED    114   693     0  too many errors
        gptid/8f771d80-6a02-11ea-a922-a0369f4e18bc    ONLINE     104   112     0
      raidz1-2                                        ONLINE       0     0     0
        gptid/8ad40550-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8f548563-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/44ad6394-6aec-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8ac32412-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
    spares
      4973883019692991099                             INUSE     was /dev/gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc
      8076838506038568445                             INUSE     was /dev/gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc

errors: No known data errors

Now it says "No known data errors", as if by magic? And the new spare has attached itself, as spare-0, to a different drive from the one I meant it for? Well, OK, that one did say "too many errors" too. Let's bang in another spare and try again:

Code:

root@Sisyphus:~ # zpool status
  pool: DataPool
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Mar 28 12:35:20 2020
    24.0T scanned at 1.43G/s, 16.1T issued at 228M/s, 25.0T total
    1.21G resilvered, 64.57% done, 0 days 11:17:39 to go
config:

    NAME                                              STATE     READ WRITE CKSUM
    DataPool                                          DEGRADED     2     0 5.48M
      raidz1-0                                        ONLINE       0     0     0
        gptid/7abf8085-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8d7ccf17-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/94977190-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/9348eedb-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
      raidz1-1                                        DEGRADED     2     0 11.0M
        spare-0                                       DEGRADED     0     0     0
          gptid/82f438bf-6a02-11ea-a922-a0369f4e18bc  DEGRADED     0     0     0  too many errors
          gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        spare-1                                       DEGRADED     0     0 4.03K
          12077469904772203790                        REMOVED      0     0     0  was /dev/gptid/8d6b27be-6a02-11ea-a922-a0369f4e18bc
          gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc  ONLINE       0     0     0
        spare-2                                       DEGRADED     0     0 14.9K
          gptid/8f66452c-6a02-11ea-a922-a0369f4e18bc  FAULTED    114   693     0  too many errors
          gptid/4b125e82-70f0-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        gptid/8f771d80-6a02-11ea-a922-a0369f4e18bc    ONLINE     104   112     0
      raidz1-2                                        ONLINE       0     0     0
        gptid/8ad40550-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8f548563-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/44ad6394-6aec-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8ac32412-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
    spares
      4973883019692991099                             INUSE     was /dev/gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc
      8076838506038568445                             INUSE     was /dev/gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc
      14815739748345029129                            INUSE     was /dev/gptid/4b125e82-70f0-11ea-895f-a0369f4e18bc

errors: No known data errors
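When an added spare attaches itself to a different failing disk than intended, it can instead be pointed at a specific disk by hand. A dry-run sketch that only prints the commands (pool and device names are placeholders, and the function name is hypothetical): `zpool add ... spare` registers a hot spare, and `zpool replace` with that spare as the new device activates it for the named disk. If the system grabs the new spare automatically before the replace runs, the replace will presumably be refused.

```shell
#!/bin/sh
# Dry-run sketch: print the commands to add a new hot spare and then attach
# it explicitly to one particular failing disk, rather than leaving it to
# the system to decide which disk the spare covers. Names are placeholders.
pin_spare_cmds() {
    pool=$1
    failing=$2
    newspare=$3
    echo "zpool add $pool spare $newspare"
    echo "zpool replace $pool $failing $newspare"
}

pin_spare_cmds DataPool gptid/FAILING-DISK gptid/NEW-SPARE
```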

Can someone explain how this pool is working at all? Three disks in a RAIDZ1 are in a bad enough way that the system has subbed in three spares all by itself, and yet it reports "No known data errors" (there were some originally!) and the pool is still up and running?

Spares are clearly doing some sort of deep magic here, much more than was revealed in my recent thread "What are Spare drives in pools useful for?"

I probably ought to add another spare for the last member of the RAIDZ1 too, since it's having read/write errors as well! But I've run out of slots unless I plumb in the other disk shelf. And I don't understand how this is still working! With zero data errors!

Help?

JaimieV · Mar 28, 2020

The fourth disk that was threatening to pop has popped! What a remarkably unfortunate vdev that is. These are all scattered around a 12-disk Xyratex shelf, so it's not as if a single cable has gone bad or anything.

Code:

root@Sisyphus:~ # zpool status -v | grep -v snapshot | less
  pool: DataPool
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Mar 28 21:21:04 2020
        25.0T scanned at 1.65G/s, 18.0T issued at 71.7M/s, 25.0T total
        313G resilvered, 71.97% done, 1 days 04:26:20 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        DataPool                                          DEGRADED     2     0 10.4M
          raidz1-0                                        ONLINE       0     0     0
            gptid/7abf8085-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
            gptid/8d7ccf17-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
            gptid/94977190-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
            gptid/9348eedb-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
          raidz1-1                                        DEGRADED     2     0 20.8M
            spare-0                                       DEGRADED     0     0     1
              gptid/82f438bf-6a02-11ea-a922-a0369f4e18bc  DEGRADED     0     0     0  too many errors
              gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
            spare-1                                       DEGRADED     0     0 4.03K
              12077469904772203790                        REMOVED      0     0     0  was /dev/gptid/8d6b27be-6a02-11ea-a922-a0369f4e18bc
              gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc  ONLINE       0     0     0
            spare-2                                       DEGRADED     0     0 19.5M
              gptid/8f66452c-6a02-11ea-a922-a0369f4e18bc  FAULTED    114   693     0  too many errors
              gptid/4b125e82-70f0-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
            gptid/8f771d80-6a02-11ea-a922-a0369f4e18bc    DEGRADED   106   112     0  too many errors
          raidz1-2                                        ONLINE       0     0     0
            gptid/8ad40550-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
            gptid/8f548563-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
            gptid/44ad6394-6aec-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
            gptid/8ac32412-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        spares
          4973883019692991099                             INUSE     was /dev/gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc
          8076838506038568445                             INUSE     was /dev/gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc
          14815739748345029129                            INUSE     was /dev/gptid/4b125e82-70f0-11ea-895f-a0369f4e18bc

errors: Permanent errors have been detected in the following files:

Errors are finally admitted to, but once I discount files in snapshots, it's a list of just seven files. Still impressed that it's limping along!

For giggles I'm definitely going to wedge in another drive as a fourth spare. Good thing this is a test rig and not a production device.

JaimieV · Mar 29, 2020

I've put in that fourth spare and am waiting for the resilver to complete.

Since I've got this in such an entertaining state, I thought I'd continue to use it for recovery practice:
1) Remove the dead disks physically
2) zpool detach the dead disks
3) clean up the damaged files
4) Any other tidyups

1) was easy, particularly with this fabulous disk info script from Nephri, which tells me which shelf door each disk lives behind as well as matching up GUID and /dev/daX. Why can't we get that info in the UI? We don't even get consistency: the UI bounces between GUIDs and /dev/daXX, making it actively difficult to work out which physical drive is being reported as busted, and even then it doesn't usefully identify the location. The script does, so it's perfectly possible. /pet peeve
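The gptid-to-device half of that lookup can be done with the stock FreeBSD `glabel status` command. A minimal sketch, assuming its usual three-column output (Name, Status, Components); the function names are hypothetical. Finding the physical slot needs enclosure information as well, which this does not attempt; on SES-capable shelves FreeBSD's `sesutil map` may help there.

```shell
#!/bin/sh
# Sketch: map a gptid label, as shown by `zpool status`, to the daXpY
# partition behind it by parsing FreeBSD `glabel status` output on stdin.
# Accepts labels with or without a leading "/dev/".
gptid_to_dev() {
    label=${1#/dev/}
    awk -v label="$label" '$1 == label { print $3 }'
}

# The reverse lookup: which gptid label sits on a given partition (e.g. da18p2).
dev_to_gptid() {
    awk -v dev="$1" '$3 == dev && $1 ~ /^gptid\// { print $1 }'
}

# Usage on a live FreeNAS/FreeBSD box:
#   glabel status | gptid_to_dev gptid/8f66452c-6a02-11ea-a922-a0369f4e18bc
#   glabel status | dev_to_gptid da18p2
```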

2) zpool detach of the removed disks fails with "no valid replicas". I'll wait until the resilver completes, then try again.

Code:

root@Sisyphus:~ # zpool detach DataPool /dev/gptid/82f438bf-6a02-11ea-a922-a0369f4e18bc
cannot detach /dev/gptid/82f438bf-6a02-11ea-a922-a0369f4e18bc: no valid replicas
root@Sisyphus:~ # zpool detach DataPool /dev/gptid/8d6b27be-6a02-11ea-a922-a0369f4e18bc
cannot detach /dev/gptid/8d6b27be-6a02-11ea-a922-a0369f4e18bc: no valid replicas
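A "no valid replicas" refusal is consistent with the spare not yet holding a complete copy of the disk it covers, so detaching its partner would leave nothing to read from. A minimal sketch of waiting for the resilver before retrying, assuming the `scan:` line wording shown in the status output above; the function names are hypothetical.

```shell
#!/bin/sh
# Succeeds while `zpool status` text on stdin still reports a resilver in
# progress (matches the "scan: resilver in progress since ..." line).
resilver_running() {
    grep -q 'resilver in progress'
}

# Hypothetical wait loop for a live system; $1 is the pool name.
wait_for_resilver() {
    while zpool status "$1" | resilver_running; do
        sleep 600   # check every 10 minutes
    done
}

# Usage: wait_for_resilver DataPool && zpool detach DataPool <old-disk>
```

Newer OpenZFS releases also offer `zpool wait -t resilver` for this, which may not exist on the FreeNAS version in use here.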

3) Damaged files. "errors: 5146371 data errors, use '-v' for a list" sounds worrying, but 99.99% are in snapshots. Am I right in thinking I don't need to care about them? They're errors in old blocks, and won't affect the 'live' current files? I can happily ditch all the old snapshots.

What remains when I remove *snapshot* from the list is this:

<metadata>:<0xa00>
<metadata>:<0x902>
(snip about 50 more)
<metadata>:<0x9fc> - Do these matter? If so, how would I scan for/fix them?

/mnt/DataPool/Offline/spare.sparsebundle/bands - This is a folder. Listing it seems fine; is there a problem?
/mnt/DataPool/Offline/spare.sparsebundle/bands/661 - / These don't matter,
/mnt/DataPool/Offline/spare.sparsebundle/bands/cac - | they can be replaced
/mnt/DataPool/Offline/spare.sparsebundle/bands/769 - \ from source.

/mnt/DataPool/TimeMachine/Anaximander.backupbundle/bands - These are both folders and also seem to be fine
/mnt/DataPool/TimeMachine/Anaximander.backupbundle/mapped - for ls purposes, plus they're just a backup anyway
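Sorting a long error list like this by hand gets tedious. The sketch below buckets `zpool status -v` entries by their apparent shape, a finer version of `grep -v snapshot`. The entry formats it recognises are assumptions based on typical output rather than a documented contract (a live path containing both "@" and ":" would be misfiled, for instance), so treat the buckets as a rough guide. It does not answer whether the metadata entries matter.

```shell
#!/bin/sh
# Sketch: bucket `zpool status -v` error entries (one per line on stdin).
# Assumed entry shapes:
#   <metadata>:<0x...>          pool metadata objects
#   pool/dataset@snap:/path     files that only exist in snapshots
#   ...:<0x...>                 objects that could not be resolved to a path
#   /mnt/pool/path              files in the live filesystem
classify_errors() {
    # read without IFS= trims the indentation zpool puts before entries
    while read -r entry; do
        case $entry in
            '') ;;                                   # skip blank lines
            '<metadata>:'*) echo "metadata $entry" ;;
            *@*:*) echo "snapshot $entry" ;;
            *':<0x'*) echo "unnamed $entry" ;;
            *) echo "live $entry" ;;
        esac
    done
}

# Usage on a live system, counting entries per bucket:
#   zpool status -v DataPool | sed '1,/^errors:/d' | classify_errors |
#       cut -d' ' -f1 | sort | uniq -c
```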

4) Tidyups - not sure what they would be. Any ideas?

So unless I'm wrong in assuming that nothing in a snapshot is important and that those metadata entries don't matter, and provided nothing more falls off while this last disk is being resilvered, I do seem to have got away scot-free from the failure of every disk in a RAIDZ1 vdev.

Still amazed at this, despite knowing ZFS was designed to keep data secure at all costs.

Still puzzled about how exactly spares work, because it clearly isn't the same as resilvering onto a substitute disk: with a plain substitute, the RAIDZ1 vdev would have died as soon as the second disk fell off.

JaimieV · Mar 29, 2020

Okay, so today I learnt a thing. It is a thing worth knowing.

When the spares are in place, paired up with the dying/dead drives, and resilvering is completed, both the spare AND the original are still in use for data. Don't pull the bad disk at this point!

After I pulled the two bad drives out (as above) to make room for more spares, I started getting visible errors in data: images and movie files corrupt and unreadable. So I hooked up another SAS disk shelf and put them back in. Suddenly there are no read errors on the same files, and zpool status now shows several orders of magnitude fewer problems: "errors: 352 data errors, use '-v' for a list".

Using -v lists a bunch of files with "permanent data errors", but the whole list is made up of the files I had trouble reading while the original two disks were out, so I think they're just leftover echoes of that. Now the disks are back in, I can read those files without any issues. If those are still showing after spare-3 completes its "resilvering" (in quotes because it's clearly not quite the same as a normal resilver), I expect a scrub will fix those "permanent" errors.
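One way to picture this, offered as a hedged toy model rather than ZFS's actual code: a spare-N vdev is read much like a two-way mirror of the original disk and its spare, so each block can come from whichever child returns data matching the checksum stored in its parent block pointer. Blocks the spare never received intact (for example because the stripe could not be rebuilt while the spare was resilvering) would then still come from the original, which might explain why pulling the originals made those files unreadable. The function names and the `BAD` marker are invented for illustration, and the fixed original-then-spare order is a simplification.

```shell
#!/bin/sh
# Toy model, not ZFS source: a spare-N vdev read like a two-way mirror of
# the original disk and its spare. A child's copy is accepted only if it
# matches the expected checksum; otherwise the next child is tried.
# "BAD" stands for an unreadable block. Real ZFS spreads reads across
# children; the fixed order here is a simplification.

# $1 = child name, $2 = data it returned, $3 = expected checksum.
try_child() {
    if [ "$2" = BAD ]; then
        return 1
    fi
    if [ "$(printf '%s' "$2" | cksum)" != "$3" ]; then
        return 1   # silent corruption: checksum mismatch
    fi
    echo "$1: $2"
}

# $1 = expected checksum, $2 = original disk's copy, $3 = spare's copy.
read_spare_vdev() {
    try_child original "$2" "$1" || try_child spare "$3" "$1" || {
        echo "checksum error"   # neither copy good; RAIDZ parity is the last resort
        return 1
    }
}

good=$(printf '%s' "photo.jpg block" | cksum)
read_spare_vdev "$good" BAD "photo.jpg block"      # spare supplies the block
read_spare_vdev "$good" "photo.jpg block" BAD      # only the original has it
read_spare_vdev "$good" BAD BAD || true            # original pulled: checksum error
```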

I love playing with this stuff! Very kind of the test box to have developed such exciting issues. I'm hoping I can bring it all back to healthy eventually.

One thing this is demonstrating is that Spares are *really* useful.

JaimieV · Mar 30, 2020

The resilver completed, although it spent about a day saying 30-ish minutes remaining, whereas the status below says it took under 8 hours in total. Perhaps that's the timing of the real resilver, after the whatever-the-heck-it-is pseudo-"resilver" that a spare triggers when it enters the ring? It apparently came out clean, with no known errors:

Code:

root@Sisyphus:~ # zpool status DataPool
  pool: DataPool
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: resilvered 2.05T in 0 days 07:36:52 with 0 errors on Mon Mar 30 23:49:29 2020
config:

    NAME                                              STATE     READ WRITE CKSUM
    DataPool                                          DEGRADED     2     0 40.7M
      raidz1-0                                        ONLINE       0     0     0
        gptid/7abf8085-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8d7ccf17-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/94977190-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/9348eedb-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
      raidz1-1                                        DEGRADED     2     0 81.3M
        spare-0                                       DEGRADED     0     0     2
          da18p2                                      DEGRADED     0     0     0  too many errors
          gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        spare-1                                       DEGRADED     0     0  622K
          12077469904772203790                        REMOVED      0     0     0  was /dev/gptid/8d6b27be-6a02-11ea-a922-a0369f4e18bc
          gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc  ONLINE       0     0     0
        spare-2                                       DEGRADED     0     0     0
          gptid/8f66452c-6a02-11ea-a922-a0369f4e18bc  FAULTED     94   395     0  too many errors
          gptid/4b125e82-70f0-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        spare-3                                       DEGRADED     0     0     0
          gptid/8f771d80-6a02-11ea-a922-a0369f4e18bc  DEGRADED   208   404     0  too many errors
          gptid/7649fe13-71f3-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
      raidz1-2                                        ONLINE       0     0     0
        gptid/8ad40550-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8f548563-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/44ad6394-6aec-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
        gptid/8ac32412-6a02-11ea-a922-a0369f4e18bc    ONLINE       0     0     0
    spares
      4973883019692991099                             INUSE     was /dev/gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc
      8076838506038568445                             INUSE     was /dev/gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc
      14815739748345029129                            INUSE     was /dev/gptid/4b125e82-70f0-11ea-895f-a0369f4e18bc
      14269308928481110541                            INUSE     was /dev/gptid/7649fe13-71f3-11ea-895f-a0369f4e18bc

errors: No known data errors

Now to detach the four duff drives, letting the spares take their places as full members of the RAIDZ1:

Code:

root@Sisyphus:~ # zpool detach DataPool 12077469904772203790
root@Sisyphus:~ # zpool detach DataPool da18p2
root@Sisyphus:~ # zpool detach DataPool gptid/8f66452c-6a02-11ea-a922-a0369f4e18bc
root@Sisyphus:~ # zpool detach DataPool gptid/8f771d80-6a02-11ea-a922-a0369f4e18bc
root@Sisyphus:~ # zpool status DataPool
  pool: DataPool
state: ONLINE
  scan: resilvered 2.05T in 0 days 07:36:52 with 0 errors on Mon Mar 30 23:49:29 2020
config:

    NAME                                            STATE     READ WRITE CKSUM
    DataPool                                        ONLINE       2     0 40.7M
      raidz1-0                                      ONLINE       0     0     0
        gptid/7abf8085-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/8d7ccf17-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/94977190-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/9348eedb-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
      raidz1-1                                      ONLINE       2     0 81.3M
        gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc  ONLINE       0     0     0
        gptid/4b125e82-70f0-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        gptid/7649fe13-71f3-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
      raidz1-2                                      ONLINE       0     0     0
        gptid/8ad40550-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/8f548563-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/44ad6394-6aec-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/8ac32412-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0

errors: No known data errors
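The four detach commands above follow a simple pattern: in each spare-N group, the first child listed is the original disk. A hedged sketch of deriving them from `zpool status` text (dry run, prints only); it assumes the original is always listed first and, given the earlier "no valid replicas" refusal, should only be used once the resilver has finished. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: from `zpool status` text on stdin, print (not run) a detach command
# for the first child of every spare-N vdev, which is the original disk,
# leaving the spare as a permanent member. Check the output before running.
detach_cmds() {
    awk -v pool="$1" '
        want { print "zpool detach " pool " " $1; want = 0; next }
        $1 ~ /^spare-[0-9]+$/ { want = 1 }
    '
}

# Usage: zpool status DataPool | detach_cmds DataPool
```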

Spot-checking some random files, all seems okay, but the CKSUM column still admits to some possible problems. Does anyone know how raidz1-1 can have 81.3M checksum errors while the pool as a whole has 40.7M? I've set off a scrub.

Pending the result of that, this appears to have been a four-disk screwup in a four-disk RAIDZ1, with one complete failure (the REMOVED one), one FAULTED and two DEGRADED, salvaged with perhaps no data loss. That's crazy. A possible miracle, as per the title.

Why doesn't any of the documentation sing the praises of spares? They're mentioned in passing, but barely. The GUI doesn't even have "promote spare to member", instead encouraging you to put in a new disk and run another resilver. Why doesn't anyone apparently know about spares, or how they work? Why am I talking to myself? What's going on here?

Redcoat · Mar 31, 2020

JaimieV said:
Why doesn't any of the documentation sing the praises of spares? They're mentioned in passing, but barely. The GUI doesn't even have "promote spare to member", instead encouraging you to put in a new disk and run another resilver. Why doesn't anyone apparently know about spares, or how they work? Why am I talking to myself? What's going on here?

This all sounds so very interesting. I do hope that @Kris Moore or other(s) at the sharp end of FreeNAS will chime in here and explain/validate/refute/whatever what you have reported.

JaimieV · Mar 31, 2020

I know, right? *waves arms around a bit*

Anyway, the full scrub completed this morning:

Code:

root@Sisyphus:~ # zpool status
  pool: DataPool
state: ONLINE
  scan: scrub repaired 1.89M in 0 days 05:51:56 with 0 errors on Tue Mar 31 06:37:16 2020
config:

    NAME                                            STATE     READ WRITE CKSUM
    DataPool                                        ONLINE       2     0 40.7M
      raidz1-0                                      ONLINE       0     0     0
        gptid/7abf8085-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/8d7ccf17-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/94977190-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/9348eedb-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
      raidz1-1                                      ONLINE       2     0 81.3M
        gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc  ONLINE       0     0     0
        gptid/4b125e82-70f0-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        gptid/7649fe13-71f3-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
      raidz1-2                                      ONLINE       0     0     0
        gptid/8ad40550-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/8f548563-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/44ad6394-6aec-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/8ac32412-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
    spares
      gptid/b3f301ed-72e3-11ea-895f-a0369f4e18bc    AVAIL  
      gptid/b80b43e2-72e3-11ea-895f-a0369f4e18bc    AVAIL  
      gptid/b8f20e98-72e3-11ea-895f-a0369f4e18bc    AVAIL  

errors: No known data errors

So, no known data errors. But definitely checksum problems.

What do the 81.3M and 40.7M mean: are there file data errors or not? Why are there two values? I see the raidz1-1 value is double the pool value, but I have no idea what to make of it. Is my data complete or not? Is there anything left I can actually do if it is not?

Or has everyone gone so far down the "just nuke it and restore from backup" path that this is meaningless esoterica and old dead code paths that no one knows about any more? Because if so, what's the point of having a data-integrity-first filesystem at all? I could just be using RAID6 with FAT32 or something equally awful.

JaimieV · Apr 2, 2020

I can't find any reference (on the web, in the manuals, even in the Oracle docs) to checksum counts on anything except actual drives. Leftover issues in the pool would make sense if losses overlapped as my drives were falling over (one drive out and several hundred bad blocks across the other three), but from what I understand of how ZFS does checksumming, I can't reconcile values in that column without matching damage being declared in files. I've never done a 'clear'.

"No known data errors", it still says after another scrub. Can I trust it? Or should I go back to FAT32 for all my file storage, where at least I can be sure it won't look after my data?

styno · Apr 2, 2020

I'd say the data should be good. You're in a position to actually test it by comparing the files in the pool to the known-good files in your backup.

JaimieV · Apr 2, 2020

I have an rsync checking through as of last night :) Great minds etc, thanks!
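A hedged sketch of that kind of check: compare the pool's copy against the known-good backup byte for byte. Reading every file back through ZFS also forces each block's checksum to be verified, so read errors would surface as well. The function name and paths are placeholders; `diff -rq` stands in for the rsync run mentioned above because it is almost universally available.

```shell
#!/bin/sh
# Sketch: compare every file under a pool directory against a known-good
# backup copy, byte for byte. Paths are hypothetical placeholders.
# (rsync --recursive --checksum --dry-run --itemize-changes does a similar
# job; diff -rq is used here because it is available almost everywhere.)
verify_against_backup() {
    backup=$1
    live=$2
    # diff -rq prints one line per differing or missing file, nothing if equal
    if diff -rq "$backup" "$live"; then
        echo "all files match"
    else
        echo "differences found" >&2
        return 1
    fi
}

# Usage: verify_against_backup /mnt/Backup/Photos /mnt/DataPool/Photos
```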

JaimieV · Apr 3, 2020

Rsync completed, all clear. I took the system down to rearrange the HDDs so I can power down the second shelf again, and after the reboot all those stray CKSUM counts have gone to zero. Whoops.

So yes - four disks in a four-disk RAIDZ1 failed, and throwing spares at it saved ALL the data even though I pulled two of the member disks during the process (when they were each paired with fully-synced spares). I declare a bona fide miracle.

@jgreco I see you're currently hanging in the forums - any light you can shine on this? Feels like something you'd know about.

Code:

root@Sisyphus:~ # zpool status
  pool: DataPool
 state: ONLINE
  scan: resilvered 2.13T in 0 days 06:08:35 with 0 errors on Wed Apr  1 02:06:21 2020
config:

    NAME                                            STATE     READ WRITE CKSUM
    DataPool                                        ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/7abf8085-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/8d7ccf17-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/813fe339-7381-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        gptid/9348eedb-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
      raidz1-1                                      ONLINE       0     0     0
        gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        gptid/014fbddf-6bcc-11ea-9cdb-a0369f4e18bc  ONLINE       0     0     0
        gptid/4b125e82-70f0-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
        gptid/7649fe13-71f3-11ea-895f-a0369f4e18bc  ONLINE       0     0     0
      raidz1-2                                      ONLINE       0     0     0
        gptid/8ad40550-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/8f548563-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/44ad6394-6aec-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
        gptid/8ac32412-6a02-11ea-a922-a0369f4e18bc  ONLINE       0     0     0
    spares
      gptid/b80b43e2-72e3-11ea-895f-a0369f4e18bc    AVAIL   
      gptid/b8f20e98-72e3-11ea-895f-a0369f4e18bc    AVAIL   

errors: No known data errors

tfran1990 · Apr 3, 2020

This is odd. What was the run time on the disks that failed?

JaimieV · Apr 3, 2020

Ah, that's not really the interesting thing. These are all old drives, bought as a job lot from eBay, which is why they need lots of burn-in to find out whether they're any good (they vary between 3000 and 12000 power-on hours according to smartctl).

The fun bit is the way spares are salvaging the massively failed array.

Yorick · Apr 4, 2020

I'm hand-waving here, but as I understand it, when a drive fails but is kept online for the replacement/resilver, it's still available to ZFS. Because ZFS checksums everything, any data on the failed drive that's still good will be used. Obviously that only holds for failures less severe than "spanabhebende Datenverarbeitung" (*).

So here's a plausible scenario in your case; don't take it as gospel until someone with actual understanding chimes in:

Disk 1 in raidz1-1 goes bad, block 666 can't be read any more
Spare kicks in and resilvers
You add a second spare
Disk 0 in raidz1-1 goes bad, block 667 can't be read any more. Disk 0 can still supply parity for block 666 on Disk 1.
Disk 1 can supply parity for block 667 on Disk 0.
Resilvering continues

The whole "block" thing is grossly simplified; someone who actually understands how ZFS stores data will be despairing. In a nutshell, I'm assuming that the parity data on the dying disks was still available, even as they were losing data.
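Yorick's counting argument can be made concrete with a toy model. Under simplifying assumptions (fixed stripes of three data values plus one XOR parity value, whereas real RAIDZ uses variable-width stripes and rotates parity), the sketch shows that one missing value per stripe can be rebuilt from the survivors, and that a stripe is lost only when two of its columns are unreadable at once. With spares attached, a column presumably counts as unreadable only if neither the original disk nor its spare can supply it. All names here are hypothetical.

```shell
#!/bin/sh
# Toy model of single-parity (RAIDZ1-style) recovery. NOT how ZFS lays out
# data, but the "at most one bad column per stripe" rule is the same.

# Rebuild the single missing value of a stripe by XOR-ing all survivors
# (the remaining data values plus the parity value).
rebuild() {
    acc=0
    for v in "$@"; do
        acc=$((acc ^ v))
    done
    echo "$acc"
}

# $1 = stripe number; remaining args = damage entries "disk:stripe".
# A stripe is recoverable if at most one of its columns is bad.
stripe_status() {
    stripe=$1
    shift
    bad=0
    for entry in "$@"; do
        if [ "${entry#*:}" = "$stripe" ]; then
            bad=$((bad + 1))
        fi
    done
    if [ "$bad" -le 1 ]; then echo recoverable; else echo lost; fi
}

# Three data values and their parity.
d0=5; d1=9; d2=12
p=$((d0 ^ d1 ^ d2))
# Disk 1 can no longer read its value: rebuild it from the survivors.
echo "rebuilt d1 = $(rebuild "$d0" "$d2" "$p")"           # -> rebuilt d1 = 9

# Yorick's scenario: disk 1 loses stripe 666, disk 0 loses stripe 667.
for s in 666 667; do
    echo "stripe $s: $(stripe_status "$s" 1:666 0:667)"   # both recoverable
done
# If both disks had lost stripe 666, that stripe would be gone.
echo "stripe 666: $(stripe_status 666 1:666 0:666)"       # -> stripe 666: lost
```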

This kind of failure behavior is fascinating. ZFS is an amazing filesystem.

(*) German phrase for severe disk failure. I struggle with a translation - maybe "stock-removing data processing"?

JaimieV · Apr 4, 2020

You're probably right: only one drive fully fell out of the array, and ZFS was clearly reading from the two damaged ones that I pulled out! (That's a bit odd, actually: there were replicas of their data on the two spares they were paired with, so it seems bad that ZFS tried to read, failed, and then didn't read from the other replica. But that's ZFS itself, not FreeNAS.)

It still requires a fair bit of good luck: all those bad blocks across different drives need to not coincide. But without all the spares doing the heavy lifting, I can't see how the raidz would have made it through.

So, spares are pretty great! And as your pools get larger, they get increasingly efficient in terms of disk usage, since a couple of spares can cover every vdev in the pool. I know a RAIDZ1 plus a spare isn't as safe as a RAIDZ2, but five RAIDZ2s and a couple of spares seems like a pretty solid plan compared with five RAIDZ3s, let alone the now-popular "as many mirror-pair vdevs as you can handle" pool.
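The disk-count side of that comparison is simple arithmetic. A back-of-envelope sketch (the function name is hypothetical) counting disks spent on redundancy, i.e. parity disks plus shared hot spares, for a pool of identical RAIDZ vdevs. It deliberately ignores the safety difference acknowledged above: a spare only protects after it has resilvered, whereas extra parity is live all the time.

```shell
#!/bin/sh
# Disks spent on redundancy for a pool of identical RAIDZ vdevs:
# parity per vdev times vdev count, plus pool-wide hot spares (paid once).
# Ignores ZFS padding and metadata overhead.
redundancy_disks() {
    vdevs=$1; parity=$2; spares=$3
    echo $((vdevs * parity + spares))
}

echo "5 x RAIDZ2 + 2 spares: $(redundancy_disks 5 2 2) disks"   # 12
echo "5 x RAIDZ3, no spares: $(redundancy_disks 5 3 0) disks"   # 15
echo "1 x RAIDZ1 + 1 spare:  $(redundancy_disks 1 1 1) disks"   # 2
echo "1 x RAIDZ2, no spares: $(redundancy_disks 1 2 0) disks"   # 2
```

For a single vdev, RAIDZ1 plus a spare costs the same two disks as RAIDZ2; the saving only appears once several vdevs share the same spares.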

It's a shame the UI doesn't take advantage of them with "make member" or "return to spare status" options.
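For reference, both missing UI options correspond to plain `zpool detach` calls on a spare-N vdev: detaching the original disk makes the spare a permanent member (and removes it from the spares list), while detaching the hot spare cancels the spare replacement and returns it to AVAIL. A dry-run sketch; the action names and function name are invented labels, not FreeNAS features.

```shell
#!/bin/sh
# Dry-run sketch: print the zpool command behind each hypothetical action.
#   make-member:  detach the original disk; the spare takes its place
#   return-spare: detach the spare; it goes back to the AVAIL spares list
spare_action() {
    action=$1; pool=$2; original=$3; spare=$4
    case $action in
        make-member)  echo "zpool detach $pool $original" ;;
        return-spare) echo "zpool detach $pool $spare" ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
}

# Usage: spare_action make-member DataPool da18p2 gptid/cfe11c7a-70ef-11ea-895f-a0369f4e18bc
```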
