Permanent errors have been detected in the following files


Jag Lally

Cadet
Joined
Nov 29, 2014
Messages
2
hi,

I am using FreeNAS-8.3.0-RELEASE-p1-x64 (r12825). My pool started showing SMART errors/warnings on two disks at the same time. I bought two new disks to replace them and followed the procedure in the manual.
After resilvering the first disk I got 'Permanent errors have been detected' and couldn't offline the second disk, so I powered down, removed it, installed a new disk in its place, and resilvered again.
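(For reference, my understanding is that the ZFS-level equivalent of what the GUI replacement procedure does is roughly the sketch below - the gptid names are placeholders rather than my real ones, and the GUI also handles partitioning and swap on the new disk, which this skips.)
Code:
# Sketch of a disk replacement on a raidz1 vdev (placeholder gptids, not my actual ones).
# Take the failing member offline, swap the hardware, then resilver onto the new disk.
zpool offline asgard gptid/OLD-DISK-GPTID
zpool replace asgard gptid/OLD-DISK-GPTID gptid/NEW-DISK-GPTID
zpool status -v asgard   # watch the resilver progress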

My original errors have not disappeared despite doing zpool scrub twice.
Code:
zpool status -v asgard
  pool: asgard
state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 6h56m with 1 errors on Sat Nov 29 02:49:02 2014
config:

    NAME                                              STATE     READ WRITE CKSUM
    asgard                                            DEGRADED     0     0     1
      raidz1-0                                        DEGRADED     0     0     2
        gptid/5ded79bf-74e1-11e4-ac61-441ea13caae6    ONLINE       0     0     0
        replacing-1                                   DEGRADED     0     0     0
          7511900252865497515                         UNAVAIL      0     0     0  was /dev/gptid/d803cfcd-cb8b-11e1-958d-441ea13caae6
          gptid/77e005b2-7672-11e4-a8d8-441ea13caae6  ONLINE       0     0     0
        gptid/42edf68f-7d53-11e3-a71d-441ea13caae6    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        asgard/data:<0xcf34e>


As far as I understand it, I need to delete the corrupted data, because it has become detached from its original full-path reference (hence the hex object number instead of a filename).
I googled around and found that zdb can be used to get more information on the hex reference given.
Code:
zdb -ddddd asgard/data 0xcf34e
Dataset asgard/data [ZPL], ID 32, cr_txg 22, 3.78T, 1340863 objects, rootbp DVA[0]=<0:6d47a9ea000:2000> DVA[1]=<0:38ad762000:2000> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=12080704L/12080704P fill=1340863 cksum=1b74ef303c:8defe7e5b0f:18d38ef8da187:3167b0ed2a5fc5

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
    848718    1    16K    512  5.50K    512  100.00  ZFS plain file
                                        264   bonus  ZFS znode
    dnode flags: USED_BYTES USERUSED_ACCOUNTED
    dnode maxblkid: 0
    path    ???<object#848718>
    uid     1001
    gid     20
    atime    Sun Jul  6 14:01:12 2014
    mtime    Sun Jul  6 13:00:53 2014
    ctime    Sun Jul  6 13:00:53 2014
    crtime    Sun Jul  6 13:00:52 2014
    gen    9665424
    mode    100600
    size    56
    parent    848717
    links    1
    pflags    40800000005
    xattr    0
    rdev    0x0000000000000000
Indirect blocks:
               0 L0 0:38cfc3d6000:2000 200L/200P F=1 B=9665424/9665424

        segment [0000000000000000, 0000000000000200) size   512


I managed to get the content listed using
Code:
zdb -R asgard/data 0:38cfc3d6000:2000:r
Found vdev type: raidz
??????????????????????????????????


It looks like a small plain binary file of some sort.
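(Side note, in case it helps: from what I've read, the object number zdb reports is also the file's inode number - 0xcf34e is 848718 in decimal, which matches the zdb output above - so in principle the file could be looked up by inode. The mount path below is just where FreeNAS would normally mount the dataset, and since zdb shows the path as ???<object#848718> this may well come up empty.)
Code:
# Try to map the zdb object number back to a path via its inode number (0xcf34e = 848718).
# Only finds something if the object is still reachable from the live filesystem.
find /mnt/asgard/data -inum 848718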

How can I get rid of this single error and restore the pool to health?

I don't mind losing a few files as I can restore them from backup, but ideally I don't want to recreate the pool and restore over 3 TB. Running zpool scrub twice hasn't got rid of it, and I'm very new to ZFS. Can anyone help?

thanks
Jag
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I've played with zdb enough to potentially be very destructive, and I've fixed a few things thanks to zdb. I'm sorry, but you'll find very little (read: no) support for zdb here. I don't personally offer support with zdb because it's actually much, much more complex than people think. There's never a one- or two-step process to getting things done; it's always taking one command's output, interpreting it, and using it for the next step. I've never gotten less than 3-4 steps deep unless it's something like validating that your pool is totally fscked.

Generally, if you are in a situation where you have to use zdb, you've probably not followed our recommendations. In your case, choosing RAIDZ1 was a major no-no.

Scrubbing twice to fix the error was a futile attempt. Once a scrub can't fix an error because of a lack of redundancy, the error is permanent.

In your case, though, it looks like ZFS isn't sure what the name is, so the file doesn't appear to have a name. What you can try is moving everything out of asgard/data and then destroying that dataset. You'll obviously lose that one file, but you won't even know which file it was anyway. Really, you're better off just restoring asgard/data from backup (and maybe take this opportunity to build a pool with more than just a marginal amount of redundancy over a striped pool).
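Roughly, that approach looks like the sketch below. The dataset names are the ones from this thread; the temporary dataset name, mount paths and rsync flags are just illustrative, and you should verify the copy before destroying anything.
Code:
# Sketch: copy everything out of the damaged dataset, destroy it, then put the data back in place.
zfs create asgard/data_new
rsync -a /mnt/asgard/data/ /mnt/asgard/data_new/   # the corrupt object should just throw a read error and get skipped
zfs destroy -r asgard/data                         # removes the dataset (and its snapshots) holding the bad object
zfs rename asgard/data_new asgard/data
zpool scrub asgard                                 # once the corrupt data is gone, the error report should clear after scrubbing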
 

Jag Lally

Cadet
Joined
Nov 29, 2014
Messages
2
hi.

Thank you for replying. I initially set up this pool when hard disk prices went through the roof in early 2012, so I could only afford RAIDZ1. I did, however, mitigate the risk later on with multiple backups, some of them delayed, so that I could cope with this scenario and with "creeping" corruption. I have seen enough expressions on people's faces during my working career when faced with total loss and no backup whatsoever.

My use of zdb came from Google searches; I have no expertise in it and barely know what I am doing, so I won't be pursuing it further.

I will follow your advice and restore from backup, and take the opportunity to set up a better RAID layout so I can tolerate at least two simultaneous disk failures.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

Ah, the great scam "Thai floods" of 2012. Drive prices only recently dropped below late 2011 prices.
 

unca_NAS

Explorer
Joined
Mar 25, 2012
Messages
87
EDIT: nevermind.
 

unca_NAS

Explorer
Joined
Mar 25, 2012
Messages
87
Well, this has been an interesting week.

AFAIK the root cause of the permanent errors was a snapshot. Destroying the snapshot triggered further errors and numerous resilvers.

To cut to the chase - the pool is online again. What I did was:
- restored the original file that triggered the error within the snapshot
- destroyed any and all snapshots of that dataset

After resilvering, all seems to be working again.
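(For anyone hitting the same thing, the snapshot cleanup was essentially the sketch below - the pool/dataset names are the OP's, and the snapshot name is a placeholder for whatever zfs list shows on your system.)
Code:
# Sketch of the cleanup: destroy every snapshot of the affected dataset, then scrub.
zfs list -t snapshot -r asgard/data     # list the snapshots still referencing the bad object
zfs destroy asgard/data@SNAPSHOT_NAME   # repeat for each snapshot (placeholder name)
zpool scrub asgard                      # confirm the pool comes back clean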
 