Pool has encountered an uncorrectable I/O failure and has been suspended

kdombrovsky

Dabbler
Joined
Jan 21, 2018
Messages
16
Hi guys,

I'm having a problem with one of my pools. I've had random disk failures for some time and did everything to troubleshoot. It turned out to be a faulty PSU.
Got a new one and everything seemed in order. However, one of my pools became degraded, and every time I access it I get:
Solaris: WARNING: Pool 'zraid' has encountered an uncorrectable I/O failure and has been suspended.
SSH/UI is frozen afterwards.

Code:
root@freenas:~ # zpool status zraid
  pool: zraid
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 06:05:36 with 3771 errors on Mon Nov 21 23:07:50 2022
config:

    NAME                                            STATE     READ WRITE CKSUM
    zraid                                           DEGRADED     0     0     0
      raidz1-0                                      DEGRADED     0     0     0
        gptid/e6a8a431-17fe-11eb-b872-c860000235b7  DEGRADED     0     0    12  too many errors
        gptid/e7806e32-17fe-11eb-b872-c860000235b7  DEGRADED     0     0    16  too many errors
        3657722943806036456                         UNAVAIL      0     0     0  was /dev/da6

errors: 3603 data errors, use '-v' for a list


Is it possible to recover some of the files from that pool? Some of them are near and dear to my heart. I was hoping raidz would help protect them but alas...

Hardware:
  • Motherboard: Asus M5A99X
  • Processor: AMD FX(tm)-6100 Six-Core Processor
  • RAM: 8 GB DDR3, non-ECC
  • Storage: 3 x Seagate IronWolf 6TB
  • HBA: LSI 9211-8i
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Your only option is a dedicated recovery tool like Klennet. However, your Z1 pool has 1 missing disk and 2 with multiple checksum errors. I don't hold out high hopes.
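
For reference, the per-device picture described above (one device unavailable, two with checksum errors) can be read off the config table mechanically. A minimal sketch, assuming the column layout shown in the `zpool status` output earlier in the thread (NAME, STATE, READ, WRITE, CKSUM); the function name `list_unhealthy_devices` is made up for illustration and is not a ZFS tool.

```shell
#!/bin/sh
# Hypothetical helper: print every row of the zpool status config table
# whose STATE is not ONLINE or whose CKSUM counter is non-zero.
# Reads `zpool status` output on stdin; assumes NAME/STATE/READ/WRITE/CKSUM columns.
list_unhealthy_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }  # header row starts the table
        /^errors:/                    { in_cfg = 0 }        # error summary ends it
        in_cfg && NF >= 5 && ($2 != "ONLINE" || $5 != "0") {
            printf "%s state=%s cksum=%s\n", $1, $2, $5
        }
    '
}

# Usage (on the affected system):
#   zpool status zraid | list_unhealthy_devices
```

Pool and vdev rows (here `zraid` and `raidz1-0`) are printed too when they are not ONLINE, which makes it easy to see how far up the tree the damage reaches.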
 

kdombrovsky

Dabbler
Joined
Jan 21, 2018
Messages
16
There are 3 disks. I just unplugged one of them to see whether that would resolve the error.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
kdombrovsky said: "trying to figure out if that could resolve the error"

Removing redundancy is a hell of a way to resolve errors. Hopefully you didn't irretrievably kill your pool.

With the RAIDZ1 configuration you chose, you can survive the loss of one device, but any additional errors, such as bad sectors on the remaining disks, render the affected data irretrievable. RAIDZ2 is the minimum suggested RAIDZ level if you are storing valuable data, as it is much more resilient to mishaps such as an inadvertently unplugged drive.
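
To put rough numbers on that trade-off: a RAIDZ vdev with parity level p tolerates p failed devices and gives up roughly p disks' worth of space. A minimal sketch of the arithmetic, assuming equally sized disks and ignoring ZFS metadata and padding overhead (so real usable space is somewhat lower); `raidz_summary` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: rough RAIDZ numbers for equally sized disks.
#   $1 = number of disks, $2 = size of each disk in TB, $3 = parity level (1, 2 or 3)
raidz_summary() {
    disks=$1; size_tb=$2; parity=$3
    if [ "$disks" -le "$parity" ]; then
        echo "need more disks than parity devices" >&2
        return 1
    fi
    # Data capacity is what remains after setting aside one disk per parity level.
    echo "usable=$(( (disks - parity) * size_tb ))TB tolerates=$parity"
}

# The pool in this thread: 3 x 6 TB in RAIDZ1.
raidz_summary 3 6 1
# The same disks plus one more, in RAIDZ2.
raidz_summary 4 6 2
```

Both layouts come out at about 12 TB, but the second survives two failed (or unplugged) devices instead of one.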
 

kdombrovsky

Dabbler
Joined
Jan 21, 2018
Messages
16
Haha, yeah, sounds kinda dumb. The reason I did that is that the pool seemed to function OK with just 2 disks before I replaced a "faulty" disk (it remains to be seen whether the disk really was faulty or it was the damn PSU).
 

MrGuvernment

Patron
Joined
Jun 15, 2017
Messages
268
For the future: a single NAS is not a backup, whether it runs RAIDZ at any level or mirroring. Those options simply provide redundancy and uptime. If your single PSU decided to act up and take out your mobo and drives with it... so long, farewell!

Invest in some other options. Even an external USB drive that files are synced to now and then, and that is otherwise kept disconnected, is better than a single device.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
MrGuvernment said: "If your single PSU decided to act up and take out your mobo and drives with it... so long, farewell!"

Listen to MrGuvernment. He's here to help. (With apologies to Ronald Reagan)


But seriously, this is great advice. You never know when a fan is going to die and cook your system, a lightning strike hits nearby, your HBA heatsink comes loose and the HBA starts spewing bits onto your disks, a nearby grass fire burns down your home, etc. You ideally want a copy of your data available locally, plus a remote one, PLUS an offline one. Once the data is gone, it's too late to remediate any shortcomings.

Please accept this as constructive criticism, and, @MrGuvernment, I see your location as Calgary so I hope you don't mind the US political humor. It was just a great opportunity. ;-)
 

kdombrovsky

Dabbler
Joined
Jan 21, 2018
Messages
16
Thanks for the advice, I'll think of a way to make this happen. A tape streamer sounds like a good (but expensive) option.

In the meantime I'd like to make sure I've exhausted all of the options to recover my files. Is there anything left to try?
 

MrGuvernment

Patron
Joined
Jun 15, 2017
Messages
268
jgreco said: "Listen to MrGuvernment. He's here to help. (With apologies to Ronald Reagan)"

Ahaha! Love it! Needed a good laugh today.

All good!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
kdombrovsky said: "In the meantime I'd like to make sure I've exhausted all of the options to recover my files. Is there anything left to try?"

Yes! By all means, put the dang disk back in and tell us what happens. It may be recoverable. Well, I can guarantee you that there is SOME stuff that WILL not be recoverable. But it doesn't hurt to try to rescue the rest.
 

kdombrovsky

Dabbler
Joined
Jan 21, 2018
Messages
16
Just finished a scrub. Here are the results: https://pastebin.com/MFRR5c9d
Code:
root@freenas:~ # zpool status zraid
  pool: zraid
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 06:06:50 with 3771 errors on Tue Nov 22 16:58:46 2022
config:

    NAME                                            STATE     READ WRITE CKSUM
    zraid                                           DEGRADED     0     0     0
      raidz1-0                                      DEGRADED     0     0     0
        gptid/e6a8a431-17fe-11eb-b872-c860000235b7  DEGRADED     0     0 12.1K  too many errors
        gptid/e7806e32-17fe-11eb-b872-c860000235b7  DEGRADED     0     0 13.5K  too many errors
        da6                                         DEGRADED     0     0 13.5K  too many errors

errors: 3603 data errors, use '-v' for a list


The directory I need to recover isn't listed in the corrupted files section, so I hope it's recoverable.
 

MrGuvernment

Patron
Joined
Jun 15, 2017
Messages
268
kdombrovsky said: "Thanks for the advice, I'll think of a way to make this happen. A tape streamer sounds like a good (but expensive) option.

In the meantime I'd like to make sure I've exhausted all of the options to recover my files. Is there anything left to try?"

You definitely don't need tape for this. Backups come down to the 3-2-1 rule:
  • Keep at least three (3) copies of data.
  • Store two (2) backup copies on different storage media.
  • Store one (1) backup copy offsite.
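
As a rough illustration of the rule, here is a small sketch that checks an inventory of copies against the three conditions. The inventory format (one copy per line: label, medium, `onsite` or `offsite`) and the name `check_321` are invented for this example, and it reads the second bullet as "at least two distinct kinds of media".

```shell
#!/bin/sh
# Hypothetical 3-2-1 checker. Reads one copy per line on stdin:
#   <label> <medium> <onsite|offsite>
# Passes when there are >= 3 copies, >= 2 distinct media and >= 1 offsite copy.
check_321() {
    awk '
        NF >= 3 {
            copies++
            media[$2] = 1                      # remember each distinct medium once
            if ($3 == "offsite") offsite++
        }
        END {
            for (m in media) kinds++
            ok = (copies >= 3 && kinds >= 2 && offsite >= 1)
            printf "copies=%d media=%d offsite=%d %s\n", copies, kinds, offsite, (ok ? "PASS" : "FAIL")
            exit (ok ? 0 : 1)
        }
    '
}

# Usage with a made-up inventory:
#   printf '%s\n' 'nas zfs onsite' 'usb hdd onsite' 'cloud object-storage offsite' | check_321
```

A lone NAS, however it is configured, is a single line in that inventory and fails all three conditions.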

 


kdombrovsky

Dabbler
Joined
Jan 21, 2018
Messages
16
Haven't tried much apart from scrubbing since I don't know what I can do.
Rsyncing my files causes "WARNING: Pool `zraid` has encountered an uncorrectable I/O failure and has been suspended". Everything hangs after that.
 

kdombrovsky

Dabbler
Joined
Jan 21, 2018
Messages
16
Is there a way to get a list of affected directories (apart from zpool status -v)? Perhaps I could try to salvage what I can.
Is it possible to recover from "an uncorrectable I/O failure" without reboot/hard reset?
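
One way to turn the `zpool status -v` file list into a list of affected directories is to group the listed paths by parent directory. A minimal sketch, assuming the usual layout where the paths follow a line beginning with `errors: Permanent errors`; the helper name `summarize_damaged_dirs` is invented for this example, and entries that are not plain paths (snapshots, metadata, object IDs) are lumped together.

```shell
#!/bin/sh
# Hypothetical helper: group the damaged files listed by `zpool status -v`
# by parent directory and count them. Reads the command output on stdin.
summarize_damaged_dirs() {
    awk '
        /^errors: Permanent errors/ { in_list = 1; next }   # file list starts here
        /^ *pool:/                  { in_list = 0 }          # a following pool ends it
        in_list && NF > 0 {
            path = $0
            sub(/^[ \t]+/, "", path)                # drop indentation
            if (path ~ /^\//) {
                sub("/[^/]*$", "", path)            # strip the file name
                if (path == "") path = "/"
            } else {
                path = "(no plain path)"            # snapshot, metadata or object-ID entry
            }
            count[path]++
        }
        END { for (d in count) printf "%d\t%s\n", count[d], d }
    ' | LC_ALL=C sort -k1,1nr -k2
}

# Usage (on the affected system):
#   zpool status -v zraid | summarize_damaged_dirs
```

The output is one line per directory, most damaged first, which makes it easier to see which trees are untouched and worth copying off first.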
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Unfortunately, your manipulations have led to a pool that's too far gone for standard ZFS tools. You're in the realm of professional data recovery services or Klennet ZFS Recovery.
 

MrGuvernment

Patron
Joined
Jun 15, 2017
Messages
268
kdombrovsky said: "Haven't tried much apart from scrubbing since I don't know what I can do. Rsyncing my files causes 'WARNING: Pool `zraid` has encountered an uncorrectable I/O failure and has been suspended'. Everything hangs after that."

Was this scrub before or after putting back the drive you took out, as @jgreco suggested? If you have not yet put the disk back in... put it in.

jgreco said: "Yes! By all means, put the dang disk back in and tell us what happens. It may be recoverable. Well, I can guarantee you that there is SOME stuff that WILL not be recoverable. But it doesn't hurt to try to rescue the rest."
 

kdombrovsky

Dabbler
Joined
Jan 21, 2018
Messages
16
MrGuvernment said: "Was this scrub before or after putting the drives back in you took out as @jgreco suggested? If you have not yet put the disks back in...put em in.."

This was after I plugged everything back in.

Samuel Tai said: "Unfortunately, your manipulations have led to a pool that's too far gone for standard ZFS tools. You're in the realm of professional data recovery services or Klennet ZFS Recovery."

I'm trying to understand whether the entire pool is gone or just some files/directories.
 