Data Recovery

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
Hello,

We have a smaller array that was recently moved between physical locations, and upon arrival everything came up normally/as expected. After about 4 hours we noticed extreme I/O degradation, and then about 20 minutes later a high number of read errors. Knowing the system had just been moved, we decided to power down the platform with the intent to reseat all of the drives.

After hitting shut down we noticed the system took an unusually long time to come down (~10-15 minutes), but nonetheless we proceeded to reseat each of the drives. The problem we're facing now is that after powering everything back up we have two drives that are reporting unrecoverable errors and are not able to be mounted with a SMART status of "SMART Failure: DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH". So my question here is: before we go down the road of a bare-metal recovery for the 30T worth of data, are there any utilities or tools we might be able to use that could potentially repair at least one of the two ZFS drives, in hopes of getting things online long enough to move the data to a new drive?

General Specs:
  • TrueNAS Core 13.0u2
  • Storage Pool:
    • 6x 8TB Seagate Exos 7E8
    • Intel P3700 (Used for log)
    • Stripe: RAIDz1
    • Pool encrypted with legacy GELI encryption
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Sorry, I don't have any real suggestions, other than:
  • If the 2 failing drives have not failed completely, then ZFS should list which files are affected allowing you to restore only the affected files.
  • If the 2 failing drives seem completely un-recoverable, perhaps you need to verify that GELI encryption is decrypting those drives correctly. If not, they would appear as garbage to ZFS until GELI encryption is fixed.
  • In the future, RAID-Z1 is not recommended for use with drives larger than 1TB or 2TB.
  • Please list the output of the command zpool import (or, if the pool is imported, zpool status -v); a minimal example is sketched below.
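For example, roughly (the pool name "tank" below is just a placeholder for whatever yours is called):

  # Run before importing: lists importable pools and the state of every member disk
  zpool import
  # If the pool is already imported: shows per-device errors and names any affected files
  zpool status -v tank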
Good luck, and I hope someone else can help you. (They may also ask for GELI related info, which I don't know...)
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
So my question here is: before we go down the road of a bare-metal recovery for the 30T worth of data, are there any utilities or tools we might be able to use that could potentially repair at least one of the two ZFS drives, in hopes of getting things online long enough to move the data to a new drive?

On a drive that is degraded, presumably mechanically (we do not have any SMART details), any utilities or tools risk further degrading the drive, sometimes to the extent that it becomes truly unrecoverable.

Normally, you should make a clone (full copy) of the drive (onto another drive or into a file). Depending on how valuable your data is, with a mechanically damaged drive, you either attempt to make a clone yourself (not valuable data) or have a data recovery lab make a clone for you (valuable data).
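If you do attempt the clone yourself, GNU ddrescue is the usual tool, because it copies the easy areas first, keeps a map of what it has recovered, and only then goes back to hammer on the bad spots. A minimal sketch, assuming the failing disk is /dev/da12 and an equal-or-larger blank disk is /dev/da20 (device names are placeholders; triple-check them, and the source must be the failing drive):

  # First pass: grab everything that reads cleanly, skip bad areas (-n), record progress in the map file
  ddrescue -f -n /dev/da12 /dev/da20 rescue.map
  # Second pass: retry only the bad areas, up to 3 times
  ddrescue -f -r3 /dev/da12 /dev/da20 rescue.map

The map file lets you stop and resume without redoing work, which matters on a drive that may not have many reads left in it.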

If not, they would appear as garbage to ZFS until GELI encryption is fixed.

A GELI problem cannot produce a SMART problem, so whatever hardware (disk or cable) problem SMART reports, it comes first.
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
Sorry, I don't have any real suggestions, other than:
  • If the 2 failing drives have not failed completely, then ZFS should list which files are affected allowing you to restore only the affected files.
  • In the future, RAID-Z1 is not recommended for use with drives larger than 1TB or 2TB.
1) Unfortunately, this array is storing block data rather than files/objects, so it's all or nothing.
2) Can you cite an official source for this? I circled back through the current documentation and don't see any mention of drive size restrictions for particular RAID levels.

A GELI problem cannot produce a SMART problem, so whatever hardware (disk or cable) problem SMART reports, it comes first.
Agreed - I was already debating attempting a block-level clone to see if I would get lucky.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
Can you cite an official source for this? I circled back through the current documentation and don't see any mention of drive size restrictions for particular RAID levels.

There isn't any. All of this is perceived risk balancing. There are no hard numbers in hard drive reliability. If a controlled-environment experiment were ever attempted, the results would be obsolete by the time it ended. And nobody even knows which variables to control for. The current best practice is "thou shalt not have RAIDZ1, or any other single-redundant array, with large disks". Everyone defines large for themselves.
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
There isn't any. All of this is perceived risk balancing. There are no hard numbers in hard drive reliability. If a controlled-environment experiment were ever attempted, the results would be obsolete by the time it ended. And nobody even knows which variables to control for. The current best practice is "thou shalt not have RAIDZ1, or any other single-redundant array, with large disks". Everyone defines large for themselves.
Fair enough. I guess I just assumed having a hot spare mitigated the risks involved; the only reason it didn't really work out here is that the disks presumably failed because of the physical move.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Fair enough. I guess I just assumed having a hot spare mitigated the risks involved; the only reason it didn't really work out here is that the disks presumably failed because of the physical move.

RAID-Z1 plus a hot spare is far from being equivalent to RAID-Z2.

See my post about this.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
2) Can you cite an official source for this? I circled back through the current documentation and don't see any mention of drive size restrictions for particular RAID levels.
This is Dell advising against using RAID-5 (roughly the traditional-RAID equivalent of RAIDZ1) for any business-critical data. They don't even mention disk sizes, so it's probably safe to assume the recommendation stands for any size.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Just a thought, because it's often overlooked... ZFS doesn't care where the disk gets plugged in. It tracks the drives by a kind of UUID, and you can scramble the slot positions like shuffling a deck of cards. ZFS will just sort it out at import and carry on. If you have a cable or slot problem, you can move the drives to an unused slot and they will be picked up and imported into the pool.
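If you want to see that mapping for yourself, the pool references its disks by gptid labels rather than daX names; something along these lines shows how they line up (both commands are read-only):

  # Pool members show up as gptid/... labels
  zpool status
  # Maps each gptid label back to the daX device it currently sits on
  glabel status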

This also allows for a certain amount of "desperate crazy", a la: stick the drive in a USB enclosure and throw it in the refrigerator. But that will certainly destroy the drive and probably the enclosure too. So let's not go there...
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
two drives that are reporting unrecoverable errors and are not able to be mounted with a SMART status of "SMART Failure: DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH"

This is a very SAS-sy type of error, and one that I'm normally used to seeing from RAID controllers. ZFS on an HBA should throw up an error similar to "insufficient replicas." Can you confirm that you aren't using any kind of external storage logic?
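A couple of read-only checks would help answer that, assuming an LSI-family controller (adjust if it's something else entirely):

  # Lists the adapter and its firmware; the product ID shows whether it's IT (plain HBA) or IR (RAID) firmware
  sas2flash -list
  # Shows every disk as the OS sees it and which bus/controller it's attached through
  camcontrol devlist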
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
There isn't any. All of this is perceived risk balancing. There are no hard numbers in hard drive reliability. If a controlled-environment experiment were ever attempted, the results would be obsolete by the time it ended. And nobody even knows which variables to control for. The current best practice is "thou shalt not have RAIDZ1, or any other single-redundant array, with large disks". Everyone defines large for themselves.

Agree here. It comes down to the unrecoverable read error rate. Most HDDs are in the 10^14 to 10^15 range, which means there's one unreadable sector on every drive larger than X... And that's where the problem lies. The drive manufacturers don't give us enough info to solve for X. Straight math says ~4TB drives, but this doesn't pan out in practice. There's some weighting factor in the figures, be it age or environmental limits, etc... Even if they did give us more info, it would likely only be solvable for a uniform configuration of a single production run of one model of drive. Not even enterprise accounts with fat support contracts run that kind of uniformity. So we're left assessing our risk individually, which is really where the problem needs to be solved.
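To put back-of-envelope numbers on this particular pool, taking the spec sheet at face value: a 1-in-10^14-bit URE rate works out to one expected unreadable sector per ~12.5 TB read. Resilvering a 6x8TB RAID-Z1 after losing one disk means reading roughly 40 TB from the five survivors, so you'd expect on the order of 3 UREs during the rebuild at 10^14, or about 0.3 at 10^15; with no redundancy left at that point, each one costs data.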

This is a very SAS-sy type of error, and one that I'm normally used to seeing from RAID controllers. ZFS on an HBA should throw up an error similar to "insufficient replicas." Can you confirm that you aren't using any kind of external storage logic?
"DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH" could also imply a SAS training failure. A damaged cable leading to a case where the drive is detected, but communications cannot be established reliably enough to perform the discovery handshake. With moved equipment, this can be physical damage, or even a topology mis-config, though the latter should throw an immediate error or simply not present any detection at all.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
@HANDLEric , can you please provide the full details of your hardware setup?
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
...
  • If the 2 failing drives seem completely un-recoverable, perhaps you need to verify that GELI encryption is decrypting those drives correctly. If not, they would appear as garbage to ZFS until GELI encryption is fixed.
...
Good luck, and I hope someone else can help you. (They may also ask for GELI related info, which I don't know...)
...
A GELI problem cannot produce a SMART problem, so whatever hardware (disk or cable) problem SMART reports, it comes first.
Thank you for the reminder. I sometimes miss details with long posts.
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
can you please provide the full details of your hardware setup?
Specs as follows:
General Specs:
  • TrueNAS Core 13.0u2
  • Storage Pool:
    • 6x 8TB Seagate Exos 7E8
    • Intel P3700 (Used for log)
    • Stripe: RAIDz1
    • Pool encrypted with legacy GELI encryption
Hardware Specs:
  • HP ProLiant DL380 G8
  • 512GB RDIMM
  • OS Volume: 2x Seagate IronWolf Pro 256GB running hardware-managed RAID
  • Intel P3700 PCI-E SSD (Used for pool log)
  • Dual Port QLogic 40GbE QSFP NIC
  • LSI SAS9200-16e External SAS HBA
  • PowerVault MD1200 (12x 3.5" Drive Shelf)
  • (6) 8TB Seagate Exos 7E8 Drives

I think that's everything.
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
This is a very SAS-sy type of error, and one that I'm normally used to seeing from RAID controllers. ZFS on an HBA should throw up an error similar to "insufficient replicas." Can you confirm that you aren't using any kind of external storage logic?
I do see insufficient replicas if I look at the pool status; the SCSI errors are shown in the console output when it tries to bring either of the failed drives online. I did clone one of the failed drives, which so far hasn't helped much, but one of the drives seems to no longer be throwing SCSI errors and instead says "The secondary GPT header is not in the last LBA".

[Attachment: Missing GPT Header.png]


gpart Info & Repair Command:
[Attachment: GPT Corrupt.png]
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
"The secondary GPT header is not in the last LBA"

This means your clone is larger than the source. I'm not sure how it affects your specific OS, but generally, it is not a data loss case. The data is still there, plus there are some extra zeros (or junk) after the backup (secondary) GPT header.
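If the OS keeps complaining about it, the usual fix is simply rewriting the backup header at the device's real last LBA. On FreeBSD that is a single command; a sketch, with da12 only as an example device name (and, as above, only ever run against a clone, never a failing original):

  # Rewrites the GPT metadata, placing the secondary header at the actual last LBA
  gpart recover da12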
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
This means your clone is larger than the source. I'm not sure how it affects your specific OS, but generally, it is not a data loss case. The data is still there, plus there are some extra zeros (or junk) after the backup (secondary) GPT header.
That's interesting, because this is actually being thrown on the original drive when reattached to the TrueNAS server. It doesn't appear to be throwing SCSI errors any longer, so I feel like if I can overcome this read-only/mounting issue I might be able to get the pool back online.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I do see insufficient replicas if I look at the pool status; the SCSI errors are shown in the console output when it tries to bring either of the failed drives online. I did clone one of the failed drives, which so far hasn't helped much, but one of the drives seems to no longer be throwing SCSI errors and instead says "The secondary GPT header is not in the last LBA".
Strange that it called out da12 as "write protected" in the dmesg; the model number doesn't come back as a SED drive, so it's not a locking issue there. I'm worried that something in the firmware kicked it into a read-only/premature-fail state as a "safety measure."

Did it do this for da11 as well?
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
Strange that it called out da12 as "write protected" in the dmesg; the model number doesn't come back as a SED drive, so it's not a locking issue there. I'm worried that something in the firmware kicked it into a read-only/premature-fail state as a "safety measure."

Did it do this for da11 as well?
Both paths (da11 & da12) did, yes.
 