Data Recovery

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
Hello,

We have a smaller array that was recently moved between physical locations, and upon arrival everything came up normally/as expected. After about 4 hours we noticed extreme I/O degradation, and then about 20 minutes later a high number of read errors. Knowing the system had just been moved, we decided to power down the platform with the intent to reseat all of the drives.

After hitting shut down we noticed the system took an unusually long time to come down (~10-15 minutes), but nonetheless we proceeded to reseat each of the drives. The problem we're facing now is that after powering everything back up we have two drives that are reporting unrecoverable errors and are not able to be mounted with a SMART status of "SMART Failure: DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH". So my question here is: before we go down the road of a bare-metal recovery for the 30T worth of data, are there any utilities or tools we might be able to use that could potentially repair at least one of the two ZFS drives, in hopes of getting things online long enough to move the data to a new drive?

General Specs:
  • TrueNAS Core 13.0u2
  • Storage Pool:
    • 6x 8TB Seagate Exos 7E8
    • Intel P3700 (Used for log)
    • Stripe: RAIDz1
    • Pool encrypted with legacy GELI encryption
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Sorry, I don't have any real suggestions, other than:
  • If the 2 failing drives have not failed completely, then ZFS should list which files are affected allowing you to restore only the affected files.
  • If the 2 failing drives seem completely un-recoverable, perhaps you need to verify that GELI encryption is decrypting those drives correctly. If not, they would appear as garbage to ZFS until GELI encryption is fixed.
  • In the future, RAID-Z1 is not recommended for use with drives larger than 1TB or 2TB.
  • Please list the output of the command zpool import (or, if the pool is imported, zpool status -v); a minimal example is sketched below.
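For example, roughly (the pool name "tank" below is just a placeholder for whatever yours is called):

  # Run before importing: lists importable pools and the state of every member disk
  zpool import
  # If the pool is already imported: shows per-device errors and names any affected files
  zpool status -v tank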
Good luck, and I hope someone else can help you. (They may also ask for GELI related info, which I don't know...)
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
So my question here is: before we go down the road of a bare-metal recovery for the 30T worth of data, are there any utilities or tools we might be able to use that could potentially repair at least one of the two ZFS drives, in hopes of getting things online long enough to move the data to a new drive?

On a drive that is degraded, presumably mechanically (we do not have any SMART details), any utilities or tools risk further degrading the drive, sometimes to the extent that it becomes truly unrecoverable.

Normally, you should make a clone (full copy) of the drive (onto another drive or into a file). Depending on how valuable your data is, with a mechanically damaged drive, you either attempt to make a clone yourself (not valuable data) or have a data recovery lab make a clone for you (valuable data).
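If you do attempt the clone yourself, GNU ddrescue is the usual tool, because it copies the easy areas first, keeps a map of what it has recovered, and only then goes back to hammer on the bad spots. A minimal sketch, assuming the failing disk is /dev/da12 and an equal-or-larger blank disk is /dev/da20 (device names are placeholders; triple-check them, and the source must be the failing drive):

  # First pass: grab everything that reads cleanly, skip bad areas (-n), record progress in the map file
  ddrescue -f -n /dev/da12 /dev/da20 rescue.map
  # Second pass: retry only the bad areas, up to 3 times
  ddrescue -f -r3 /dev/da12 /dev/da20 rescue.map

The map file lets you stop and resume without redoing work, which matters on a drive that may not have many reads left in it.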

If not, they would appear as garbage to ZFS until GELI encryption is fixed.

A GELI problem cannot produce a SMART problem, so whatever hardware (disk or cable) problem SMART reports, it comes first.
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
Sorry, I don't have any real suggestions, other than:
  • If the 2 failing drives have not failed completely, then ZFS should list which files are affected allowing you to restore only the affected files.
  • In the future, RAID-Z1 is not recommended for use with drives larger than 1TB or 2TB.
1) Unfortunately, this array is storing block data rather than files/objects, so it's all or nothing.
2) Can you cite an official source for this? I circled back through the current documentation and don't see any mention of drive size restrictions for particular RAID levels.

A GELI problem cannot produce a SMART problem, so whatever hardware (disk or cable) problem SMART reports, it comes first.
Agreed - I was already debating attempting a block-level clone to see if I would get lucky.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
Can you cite an official source for this? I circled back through the current documentation and don't see any mention of drive size restrictions for particular RAID levels.

There isn't any. All of this is perceived risk balancing. There are no hard numbers in hard drive reliability. If a controlled-environment experiment were ever attempted, the results would be obsolete by the time it ended. And nobody even knows which variables to control for. The current best practice is "thou shalt not have RAIDZ1, or any other single-redundant array, with large disks". Everyone defines large for themselves.
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
There isn't any. All of this is perceived risk balancing. There are no hard numbers in hard drive reliability. If a controlled-environment experiment were ever attempted, the results would be obsolete by the time it ended. And nobody even knows which variables to control for. The current best practice is "thou shalt not have RAIDZ1, or any other single-redundant array, with large disks". Everyone defines large for themselves.
Fair enough. I guess I just assumed having a hot spare mitigated the risks involved; the only reason it didn't really work out here is that the disks presumably failed because of the physical move.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Fair enough. I guess I just assumed having a hot spare mitigated the risks involved; the only reason it didn't really work out here is that the disks presumably failed because of the physical move.

RAID-Z1 plus a hot spare is far from being equivalent to RAID-Z2.

See my post about this.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
2) Can you cite an official source for this? I circled back through the current documentation and don't see any mention of drive size restrictions for particular RAID levels.
This is Dell advising against using RAID-5 (roughly the traditional-RAID equivalent of RAIDZ1) for any business-critical data. They don't even mention disk sizes, so it's probably safe to assume the recommendation stands for any size.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Just a thought, because it's often overlooked... ZFS doesn't care where the disk gets plugged in. It tracks the drives by a kind of UUID, and you can scramble the slot positions like shuffling a deck of cards. ZFS will just sort it out at import and carry on. If you have a cable or slot problem, you can move the drives to an unused slot and they will be picked up and imported into the pool.
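If you want to see that mapping for yourself, the pool references its disks by gptid labels rather than daX names; something along these lines shows how they line up (both commands are read-only):

  # Pool members show up as gptid/... labels
  zpool status
  # Maps each gptid label back to the daX device it currently sits on
  glabel status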

This also allows for a certain amount of "desperate crazy", a la: stick the drive in a USB enclosure and throw it in the refrigerator. But that will certainly destroy the drive and probably the enclosure too. So let's not go there...
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
two drives that are reporting unrecoverable errors and are not able to be mounted with a SMART status of "SMART Failure: DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH"

This is a very SAS-sy type of error, and one that I'm normally used to seeing from RAID controllers. ZFS on an HBA should throw up an error similar to "insufficient replicas." Can you confirm that you aren't using any kind of external storage logic?
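A couple of read-only checks would help answer that, assuming an LSI-family controller (adjust if it's something else entirely):

  # Lists the adapter and its firmware; the product ID shows whether it's IT (plain HBA) or IR (RAID) firmware
  sas2flash -list
  # Shows every disk as the OS sees it and which bus/controller it's attached through
  camcontrol devlist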
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
There isn't any. All of this is perceived risk balancing. There are no hard numbers in hard drive reliability. If a controlled-environment experiment were ever attempted, the results would be obsolete by the time it ended. And nobody even knows which variables to control for. The current best practice is "thou shalt not have RAIDZ1, or any other single-redundant array, with large disks". Everyone defines large for themselves.

Agree here. It comes down to the unrecoverable read error rate. Most HDDs are in the 10^14 to 10^15 range, which means there's one unreadable sector on every drive larger than X... And that's where the problem lies. The drive manufacturers don't give us enough info to solve for X. Straight math says ~4TB drives, but this doesn't pan out in practice. There's some weighting factor in the figures, be it age or environmental limits, etc... Even if they did give us more info, it would likely only be solvable for a uniform configuration of a single production run of one model of drive. Not even enterprise accounts with fat support contracts run that kind of uniformity. So we're left assessing our risk individually, which is really where the problem needs to be solved.
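To put back-of-envelope numbers on this particular pool, taking the spec sheet at face value: a 1-in-10^14-bit URE rate works out to one expected unreadable sector per ~12.5 TB read. Resilvering a 6x8TB RAID-Z1 after losing one disk means reading roughly 40 TB from the five survivors, so you'd expect on the order of 3 UREs during the rebuild at 10^14, or about 0.3 at 10^15; with no redundancy left at that point, each one costs data.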

This is a very SAS-sy type of error, and one that I'm normally used to seeing from RAID controllers. ZFS on an HBA should throw up an error similar to "insufficient replicas." Can you confirm that you aren't using any kind of external storage logic?
"DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH" could also imply a SAS training failure. A damaged cable leading to a case where the drive is detected, but communications cannot be established reliably enough to perform the discovery handshake. With moved equipment, this can be physical damage, or even a topology mis-config, though the latter should throw an immediate error or simply not present any detection at all.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
@HANDLEric , can you please provide the full details of your hardware setup?
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
...
  • If the 2 failing drives seem completely un-recoverable, perhaps you need to verify that GELI encryption is decrypting those drives correctly. If not, they would appear as garbage to ZFS until GELI encryption is fixed.
...
Good luck, and I hope someone else can help you. (They may also ask for GELI related info, which I don't know...)
...
A GELI problem cannot produce a SMART problem, so whatever hardware (disk or cable) problem SMART reports, it comes first.
Thank you for the reminder. I sometimes miss details with long posts.
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
can you please provide the full details of your hardware setup?
Specs as follows:
General Specs:
  • TrueNAS Core 13.0u2
  • Storage Pool:
    • 6x 8TB Seagate Exos 7E8
    • Intel P3700 (Used for log)
    • Stripe: RAIDz1
    • Pool encrypted with legacy GELI encryption
Hardware Specs:
  • HP ProLiant DL380 G8
  • 512GB RDIMM
  • OS Volume: 2x Seagate IronWolf Pro 256GB running hardware-managed RAID
  • Intel P3700 PCI-E SSD (Used for pool log)
  • Dual Port QLogic 40GbE QSFP NIC
  • LSI SAS9200-16e External SAS HBA
  • PowerVault MD1200 (12x 3.5" Drive Shelf)
  • (6) 8TB Seagate Exos 7E8 Drives

I think that's everything.
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
This is a very SAS-sy type of error, and one that I'm normally used to seeing from RAID controllers. ZFS on an HBA should throw up an error similar to "insufficient replicas." Can you confirm that you aren't using any kind of external storage logic?
I do see insufficient replicas if I look at the pool status; the SCSI errors are shown in the console output when it tries to bring either of the failed drives online. I did clone one of the failed drives, which so far hasn't helped much, but one of the drives seems to no longer be throwing SCSI errors and instead says "The secondary GPT header is not in the last LBA".

[Attachment: Missing GPT Header.png]


gpart Info & Repair Command:
[Attachment: GPT Corrupt.png]
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
"The secondary GPT header is not in the last LBA"

This means your clone is larger than the source. I'm not sure how it affects your specific OS, but generally, it is not a data loss case. The data is still there, plus there are some extra zeros (or junk) after the backup (secondary) GPT header.
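If the OS keeps complaining about it, the usual fix is simply rewriting the backup header at the device's real last LBA. On FreeBSD that is a single command; a sketch, with da12 only as an example device name (and, as above, only ever run against a clone, never a failing original):

  # Rewrites the GPT metadata, placing the secondary header at the actual last LBA
  gpart recover da12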
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
This means your clone is larger than the source. I'm not sure how it affects your specific OS, but generally, it is not a data loss case. The data is still there, plus there are some extra zeros (or junk) after the backup (secondary) GPT header.
That's interesting, because this is actually being thrown on the original drive when reattached to the TrueNAS server. It doesn't appear to be throwing SCSI errors any longer, so I feel like if I can overcome this read-only/mounting issue I might be able to get the pool back online.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I do see insufficient replicas if I look at the pool status; the SCSI errors are shown in the console output when it tries to bring either of the failed drives online. I did clone one of the failed drives, which so far hasn't helped much, but one of the drives seems to no longer be throwing SCSI errors and instead says "The secondary GPT header is not in the last LBA".
Strange that it called out da12 as "write protected" in the dmesg; the model number doesn't come back as a SED drive, so it's not a locking issue there. I'm worried that something in the firmware kicked it into a read-only/premature-fail state as a "safety measure."

Did it do this for da11 as well?
 

HANDLEric

Dabbler
Joined
May 6, 2019
Messages
47
Strange that it called out da12 as "write protected" in the dmesg; the model number doesn't come back as a SED drive, so it's not a locking issue there. I'm worried that something in the firmware kicked it into a read-only/premature-fail state as a "safety measure."

Did it do this for da11 as well?
Both paths (da11 & da12) did, yes.
 