How to rescue files from a failing drive with ZFS?

85holmberg · Dec 10, 2014

Hi!

I have searched both this forum and Google, but I can’t find a solution to my problem…

I’m running FreeNAS 9.1.1 with the ZFS file system (not in RAID). One of my drives seems to be failing; the GUI gives me this warning:

Code:

WARNING: The volume Saber1 (ZFS) status is UNKNOWN:

I have tried to scrub the volume from the command line, and this takes a VERY long time: it has been running for 6 days and is 44.90% done.

Copying files from this disk to another also takes ages; it seems the system stops and retries the read for a very long time on every faulty cluster.

I can see this in the console:

Code:

sabertooth kernel: (ada2:ahcich5:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 e0 c0 10 40 d3 00 00 01 00 00
Dec 10 14:33:23 Sabertooth kernel: (ada2:ahcich5:0:0:0): CAM status: ATA Status Error
Dec 10 14:33:23 Sabertooth kernel: (ada2:ahcich5:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Dec 10 14:33:23 Sabertooth kernel: (ada2:ahcich5:0:0:0): RES: 41 40 e0 c0 10 40 d3 00 00 00 00
Dec 10 14:33:23 Sabertooth kernel: (ada2:ahcich5:0:0:0): Retrying command

Is there any way to copy just the files that are not corrupted and skip all the faulty files right away?

Thanks in advance for your help :)

cyberjock · Dec 10, 2014

Short answer: Nope.

Long answer: Not exactly. You could script it with a list of corrupt files, but you'd have to know the list of corrupt files first. That means you've done a scrub, which won't necessarily complete at all if you have corruption.
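For reference, when ZFS does know about damaged files, `zpool status -v` prints them under an "errors: Permanent errors have been detected in the following files:" heading. Below is a minimal sketch of turning that output into a plain list a script could use. The function name `list_damaged_files` is made up for illustration, and the list is only as complete as the last scrub or the reads done so far.

```shell
#!/bin/sh
# Sketch: extract the damaged-file list from `zpool status -v` output on stdin.
# list_damaged_files is a hypothetical helper name, not part of ZFS.
list_damaged_files() {
    awk '
        # The list starts after this heading line.
        /Permanent errors have been detected/ { in_list = 1; next }
        # Print every non-blank line after it, with leading whitespace removed.
        in_list && NF { sub(/^[ \t]+/, ""); print }
    '
}

# Intended use (pool name Saber1 taken from the warning in the first post):
#   zpool status -v Saber1 | list_damaged_files > damaged.txt
```

Entries that ZFS cannot resolve to a path show up as object IDs rather than file names, so the resulting list may need manual review before it is fed to a copy script.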

The real answer is to make sure you never have corruption and then you'll never have a problem. Of course it sounds like this was a single disk pool, so you have no redundancy. But now you see why we don't recommend people do non-redundant pools.

85holmberg · Dec 10, 2014

That's correct, it is a single-disk pool, and I know it's not recommended. As I'm aware of this, I don't use this pool for "very important files".

But isn't there any kind of solution or tool that can help in this situation? To me it sounds like a rather simple thing that some software should have a function for, like a file copy command/script that skips the whole file if any error occurs.

Is waiting for a cp/rsync (which may take weeks) or manually copying files the only way?

What happens if I clone the drive with "dd" and then copy files from the clone? Will this also take ages?

Or should I try to "repair" the drive with SeaTools or similar?

cyberjock · Dec 10, 2014

No, there's no software for that because ZFS wasn't really expecting single-disk setups. While they are supported, it was pretty much expected that if you are running a single disk you are also saying you don't care about the data, so no tool was created.

I'd try a cp (rsync would probably be a trainwreck). But if the disk is failing there's no telling if/when it will finish. It's possible the pool will reach some corrupted part that crashes ZFS first.
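A per-file cp loop along those lines might look like the sketch below. It is an illustration, not a tested recovery tool: `copy_skipping_errors` and its three arguments are hypothetical names, and file names containing newlines are not handled. It gives up on a file as soon as cp reports an error, but it cannot shorten the kernel's own retries on each bad sector (the "Retrying command" lines in the log above), so it may still be slow.

```shell
#!/bin/sh
# copy_skipping_errors SRC DST FAILLOG   (hypothetical helper, sketch only)
# Copies every regular file under SRC into DST, one file at a time.
# If the copy of a file fails (for example on a read error), any partial
# copy is removed, the relative path is appended to FAILLOG, and the loop
# moves on to the next file.
copy_skipping_errors() {
    src=$1
    dst=$2
    faillog=$3
    : > "$faillog"                      # start with an empty failure log
    (cd "$src" && find . -type f) | while IFS= read -r rel; do
        if mkdir -p "$dst/$(dirname "$rel")" 2>/dev/null &&
           cp -p "$src/$rel" "$dst/$rel" 2>/dev/null; then
            :                           # copied cleanly
        else
            rm -f "$dst/$rel" 2>/dev/null || true
            printf '%s\n' "$rel" >> "$faillog"
        fi
    done
}

# Intended use (both paths are assumptions):
#   copy_skipping_errors /mnt/Saber1 /mnt/newpool/rescued /mnt/newpool/failed.txt
```

The failure log can be used afterwards to decide which files are worth a second, slower attempt.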

dd will run into the bad blocks and have problems too. You might be able to get away with ddrescue, but again, no guarantee it will finish or if it will work.

Once you are at the point that the disk is bad, you likely have some kind of mechanical problem. There is no fixing it. You normally replace the disk and life goes on. The mechanical problems are going to make all recovery efforts pretty slow. Maybe not 6 days slow, but slow nonetheless.

rs225 · Dec 10, 2014

Export the pool, run ddrescue to image the drive to a new disk, then try to recover your data from that new drive.
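As a rough sketch of those steps, assuming GNU ddrescue is available: the pool name Saber1 and the source device ada2 come from the first post, while `/dev/ada3` as the new disk and the map file path are assumptions to replace with your own. The map file must live on a healthy disk, not on either of the two drives involved. The script only prints the commands unless `DRY_RUN=0` is set, because writing to the wrong device destroys its contents.

```shell
#!/bin/sh
# Sketch only. Double-check every device name before setting DRY_RUN=0.
SRC=/dev/ada2            # failing disk (ada2 in the kernel log)
DST=/dev/ada3            # ASSUMPTION: the new disk, at least as large as SRC
MAP=/root/saber1.map     # ASSUMPTION: map file location on a healthy disk
DRY_RUN=${DRY_RUN:-1}    # 1 = print commands only, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

rescue_image() {
    # Stop ZFS from touching the failing disk while it is being imaged.
    run zpool export Saber1 &&
    # First pass: copy everything that reads easily, skip the slow scraping phase.
    run ddrescue -f -n "$SRC" "$DST" "$MAP" &&
    # Second pass: go back to the bad areas and retry them up to 3 times.
    run ddrescue -f -r3 "$SRC" "$DST" "$MAP"
}

rescue_image
```

Because the map file records which areas have already been read, the second command (or a rerun after a crash or reboot) picks up where the previous one stopped instead of starting over.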

85holmberg · Dec 10, 2014

rs225 said:
Export the pool, run ddrescue to image the drive to a new disk, then try to recover your data from that new drive.

OK, what tools/methods do you suggest for continuing the recovery on the new disk?

Starpulkka · Dec 10, 2014

Well, isn't it easier and faster to restore the data from backups onto a proper RAID6, RAIDZ2, or butterplay? But if I were in the same deep, deep puddle, I would first buy and test a new HDD, then ddrescue the bad HDD to the new good HDD, then probably look at which checksum it uses, set atime and checksum off, and then mount and copy. After that, maybe even try zdb. If it succeeds, yay; if not, maybe go to the Ubuntu forums and look at how many problems Ubuntu causes its users every day, just to cheer myself up.

Code:

zfs get checksum name
zfs set checksum=off name
zfs set atime=off name
