RAIDZ1 still resilvering after 2 disks went missing

Status
Not open for further replies.

proxl

Cadet
Joined
Sep 29, 2013
Messages
8
I hope someone can help me. I have a FreeNAS 8.3 x64 system with an Asus C60M1I, 16 GB RAM, and 6x ST3000DM001 (3 are 9YN166 and 3 are 1CH166), with 1 ZFS pool in RAIDZ1.

While copying data from the NAS it stopped and I noticed 1 drive was missing. After a reboot it was back and I did a zpool clear.
Then I started copying data again and it stopped right away, and the disk was missing again.

I rebooted with the logging option and it said for 2 of the drives: Ata status 51 DRDY Serv err and not ready after 31000ms.
I rebooted again after it hung for 1 hour at mountd before I hit the power button; after that, 1 of the drives was not detectable in the BIOS anymore.

I replaced the offline drive with a new one and the resilvering started.
After 5 hours I checked the box and I had lost a second disk; the resilvering still continued.
I rebooted the NAS and used the very low-level diagnostic tool Victoria, and it said for both drives: Drive not say DRSC, DRDY or not remove BUSY cannot working

Now it is 1 day later and it is still resilvering, with zpool status showing: 5.77T scanned out of 9.66T at 50.2M/s, 22h35m to go, 4.50G resilvered, 59.72% done, and errors: 39083386 data errors.
That is with 4 of the original drives and the new one in the box, and 2 of the original drives on the bench.
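As a rough sanity check on the zpool status figures above, the percentage and the time estimate can be reproduced with simple arithmetic. This is only a sketch: it assumes zpool's T and M are binary units (1T = 1024 * 1024 M) and that the scan rate stays constant, and the function name is made up for illustration.

```python
# Sanity check of the numbers in the zpool status line quoted above.
# Assumption: zpool reports binary units (1T = 1024 * 1024 M) and the
# scan rate stays constant for the rest of the resilver.

def resilver_progress(scanned_tib, total_tib, rate_mib_s):
    """Return (percent_done, seconds_remaining) for a scan in progress."""
    percent = 100.0 * scanned_tib / total_tib
    remaining_mib = (total_tib - scanned_tib) * 1024 * 1024
    return percent, remaining_mib / rate_mib_s

percent, seconds = resilver_progress(5.77, 9.66, 50.2)
hours, minutes = divmod(int(seconds) // 60, 60)
print(f"{percent:.1f}% done, about {hours}h{minutes:02d}m to go")
```

The result lands within rounding of the reported 59.72% and 22h35m, so the status line is internally consistent. Note that the amount scanned (5.77T) is far larger than the amount actually resilvered (4.50G).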

I also started to get smartd errors on the console: Offline uncorrectable sectors and Currently unreadable (pending) sectors.

The 2 failed drives are PN:9YN166 and the third drive with smartd errors is also 9YN166; the rest, with PN:1CH166, are fine.

The question is, should I stop the resilvering?
I'm planning to send 1 or more of the disks to a recovery company for repair. I guess it's possible to change the PCB, head, motor or something; none of the drives makes any bad noises.
But I heard something like a beep a few times.

What will happen after the resilvering is done, if I'm able to repair the drives and put them back in?
Is there any hope of getting the zpool volume back so I can back up all the data?

Thanks, proxl
 

proxl

Cadet
Joined
Sep 29, 2013
Messages
8
If I get the hard drives repaired, will a scrub or resilver process fix the data errors that zpool status is reporting?

Hope someone can answer.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, there's no solid answer. You are the second person in 24 hours to get screwed with RAIDZ1. It's bad. People need to stop doing it. I don't have that link in my sig that RAIDZ1 is "dead" for my own benefit.

See http://forums.freenas.org/threads/f...ool-2-disks-failed-on-raidz.15308/#post-75406

As you can see, do NOT expect recovery to come cheap (or necessarily at all). Hard drive recovery experts generally don't replace the guts and mail you back the drive. They use recovery tools to recover your data and mail you back the actual files. But, whoops, they don't do ZFS recovery. Well, SOAB. So your options for recovery just dropped significantly.

Good luck!
 

proxl

Cadet
Joined
Sep 29, 2013
Messages
8
Thanks for your reply. I understand the risks of RAIDZ1 and I guess I will have 2 drives as parity next time.

I already found a company that can help me, 1000 USD per drive.

The problem is I don't know what happens when it has finished resilvering and I connect the original drives back.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you do spend the money, many people would appreciate it if you would update this thread with your total cost for recovery and how well it goes (or doesn't go). Virtually nobody has any experience with doing actual ZFS recovery and most people won't spend 4 or 5 figures to find out if it is even possible.
 

proxl

Cadet
Joined
Sep 29, 2013
Messages
8
Thanks for your reply. I got a price estimate to repair the drive, after which I get a cloned drive sent to me. The estimate is 600-1000 USD per disk.

I turned off the FreeNAS box when it was at 70% resilvering. Before I continue I need to know: is it possible to resilver again when the missing disk is attached?
Or will it delete all the files with data errors?

Can someone help?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
There is no way to know what the end result will be. ZFS pretty much relies on you ensuring that there is always enough redundancy to handle any corruption it finds. You've broken that cardinal rule so there's no way to easily tell how things will turn out. I will tell you that ZFS doesn't delete files. But the files could very well be corrupted beyond recognition, and the file system could be corrupted to the point at which it will not mount at all. There have been people with all good drives who trashed their pool to the point of being unmountable just by a loss of power (which is supposed to not allow for corruption, but we'll ignore that for now), and we've had people with very broken drives recover some bits of their data with ddrescue.

If I had to assign a percentage chance of recovery of some data, I'd say you have maybe 50%. It depends on if the issue was that there is physical damage to the platters or if there is a firmware bug. If your platters are physically damaged you have pretty much no hope of seeing your data again. Firmware bugs are totally different and it could go either way.

If I had to assign a percentage chance of recovering all of your data, I'd say you have approximately 0% chance. In fact, I'm going to be fairly surprised if the pool will even mount. I expect that as soon as you try to mount it you are going to see a kernel panic. Just based on what I've seen from other people that were "in the ditch" with ZFS, once you've exhausted all parity data and you know you still have some amount of corruption, things go really bad really fast. ZFS is coded to not deal with corruption and simply report it. If the corruption is in the file system itself, you are in serious trouble.

I will say that nobody I know of, or have even heard of from a 3rd party, has ever attempted data recovery by data recovery professionals like you seem inclined to do. The costs for recovery are quite high and people who are willing to spend that kind of money are usually smart enough to just build a backup system and save themselves the cost and risk associated with doing recovery. Why spend that kind of money for a chance at recovery when you could spend that kind of money and have a reliable backup? As I said above, I talked to a technician at a VERY popular recovery company (name is not being mentioned because I do not want to give the impression that I promote any given company or that there is even a reasonable chance for recovery) about a month ago, and they said recovery has never been done by their company before for ZFS, but for 5 figures they'd see if they could develop a tool to try to recover data. But they insisted that I understand that they have not done it before so they can't provide even a guess as to the chances for recovery.

So your outcome for all of this will definitely be interesting to say the least. I'm really a bit skeptical of your $600 to $1000 price tag though. When I talked to the recovery experts for a particular person I was helping out, they assured me up and down that they had done ZFS recovery, that they had tools that could do it, and that they felt we could get most if not all of our data back (he quoted a price of about $3k-5k for the job). But when the rubber hit the road and we paid them the $500 evaluation fee they realized that they were blowing smoke up our bums. They realized they had never done ZFS recovery, their tools were not designed to recover the data, and they weren't sure if they could get any data back.

Honestly, I think they thought they could run their standard recovery software and get the data back. ZFS is nothing like any other file system out there, and anyone who thinks that the standard recovery tools will work with ZFS is just plain ignorant. Well, they eventually admitted that they were wrong, and the new quote was a starting price of $10k and no upper limit except for what we were willing to pay. Of course, the client was out the $500 fee, which was an "evaluation", and had no data to show for it. I didn't witness the evaluation, but my guess is they tried to run their standard tool, which isn't made for ZFS, then went "oh f&*(".

I was very specific with them with my questions because when they told me that they had done ZFS recovery I thought they were full of it. I told the client I thought they were full of it, but the client really wanted the data and was willing to spend the $3-5k for it. Needless to say, to my knowledge no data was ultimately recovered: once the client saw a 5 digit figure, they realized they had been lied to and recognized that the chance of recovery for the cost was very, very slim.
 

vegaman

Explorer
Joined
Sep 25, 2013
Messages
58
Sorry if it's a bit off topic. But I'm trying to decide what's safer - 5 drives in RAIDZ or 10 drives in RAIDZ2?
After seeing these data loss stories though I'm starting to wonder if I should just do 6 drives in RAIDZ2 and end up with 2 empty slots in my server - I guess they could always be used for hot spares or something.

 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you plan to set up regular scrubs, testing of your hard drives at regular intervals, and maybe a nightly script to email you SMART data on your hard drives like I do, a 10 drive RAIDZ2 isn't bad. 6 drives in a RAIDZ2 is obviously "safer" statistically, but it's totally your choice how you want to do things. I watch my server like a hawk via emails. But I rarely log into it because I've set up a custom script to warn me of any problems, enabled and use the SMART monitoring and testing functions in FreeNAS, and built my server with quality server components.
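To put rough numbers on "safer statistically", here is a minimal sketch using a simple binomial model. It assumes drives fail independently with the same probability over some period, and the 5% figure is purely hypothetical. This thread shows that drives from one batch can fail together, so treat the results as optimistic.

```python
from math import comb

def p_pool_loss(drives, parity, p_fail):
    """Probability that more than `parity` of `drives` fail, assuming
    independent failures with per-drive probability p_fail (an assumption)."""
    return sum(
        comb(drives, k) * p_fail**k * (1 - p_fail) ** (drives - k)
        for k in range(parity + 1, drives + 1)
    )

P_FAIL = 0.05  # hypothetical per-drive failure chance over the period of interest

for label, drives, parity in [
    ("5-drive RAIDZ1", 5, 1),
    ("6-drive RAIDZ2", 6, 2),
    ("10-drive RAIDZ2", 10, 2),
]:
    print(f"{label}: {p_pool_loss(drives, parity, P_FAIL):.4%}")
```

Under these assumptions the 6-drive RAIDZ2 comes out lowest, the 10-drive RAIDZ2 in the middle, and the 5-drive RAIDZ1 highest. The ordering matters more than the absolute values, which depend entirely on the made-up failure rate.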

Edit: Add to that having a spare hard drive on the shelf ready in case of issues so you can do immediate disk replacement is a big plus too. Waiting on RMAs is not a good idea at all and you should be ready to be proactive with disk replacement and replace disks at the first sign of problems.
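A nightly SMART check along those lines could start from something like the sketch below. It is an illustration only, not the script mentioned above: it assumes `smartctl -A` is available and prints the usual ten-column attribute table, the function names and sample rows are made up, and the emailing step is left out.

```python
import subprocess

# Attributes related to the smartd warnings reported earlier in this thread.
WATCHED = ("Current_Pending_Sector", "Offline_Uncorrectable", "Reallocated_Sector_Ct")

def parse_smart_attributes(text):
    """Extract raw values of watched attributes from `smartctl -A` style output."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, ..., raw value.
        if len(fields) >= 10 and fields[1] in WATCHED and fields[9].isdigit():
            found[fields[1]] = int(fields[9])
    return found

def check_disk(device):
    """Run smartctl on one device (e.g. /dev/ada0) and return nonzero attributes."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    return {name: raw for name, raw in parse_smart_attributes(out).items() if raw > 0}

# Made-up sample rows in the usual smartctl attribute-table layout.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""
print(parse_smart_attributes(SAMPLE))
```

Any nonzero pending or uncorrectable count returned by `check_disk` is the kind of early warning that justifies swapping in the spare drive from the shelf.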
 

proxl

Cadet
Joined
Sep 29, 2013
Messages
8
The FreeNAS box that is screwed is my backup storage. I had started a massive cleanup: reinstalled my computers, bought new disks for my other FreeNAS box, wiped 8x 2TB drives, and changed my drives to 4TB. I just started using Seafile, a private cloud solution, so I have about 10% of my documents there. I had started to move file by file and completely restructure my folders and files.

So for a few days I had all my critical data in 1 place. I managed to recover the most important work documents from my laptop; since I had also rebuilt the 4x64GB RAID0 on my laptop I did not manage to get everything, but a lot.

The most important things I need to recover are 20,000-30,000 pictures taken since 1998, and some older job documents I will need later, but I have paper versions of them.

So if I had done a risk analysis before the massive cleanup I certainly would have made a backup of the most critical data.

I believe I have 4 working drives out of 6 in my FreeNAS. I'm planning to have 1 or 2 of the drives repaired, and they will send them back cloned to new drives.
The price for repair and cloning is USD 600-800.
If I send all the drives the price tag is about USD 2500-5000; they estimate an 80% chance of recovering at least 80% of the data. They told me they have experience with ZFS, but he is a salesman, so I guess he tells me what I need to know.
If it's impossible to recover I do not pay anything, if I send all the drives.

You are correct about the kernel panic: if I connect to my SMB share I can see the folders, but if I try to open 1 file I get a kernel panic.
When I did that earlier in the resilver process I could sometimes open a file or 2 before the kernel panic.

I'm not sure what's happening while it's resilvering. I believed it only wrote data to the new drive? Or does it rewrite data to all the drives?

I was hoping that 4 of the drives are as before: I fix 1 or 2 of the drives that were failing and put them back into the FreeNAS box, which is 70% finished resilvering with a zillion data errors. 1 of the drives is replaced; can I undo the replace and start resilvering from the start? Or should I only fix the second failed drive and let it finish the resilvering? Or do I need some ZFS experts to do things from the command line to fix it?
 

proxl

Cadet
Joined
Sep 29, 2013
Messages
8
I have decided to set up a new FreeNAS with 8x4TB RAIDZ2 after this experience.
I also recommend checking user reviews before you choose a brand and model.

I always preferred Seagate; I have had Maxtor, Samsung, WD, and IBM Deathstar, and all of them got the click of death after a while.

This is the first time I have had problems with Seagate.

The drives I used do not do very well:

http://www.newegg.com/Product/Product.aspx?Item=22-148-844&SortField=0&SummaryType=0&Pagesize=10&PurchaseMark=&SelectedRating=1&VideoOnlyMark=False&VendorMark=&IsFeedbackTab=true&Keywords=(keywords)&Page=1

I guess I will do some research before I choose new drives.
 

vegaman

Explorer
Joined
Sep 25, 2013
Messages
58
Thanks for the reply cyberjock.
I take the lack of a comment about RAIDZ1 as "don't even bother, even with just 5 drives" lol.
Ok, I think I'll do RAIDZ2, just have to decide whether I do 6 or 10 drives. I'd kind of prefer 10, but it means getting 10 drives before I can start the array and I'm already low on space.
Re server components and such, the only thing I've really gone budget on is the Norco case. I got 2 spare drive trays too so I can have 'cold' spares completely ready to go.

 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If I send all the drives the price tag is about USD 2500-5000; they estimate an 80% chance of recovering at least 80% of the data. They told me they have experience with ZFS, but he is a salesman, so I guess he tells me what I need to know.
If it's impossible to recover I do not pay anything, if I send all the drives.

Well, they told me that same thing. "Oh yeah, we've done ZFS before." They even mentioned things like metaslabs and vdevs so the guy clearly knew what ZFS was. But when it came to recovery he came back with an email of "yeah.. our tools don't work on ZFS". How the hell they went from using ZFS lingo to suddenly having no tools is beyond me. When he told me their tools worked for ZFS, that they had recovered ZFS before, etc I felt like they were full of crap right then and there. Nobody, and I mean nobody, does ZFS recovery. Companies don't pay for it and home users can't afford it. And to be honest, I think they're blowing smoke up your butt too. It would be hilarious if we were both talking about the same company and the same guy.

You are correct about the kernel panic: if I connect to my SMB share I can see the folders, but if I try to open 1 file I get a kernel panic.
When I did that earlier in the resilver process I could sometimes open a file or 2 before the kernel panic.

I will tell you from personal experience that once you have a questionable pool you should be doing all directory listing, file moving and copying, etc. from the command line. Don't try to use network shares (in fact, I recommend you disable the services to minimize potential avenues for more crashes).

I'm not sure what's happening while it's resilvering. I believed it only wrote data to the new drive? Or does it rewrite data to all the drives?

Resilvering is simply rereading all of the data on all of the disks, comparing checksums and parity and making sure everything is there and valid. If something isn't valid and enough parity or replicas exist to conduct a repair, then it is repaired. If, on the other hand, you find bad data and there is no parity to repair it, you are in serious trouble. The errors are logged (and you'll be able to see them with zpool status -v) but at that point things get really shaky. If only a file is corrupted, that's the ideal case. If the file system is corrupted, start praying to your favorite deity. For many people who lose all parity drives and have corruption, a scrub will kernel panic.

So when you first get access to your pool you should not be doing a scrub. You should be looking at copying all of your data to a new pool (or at least somewhere else) from the command line.
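One way to do that kind of copy without stopping at the first unreadable file is sketched below. It is a minimal sketch, not a tested recovery tool: the function name and the paths in the usage comment are hypothetical, and it only handles files that return an I/O error. A kernel panic will still take the box down.

```python
import os
import shutil

def salvage_tree(src_root, dst_root):
    """Copy every readable file from src_root to dst_root.

    Files that raise an I/O error are skipped and returned in a list,
    so one corrupted file does not stop the rest of the copy.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError as exc:
                failed.append((src, str(exc)))
    return failed

# Hypothetical usage; both paths are placeholders:
# failed = salvage_tree("/mnt/oldpool", "/mnt/newpool/rescue")
# for path, reason in failed:
#     print(path, reason)
```

Keeping the list of failed paths gives a record of what was lost, which can be compared against the files that zpool status -v reports as damaged.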


I was hoping that 4 of the drives are as before: I fix 1 or 2 of the drives that were failing and put them back into the FreeNAS box, which is 70% finished resilvering with a zillion data errors. 1 of the drives is replaced; can I undo the replace and start resilvering from the start? Or should I only fix the second failed drive and let it finish the resilvering? Or do I need some ZFS experts to do things from the command line to fix it?

You can't undo much of anything. ZFS keeps transaction logs for 127 transactions, which for most people is just a few minutes of activity. As for resilvering, I would never have started it. I'd be trying to copy data off from the CLI.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I have decided to set up a new FreeNAS with 8x4TB RAIDZ2 after this experience.
I also recommend checking user reviews before you choose a brand and model.

I always preferred Seagate; I have had Maxtor, Samsung, WD, and IBM Deathstar, and all of them got the click of death after a while.

This is the first time I have had problems with Seagate.

The drives I used do not do very well:

http://www.newegg.com/Product/Product.aspx?Item=22-148-844&SortField=0&SummaryType=0&Pagesize=10&PurchaseMark=&SelectedRating=1&VideoOnlyMark=False&VendorMark=&IsFeedbackTab=true&Keywords=(keywords)&Page=1

I guess I will do some research before I choose new drives.

I've been using 24 WD 2TB and 3TB drives and had only 2 or 3 failures in 3 years of total uptime. I used to use Seagate exclusively for desktops and servers, but they bit me in 2009 and I'll never go back. Sorry, but when I drop $2000 on drives alone for a server and they aren't working right 3 months later I don't buy from you again if you won't fix the problem.
 

RvdKraats

Dabbler
Joined
Aug 8, 2012
Messages
34
My personal experience with Seagate (so far) is positive. Other brands have failed on me (Maxtor, Quantum, WD), but the Seagates always kept purring along.
Even so, there's always a chance of failure, so although I run a low-spec setup (4x 1TB disks) I still use RAID-Z2. The personal stuff that's on there (hard to find software, pics of my young kids, scans of photos that are now gone) is irreplaceable, so I gladly trade the loss of storage space (because of the RAID-Z2) and transfer speed for the assurance that my data is safe.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
My personal experience with Seagate (so far) is positive. Other brands have failed on me (Maxtor, Quantum, WD), but the Seagates always kept purring along.
All the manufacturers suck equally.
The personal stuff that's on there (hard to find software, pics of my young kids, scans of photos that are now gone) is irreplaceable
I take it you have a separate physical backup of the data aside from the NAS?
 

RvdKraats

Dabbler
Joined
Aug 8, 2012
Messages
34
Yes sir, I do :P
I work for a company that installs mid-range and high-end storage solutions, and even with these systems I've seen my share of problems (be it mismanagement or hardware failures). Makes you kinda careful ;)
 

RvdKraats

Dabbler
Joined
Aug 8, 2012
Messages
34
On a side note, years ago (and I mean a LONG time ago, I worked for a smaller company then), I happened to be at a new customer's site where they hadn't checked their backup *ever*. Turns out their server was stolen, and I was asked to restore their backup tapes.

They were empty. Nothing had been backed up. Years of CAD drawings gone.

That's the first time I'd seen a person's face (the owner of the shop) actually turn pale gray.
 

proxl

Cadet
Joined
Sep 29, 2013
Messages
8
Update: I managed to clone 550 GB of 3000 GB with ddrescue from one of the failed drives. The speed slowed down to 100 bytes per second so I gave up. When I restarted the clone process with another drive it started at great speed before it slowed down more and more.
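For a sense of scale, here is a back-of-the-envelope sketch of what finishing that clone would have meant. It assumes decimal gigabytes and a constant 100 bytes per second; the function name is made up for illustration.

```python
def years_to_finish(total_gb, done_gb, bytes_per_sec):
    """Years needed to copy the rest of the drive at a constant rate."""
    remaining_bytes = (total_gb - done_gb) * 10**9  # decimal GB, as drive vendors count
    return remaining_bytes / bytes_per_sec / (365.25 * 24 * 3600)

print(f"{550 / 3000:.1%} cloned")
print(f"{years_to_finish(3000, 550, 100):.0f} years left at 100 bytes/s")
```

At that rate the remaining data would take centuries to read, so giving up on the direct clone was the only practical choice.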

I set up a new installation with clones of the 4 good drives, the partial clone of 1 failed drive, and the 1 partially resilvered disk, and it started resilvering again. When it finished I managed to open a lot of small files (10-20-30 KB); bigger files were not possible to open. I did not receive any more kernel panics, so I was able to export a directory listing.

Now I have sent 2 of the failed drives to data recovery; they will try to repair them and then clone the drives.

I also noticed that the 2 failed drives were made in China; the rest of the drives were made in Thailand.
The 2 failed drives have the same model number, same PN, same PCB revision, and same PCB number. When I look at the bottom of the PCB I can see some differences, and when I take off the PCB and look at the chips I find many differences.

I guess there were some quality problems at the China facility, and next time I get a drive made in China I will return it!

Tough guys don't do backups, so I set up a new NAS server with RAIDZ2. No more crying because of data loss :) knock on wood!
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
Didn't read the whole thread, but that's one of the reasons why mixing drives of different vendors or batches is a good idea. If there is a faulty batch, you won't end up with multiple disks dying at the same time. It would be interesting to know if the recovery company can save the disks, please report back when you hear from them! And good luck! :)
 