Hard Drive Troubleshooting - Massive Failures - Need Help Isolating the Problem(s)

Status
Not open for further replies.

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Your top-level checksum errors are over 6,000, and you are showing top-level metadata damage.

Back up whatever you cannot afford to lose from that pool immediately. If the system reboots or the copy fails, move on.

Once the backup is complete or has failed, shut the system down. There is no point waiting for a resilver that will never finish. Trying to remove or replace drives will only make your situation worse, because a resilver cannot complete. Remove the drives for the main pool only. (Make sure you don't screw up.) Power on, and see whether the drives are all there and what condition the pool is in. Be ready for a funeral.
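The checksum counts in that diagnosis come from the READ/WRITE/CKSUM columns of zpool status. As a minimal sketch, here is a Python parser for that config table. The SAMPLE text is made up for illustration (not this pool's real output), the 1024-based K/M suffixes are an assumption about how ZFS abbreviates large counters, and the parser simply stops at the first line that is not a full device row, so sections such as spares or logs are not handled.

```python
import re

# Made-up excerpt of `zpool status` output, for illustration only.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0 6.02K
          raidz2-0  DEGRADED     0     0 12.0K
            da1     ONLINE       0     0     3
            da7     FAULTED     41     0     0  too many errors
"""

# Assumption: abbreviated counters use 1024-based suffixes, so results are approximate.
SUFFIX = {"": 1, "K": 1024, "M": 1024 ** 2}


def parse_count(text):
    """Convert a zpool error counter such as '3' or '6.02K' to an approximate int."""
    m = re.fullmatch(r"([\d.]+)([KM]?)", text)
    if m is None:
        raise ValueError(f"unrecognised counter: {text!r}")
    return int(float(m.group(1)) * SUFFIX[m.group(2)])


def parse_devices(status_text):
    """Return {name: (state, read, write, cksum)} for rows of the config table."""
    devices = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if in_table:
            if len(fields) < 5:
                break  # blank line or section header: end of the simple table
            name, state, read, write, cksum = fields[:5]  # ignore trailing notes
            devices[name] = (state, parse_count(read), parse_count(write), parse_count(cksum))
    return devices


if __name__ == "__main__":
    for name, (state, r, w, ck) in parse_devices(SAMPLE).items():
        flag = "  <-- errors" if r or w or ck else ""
        print(f"{name:10} {state:9} read={r} write={w} cksum={ck}{flag}")
```

Errors counted on the pool and vdev rows (tank, raidz2-0) rather than on a single disk are what "top-level" refers to in the advice above.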
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
Whatever caused this, all except one of the drives you are using seem ok. As to what did cause it and what you can do, I defer to those more knowledgeable.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
What a hot mess. You don't have regular SMART tests set up, as evidenced by your SMART output. You have a multitude of CRC errors, which indicate a cabling or power supply problem, as others have already stated. da7 is dead and needs to be replaced.
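CRC errors here means SMART attribute 199 (UDMA_CRC_Error_Count), which counts transfer errors on the link between drive and controller. That is why it points at cables, backplane, or power rather than at the platters. The counter is cumulative and is not reset, so after reseating or replacing cables the useful question is whether it is still going up. A minimal Python sketch of that comparison, assuming you have recorded the raw value of attribute 199 per drive before and after the change (the device names and numbers below are hypothetical):

```python
def crc_still_rising(before, after):
    """Return drives whose raw UDMA_CRC_Error_Count increased between two readings.

    A drive missing from `before` is compared against 0.
    """
    return sorted(dev for dev, count in after.items() if count > before.get(dev, 0))


if __name__ == "__main__":
    # Hypothetical readings: before recabling, and a few days after.
    before = {"da1": 120, "da2": 0, "da3": 57}
    after = {"da1": 120, "da2": 4, "da3": 90}
    print(crc_still_rising(before, after))
```

A drive with a large but unchanging count (da1 in the made-up data) is not flagged; only drives whose count keeps growing still have a link problem.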

You're going to have to do your best to salvage what data you can and rebuild this pool. Do some reading and educate yourself on how to manage your server, as well as how to set up regular SMART tests and scrubs. All the information is available here on this site.

Good luck.
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
  • Nothing non-standard was done when trying to fix all these issues, as written before. I did not use the CLI to replace any drive or make up my own solutions; I hardly know my way around FreeNAS. I use the GUI almost exclusively.
  • Scrubs have been run regularly on a schedule, every second week since day 1 when I built the server, and there was never any issue while scrubbing.
  • SMART tests have been executed occasionally, but I don't know how to interpret the SMART results. I looked for guides, both on the forum and online, on how to interpret SMART results and which values are OK and which are not, but found nothing. If there is anything I missed, please point me there instead of writing something negative.
  • Please enlighten me if there are some other guides I missed that could help in the future.

I provided SMART results for the drives. Do they all look OK (except for the dying da7)?
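For a first pass over smartctl -A output, a common rule of thumb is to look at the raw values of a few attributes: 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector), 198 (Offline_Uncorrectable), and 199 (UDMA_CRC_Error_Count, the cabling one). A non-zero raw value there is a reason to look closer, not a verdict on its own, and this does not replace reading the self-test log. Below is a minimal Python sketch of that triage; the SAMPLE table is made up, and the choice of attributes is an assumption, not an official FreeNAS rule.

```python
import re

# Made-up `smartctl -A` attribute table, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       8760
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 21/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""

# Attributes whose raw value is usually expected to stay at 0 on a healthy setup.
WATCH = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}


def raw_values(smartctl_text):
    """Map attribute ID -> leading integer of the RAW_VALUE column."""
    values = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            m = re.match(r"\d+", fields[9])
            if m:
                values[int(fields[0])] = int(m.group())
    return values


def triage(smartctl_text):
    """List watched attributes with a non-zero raw value (empty list: nothing flagged)."""
    vals = raw_values(smartctl_text)
    return [f"{aid} {name} = {vals[aid]}" for aid, name in WATCH.items() if vals.get(aid, 0) > 0]


if __name__ == "__main__":
    print(triage(SAMPLE) or "nothing flagged")
```

Running it once per drive gives a quick list of which disks deserve a closer look; pending sectors (197) usually point at the disk itself, while 199 points back at cables and power.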
 

Zwck

Patron
Joined
Oct 27, 2016
Messages
371
No help from me, unfortunately. @Jailer, do you have a recommended reading section or tutorial to deal with server management? The "stuff" you find on the interwebs is quite dense; maybe you have a favorite PDF, PPT, or similar that could guide us.
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762

arameen said:
  • SMART tests have been executed occasionally, but I don't know how to interpret the SMART results. I looked for guides, both on the forum and online, on how to interpret SMART results and which values are OK and which are not, but found nothing. If there is anything I missed, please point me there instead of writing something negative.
  • Please enlighten me if there are some other guides I missed that could help in the future.

https://forums.freenas.org/index.ph...bleshooting-guide-all-versions-of-freenas.17/
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
Perhaps FreeNAS isn't for you. In your original post you said that you nearly lost your data before.

Then you said "SMART tests have been executed occasionally, but I don't know how to interpret the SMART results."

1. If I'd nearly lost all my data, I'd definitely learn how to interpret the SMART results - you haven't.

2. You need to do your research, you can't just hope that somebody will save your data again.

3. It's not FreeNAS that's too difficult, it's a lack of reading.

4. You've spent a fortune on HDDs - you should invest some time on how to manage the server properly/safely.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
arameen said:
SMART tests have been executed occasionally
Why only occasionally? Why haven't they been set up on a schedule? Scrubs tell you nothing about the health of your disks.
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
danb35 said:
Why only occasionally? Why haven't they been set up on a schedule? Scrubs tell you nothing about the health of your disks.

To be honest, I didn't know this was needed. I'm now reading your guide on setting this up to run regularly, so I can do it once I start over with my pool :)
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
It's a helpful guide, but I can't take credit for it--I just host it on my server.
If there is a better, more complete one, or any other I should know about, I'd gladly take links to those :)
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Zwck said:
Jailer do you have a recommended reading section or tutorial to deal with server management
The resource section at the top of the page is a good place to start. The search function will help you find more specific information.
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
ethereal said:
Perhaps FreeNAS isn't for you. In your original post you said that you nearly lost your data before.

Then you said "SMART tests have been executed occasionally, but I don't know how to interpret the SMART results."

1. If I'd nearly lost all my data, I'd definitely learn how to interpret the SMART results - you haven't.

2. You need to do your research, you can't just hope that somebody will save your data again.

3. It's not FreeNAS that's too difficult, it's a lack of reading.

4. You've spent a fortune on HDDs - you should invest some time on how to manage the server properly/safely.

Sure, good points. Let's set my problem and how to fix it aside for a moment, and I'll explain and respond to your thoughts instead.
I am sure everyone started somewhere and learned with time.
When I started with FreeNAS a few years ago, I found very few guides, and it wasn't (and still isn't) possible to learn by reading every post and thread in the forums. That would take ages.
I did invest a lot of time back then, searching the forums to learn a few commands, and I learned a little.
On the other hand, the forum seems to have developed; I see many more guides now. The one you pointed me to is very good. I am already reading it and trying to assess my drives' health (they look OK so far).
I haven't been visiting the forums regularly because there hasn't been a need to, probably like many others. I had enough skills to manage what I do, which is only using FreeNAS as a storage container with ZFS: not much more than storing data with health control. And that has worked fine. You are supposed to be able to handle most issues through the GUI; that is why I chose FreeNAS. I never knew I would need to invest so much time, or that it would be hard to find answers when a problem occurs. It's certainly easier now with more guides and with help from people taking the time to help me.
At the same time, as I see it, there is no better alternative for a NAS with a self-healing filesystem. I already did that research: I ran NAS4Free for a few years before moving to FreeNAS.
Storing most of my data on Windows is not an alternative. I have seen data corruption on Windows disks several times with my own eyes, and that is why I want ZFS. I have already invested time and a lot of money, and I am prepared to invest more time to avoid problems in the future. That means reading this guide and any others I find or get tipped off about :p
Regarding the previous time: it was a combination of issues, including encryption, which made it more complicated to solve. However, thanks to @rs225 (many thanks), it worked out and I got all of my data out, despite some other users immediately writing "it's over" and "you lost your pool" instead of trying to help.
I should add that I no longer use any encryption with FreeNAS.
Now that I have commented and explained this, I hope everyone else got their answers too, and we can get back to the current problem.

This time there seems to be a serious hardware failure, or at least not the usual easily solved case, and that is not easy to figure out whether you are a beginner or experienced. I don't have triple sets of components (except for hard drives), so I cannot just swap components to narrow the issue down to one or more of them. That is why I am here asking the kind users for help. I am not expecting anyone to somehow save my data. I just appreciate all the help I can get, all advice about the future, and any guides I can find to learn more about FreeNAS :)
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
As a matter of interest, have you got the right firmware on the M1015?
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
Yes, I did that update some time ago. FreeNAS does warn if the driver is outdated, and the card is set to IT mode according to what is written in the forums :)
It has been running OK since I bought it; these issues started recently.
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
arameen said:
Yes, I did that update some time ago. FreeNAS does warn if the driver is outdated, and the card is set to IT mode according to what is written in the forums :)
It has been running OK since I bought it; these issues started recently.
It doesn't warn now, since the makers broke the rather straightforward identity between driver version and firmware version (or so I hear, I don't use one). What version of FreeNAS and what firmware version do you use?
 

Zwck

Patron
Joined
Oct 27, 2016
Messages
371
Jailer said:
The resource section at the top of the page is a good place to start. The search function will help you find more specific information.

The Resource section is indeed a great way to start. Unfortunately, there is no common practice for server management (at least I could not find any). I'd love to see some debate about what a FreeNAS calendar should look like. Maybe you can post what happens on your servers on a daily basis. As I mentioned, I have zero experience with all of this; besides the burn-in, I don't do anything :( so my calendar is quite empty :)
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
rogerh said:
It doesn't warn now, since the makers broke the rather straightforward identity between driver version and firmware version (or so I hear, I don't use one). What version of FreeNAS and what firmware version do you use?

Hmm, then I missed that part. I know I did an upgrade months ago, but I don't remember the version numbers. It was really confusing at the time, and it was hard to find the drivers to update it with; I had to do it on my Windows machine. Anyway, when it was done the warning was gone and it worked fine. That was then.

As of now, I am using FreeNAS-11.0-U4 (54848d13b)
The driver version for the M1015 is shown during boot, and I cannot reboot right now. Is it possible to see the driver version somewhere in the GUI?
I don't know my way around the commands to read it, but I guess it can be read from the CLI somewhere?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
sas2flash -listall will show you everything about your HBA.
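Besides sas2flash -listall, FreeBSD-based systems keep the messages from the last boot in /var/run/dmesg.boot, so the versions the driver printed at boot can be read without rebooting. The sketch below assumes the mps(4) driver line looks like "mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd"; that format, and the example version numbers, are assumptions to check against your own system. It only extracts the versions and makes no claim about which combinations are compatible.

```python
import re

# Assumed format of the mps(4) boot message; verify against your own dmesg.boot.
LINE_RE = re.compile(
    r"(?P<dev>mps\d+): Firmware: (?P<fw>[\d.]+), Driver: (?P<drv>[\w.\-]+)"
)


def hba_versions(dmesg_text):
    """Return {controller: (firmware, driver)} for every matching mps line."""
    return {m["dev"]: (m["fw"], m["drv"]) for m in LINE_RE.finditer(dmesg_text)}


def firmware_phase(fw):
    """Major number of a firmware version, e.g. '20.00.07.00' -> 20 (the 'P20' release)."""
    return int(fw.split(".")[0])


def read_boot_log(path="/var/run/dmesg.boot"):
    """Read the saved boot messages (path is the usual FreeBSD location)."""
    with open(path) as f:
        return f.read()
```

On the box itself, hba_versions(read_boot_log()) would list each controller found; the sysctl values under dev.mps.0 are another place the same information may be exposed, depending on the FreeNAS build.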

Also, I think all of your drives are dead except for da6. You need to run SMART tests automatically, and you don't have to understand anything about the SMART output. You will be emailed if something fails or looks bad; then you can post the error message if it doesn't make sense.

FreeNAS is very hands-off: after setting up email, UPS, scrubs, SMART tests, and snapshots, I have not touched mine in years. I do an upgrade every now and then, and I get a SMART email about drive temperatures in the summer, but that is about all I have done.

You also need to read every link in my signature before using FreeNAS again.
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
Zwck said:
The Resource section is indeed a great way to start. Unfortunately, there is no common practice for server management (at least I could not find any). I'd love to see some debate about what a FreeNAS calendar should look like. Maybe you can post what happens on your servers on a daily basis. As I mentioned, I have zero experience with all of this; besides the burn-in, I don't do anything :( so my calendar is quite empty :)
That is very difficult given the umpteen options available in every server. For example, some server boards have IPMI, some don't. Some have multiple processors, some don't. Some have multiple FreeNAS systems using replication, some don't. And so on...

But the basic guideline (once the server is built and is up and running) is pretty clear.
  1. You need to have SMART tests running at regular intervals (I run short every day, long every 3 days)
  2. You need to have regular scrubs of your pool. (My boot pool is every 5 days, tank is every 12 days)
  3. You need to have regular snapshots set up in case shit hits the fan.
  4. You ABSOLUTELY must set up the email in FreeNAS so you can be informed of the various things that FreeNAS is doing.
  5. Set up SSH, so you have access to the box in case the GUI doesn't work for some reason
Once all that is done, you probably won't even need to login to the GUI or into the freenas box.
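To turn the intervals in that list into the kind of "FreeNAS calendar" asked about earlier, here is a small Python sketch that prints which tasks fall on each day. It assumes every task first runs on the same start date and then repeats at a fixed number of days. FreeNAS itself schedules SMART tests and scrubs through cron-style settings in the GUI, so this only illustrates the intervals; it is not how FreeNAS computes them.

```python
from datetime import date, timedelta

# Intervals in days, taken from the schedule in the list above.
INTERVALS = {
    "SMART short test": 1,
    "SMART long test": 3,
    "scrub boot pool": 5,
    "scrub tank": 12,
}


def tasks_on(day, start):
    """Tasks due on `day`, assuming all tasks first ran on `start` and repeat at their interval."""
    offset = (day - start).days
    return [name for name, every in INTERVALS.items() if offset >= 0 and offset % every == 0]


def print_calendar(start, days):
    """Print one line per day with the tasks due that day."""
    for i in range(days):
        d = start + timedelta(days=i)
        print(d.isoformat(), ", ".join(tasks_on(d, start)))


if __name__ == "__main__":
    print_calendar(date(2018, 1, 1), 14)
```

Laid out this way, it is easy to spot days where several heavy jobs (a long SMART test and a scrub) land together, which may be worth staggering.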
 