Pool degraded - failed to read SMART Attribute Data

Status
Not open for further replies.

Touche

Explorer
Joined
Nov 26, 2016
Messages
55
Sorry, I don't know why I don't have the HBA listed with the rest of my hardware in my sig. It came with an LSI 9211-8i, I replaced it with an LSI 9240-8i, which was a bunny and a half to remove the IBM firmware it came with so I could flash the correct one. So do you think my issue is the firmware for the HBA? I flashed the latest, P20. I wish I had the option to move them to onboard ports, but that isn't possible in this chassis, since the drives are all connected to a backplane. At this point, I guess I should cancel the RMA on the drive. I don't know what to do to fix this problem though. I just keep rebooting my server every few days when my server isn't responding properly. It's actually causing more problems. Today I started getting spammed every 5 minutes about my UPS not having a connection. While trying to diagnose that problem I saw two drives had faulted. After reboot, my server now has a connection to the UPS again. It's all very frustrating.
I believe a FW fix would do the trick as it seems the controller and/or FreeBSD driver is too quick to fault the drive instead of giving it a second or two to reply properly. FW downgrade is just a shot in the dark that I will try some day.

I too am getting the no UPS connection warnings. About one per day or two.
Here is the thing. I used 6 of the very similar Toshiba DT01ACA200 drives to replace the drives in one of the vdevs in one of my storage pools and after less than 6 months 3 of them had failed. These drives are desktop computer drives and they are NOT suitable to use in a server.
They are not rated for the service and the drives that are showing as failed are really failed. Buy actual NAS drives.
Sorry, but that is kind of BS go-to default reasoning. These drives get trashed far more in my desktop builds in worse conditions, and nothing in the NAS is so special for them to be problematic. This is purely an HBA FW and/or FreeBSD/FreeNAS driver issue. As reported in this and related threads, none of the affected drives are failing and are working without problems when moved from the problematic HBAs.
No. Totally different company and even when it was the same company it was a different design at a different plant.
It's reported as being a rebranded Hitachi HDS723030BLE640, but I don't know the details.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Sorry, but that is kind of BS go-to default reasoning.
Don't give me that. I use Seagate Desktop drives in my server with no problem and I bought 8 of these Toshiba drives and tried to use them. What I told you is based on my personal observations and experience with Toshiba drives in my system. These drives just don't work. I had 6 of them in my system and two as spares. When I bought them, I planned to use them for 3 to 5 years. After less than six months, I had so many failures that I replaced them with Seagate Desktop drives. I am not giving you a "BS go-to default".
This is purely an HBA FW and/or FreeBSD/FreeNAS driver issue.
No it isn't. The same system (exactly the same system, my system) that the Toshiba drives failed in was running Seagate Desktop drives for five years before the Toshiba drives went in, and since I replaced the Toshiba drives with Seagate drives, it has been working for another 6 months with absolutely no errors. The fault is the drive, not the controller or driver, or any other component.
 

Touche

Explorer
Joined
Nov 26, 2016
Messages
55
Don't give me that. I use Seagate Desktop drives in my server with no problem and I bought 8 of these Toshiba drives and tried to use them. What I told you is based on my personal observations and experience with Toshiba drives in my system. These drives just don't work. I had 6 of them in my system and two as spares. When I bought them, I planned to use them for 3 to 5 years. After less than six months, I had so many failures that I replaced them with Seagate Desktop drives. I am not giving you a "BS go-to default".
This is not a drive failure. The drive is working fine. I would have no problem with the drives failing.
No it isn't. The same system (exactly the same system, my system) that the Toshiba drives failed in was running Seagate Desktop drives for five years before the Toshiba drives went in, and since I replaced the Toshiba drives with Seagate drives, it has been working for another 6 months with absolutely no errors. The fault is the drive, not the controller or driver, or any other component.
How do you explain that:
a) the same problem is occurring with Seagate Ironwolf drives
b) the same problem was occurring with WD Reds on the LSI 3008 and was resolved by the HBA FW update
c) the same problem was occurring with Seagate Constellation enterprise drives on the LSI 3008 after upgrading to FreeNAS 11 but was resolved with the latest FreeNAS versions
d) the problem is not occurring with Intel SATA controller
?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
This is not a drive failure. The drive is working fine. I would have no problem with the drives failing.
The system is reporting the drive as faulted. It isn't responding quickly enough. Probably a TLER type issue. If you did drive burn-in testing on these drives you would probably find bad sectors. The drives I had failed with bad sectors. The last one I pulled had over 500 bad sectors. Having 3 of 8 drives fail within six months of service was enough to convince me.
How do you explain that:
a) the same problem is occurring with Seagate Ironwolf drives
I bet that if the problems were examined closely enough it would be determined that they are similar but not exactly the same and it probably boils down to how quickly the drive gives up on a read and reports back to the controller. The SAS controller is likely defaulted to expect a response back in a certain number of milliseconds and the drive is not responding quickly enough because it is retrying. The timeout may be able to be adjusted in the controller.
d) the problem is not occurring with Intel SATA controller
Different timeout defaults in a SATA controller than in a SAS controller. Simple.
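To make the timeout argument above concrete, here is a minimal Python sketch. The `Drive` class, the `controller_drops` helper, and all of the second counts are made up for illustration; they are not measured values for any HBA or drive in this thread. The only real-world piece is the `smartctl -l scterc,READ,WRITE` syntax, which takes its limits in tenths of a second, and whether a given drive accepts that setting is something you would have to check yourself.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    # Worst-case time the drive keeps retrying a bad read before it answers,
    # in seconds. Hypothetical figure; it depends on the model and on its
    # SCT ERC (TLER) setting.
    max_recovery_s: float


def controller_drops(drive: Drive, controller_timeout_s: float) -> bool:
    """Return True if the drive may still be retrying when the controller gives up."""
    return drive.max_recovery_s > controller_timeout_s


def scterc_command(device: str, seconds: float) -> list[str]:
    """Build a smartctl call that caps read and write error recovery.

    smartctl expresses the SCT ERC limits in tenths of a second,
    so 7 seconds becomes "scterc,70,70".
    """
    deciseconds = int(round(seconds * 10))
    return ["smartctl", "-l", f"scterc,{deciseconds},{deciseconds}", device]


# Assumed numbers: a desktop drive that retries for up to 60 s,
# the same drive capped at 7 s, and a controller that waits 30 s.
uncapped = Drive("desktop drive, no ERC limit", max_recovery_s=60.0)
capped = Drive("same drive, ERC set to 7 s", max_recovery_s=7.0)

print(controller_drops(uncapped, controller_timeout_s=30.0))
print(controller_drops(capped, controller_timeout_s=30.0))
print(" ".join(scterc_command("/dev/da0", 7)))
```

If the drive supports it, running the printed command as root would apply the 7 second cap; on many drives the setting does not survive a power cycle, so it usually has to be reapplied at boot.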

I am not discussing it further. Do what you like.
 

Touche

Explorer
Joined
Nov 26, 2016
Messages
55
The system is reporting the drive as faulted. It isn't responding quickly enough. Probably a TLER type issue. If you did drive burn-in testing on these drives you would probably find bad sectors.

I have, with badblocks and SMART tests. They are fine. 6 of 6 would be quite a failure rate.
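For anyone who wants to check the bad-sector claim on their own drives, here is a rough Python sketch that pulls the sector-related counters out of `smartctl -A` text. It assumes the usual ten-column ATA attribute table; the sample text below is invented, not output from any drive in this thread, and `bad_sector_counts` is a hypothetical helper name.

```python
# SMART attribute IDs that track reallocated, pending and uncorrectable sectors.
WATCHED_IDS = {5, 197, 198}


def bad_sector_counts(smartctl_output: str) -> dict[str, int]:
    """Map attribute name to raw value for the sector-related attributes."""
    counts: dict[str, int] = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row and any other text are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) in WATCHED_IDS and fields[9].isdigit():
            counts[fields[1]] = int(fields[9])
    return counts


# Made-up sample in the layout printed by `smartctl -A /dev/da0`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       7421
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

counts = bad_sector_counts(SAMPLE)
print(counts)
```

Non-zero raw values for any of these three attributes would support the bad-sector theory; all zeros after a full badblocks pass would point away from it.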

I bet that if the problems were examined closely enough it would be determined that they are similar but not exactly the same and it probably boils down to how quickly the drive gives up on a read and reports back to the controller. The SAS controller is likely defaulted to expect a response back in a certain number of milliseconds and the drive is not responding quickly enough because it is retrying. The timeout may be able to be adjusted in the controller.

Different timeout defaults in a SATA controller than in a SAS controller. Simple.

I am not discussing it further. Do what you like.
It's simple if you ignore b) and c). I would like LSI or FreeBSD to fix their fault.
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
You are hijacking the thread. Why don't you post your own?
Because a search for the issue I'm having brought me to this thread with the same problem? Why would I post my own topic with the same exact issue?

The system is reporting the drive as faulted. It isn't responding quickly enough. Probably a TLER type issue. If you did drive burn-in testing on these drives you would probably find bad sectors.
So since I already established I am having the same problem, I must have bad drives too? I did proper burn-in tests when I got them in July last year, and guess what, all eight drives passed every test just fine, including not finding a single bad block on a single drive. Are you really trying to tell me I have 8 NAS drives less than a year old that have all failed? Really? P.S. They were sourced from two different retailers with various manufacture dates. Oh, did I mention a replacement drive from Seagate failed within days? Yeah, all 9 drives are bad. That MUST be the problem.

I believe a FW fix would do the trick as it seems the controller and/or FreeBSD driver is too quick to fault the drive instead of giving it a second or two to reply properly. FW downgrade is just a shot in the dark that I will try some day.

I too am getting the no UPS connection warnings. About one per day or two.

I'd say that's further proof we are dealing with the same problem, and it sure does not look like a drive issue. I even asked Seagate support if there were any issues with this line and they said no, there are not any problems. I'd say it is safe to say the drives alone are not the problem. I certainly would like to not have to monitor my drives on a basically hourly basis at this point. Which I can't do when I'm asleep, and which is when it typically goes down because I didn't reboot it in time. Sometimes it faults so fast I don't even get a warning email before my whole server is offline.
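The hourly checking described above could be handed to a small script run from cron. This is only a sketch, assuming the usual `zpool status` table layout; the sample text is made up and `unhealthy_devices` is a hypothetical helper name, not part of FreeNAS.

```python
# Device states that mean something needs attention.
UNHEALTHY = {"DEGRADED", "FAULTED", "UNAVAIL", "OFFLINE", "REMOVED"}


def unhealthy_devices(zpool_status: str) -> list[tuple[str, str]]:
    """Return (name, state) for every pool, vdev or disk that is not healthy."""
    found = []
    in_config = False
    for line in zpool_status.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            # Header row of the config table; device rows follow.
            in_config = True
            continue
        if not in_config:
            continue
        if not fields:
            # A blank line ends the config table.
            in_config = False
            continue
        if len(fields) >= 2 and fields[1] in UNHEALTHY:
            found.append((fields[0], fields[1]))
    return found


# Made-up sample in the layout printed by `zpool status`.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    tank        DEGRADED     0     0     0
      raidz2-0  DEGRADED     0     0     0
        da0     ONLINE       0     0     0
        da1     FAULTED      3    12     0  too many errors

errors: No known data errors
"""

problems = unhealthy_devices(SAMPLE)
for name, state in problems:
    print(f"{name}: {state}")
```

In practice the text would come from running `zpool status` through `subprocess.run`, and a non-empty result would trigger whatever alert you trust more than the built-in email.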
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Because a search for the issue I'm having brought me to this thread with the same problem? Why would I post my own topic with the same exact issue?


So since I already established I am having the same problem, I must have bad drives too? I did proper burn-in tests when I got them in July last year, and guess what, all eight drives passed every test just fine, including not finding a single bad block on a single drive. Are you really trying to tell me I have 8 NAS drives less than a year old that have all failed? Really? P.S. They were sourced from two different retailers with various manufacture dates. Oh, did I mention a replacement drive from Seagate failed within days? Yeah, all 9 drives are bad. That MUST be the problem.



I'd say that's further proof we are dealing with the same problem, and it sure does not look like a drive issue. I even asked Seagate support if there were any issues with this line and they said no, there are not any problems. I'd say it is safe to say the drives alone are not the problem. I certainly would like to not have to monitor my drives on a basically hourly basis at this point. Which I can't do when I'm asleep, and which is when it typically goes down because I didn't reboot it in time. Sometimes it faults so fast I don't even get a warning email before my whole server is offline.
No, your problem is different. Seagate is clueless.

 

sggigante

Cadet
Joined
May 24, 2018
Messages
1
Hi,

I have the same problem, but the errors keep moving from one disk to another. I have tested the disks and they show no errors. A memory test showed no errors either. I changed the backplane and the behavior is the same. I'm also getting another strange error (maybe just a coincidence): when I log in via SSH, the server's host identification sometimes changes.


FreeNAS 11.1-U4
Chassis - Supermicro CSE-826E16-R500LPB
Processor - Intel Xeon E3-1230 v6 3.50 GHz
Motherboard - Supermicro X11SSL-CF
Memory - 16 GB Samsung ECC
Boot Device - Supermicro 16 GB DOM
Disks - 12 x Toshiba MG05ACA800E

I'd appreciate some help
Best regards
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'm going to close this thread.

If you have a problem, start a new thread and properly explain your problem with as much detail as possible. "I have the same problem" is rarely useful, unless it's a very specific set of circumstances.
 