Pool degraded - failed to read SMART Attribute Data

Touche · Apr 27, 2018

DGenerateKane said:
Sorry, I don't know why the HBA isn't listed with the rest of my hardware in my sig. It came with an LSI 9211-8i. I replaced it with an LSI 9240-8i, and removing the IBM firmware it came with so I could flash the correct one was a real pain. So do you think my issue is the HBA firmware? I flashed the latest, P20. I wish I could move the drives to onboard ports, but that isn't possible in this chassis, since the drives are all connected to a backplane. At this point, I guess I should cancel the RMA on the drive. I don't know what to do to fix this problem, though. For now I just reboot the server every few days whenever it stops responding properly, and that is actually causing more problems. Today I started getting spammed every 5 minutes about my UPS not having a connection. While trying to diagnose that problem, I saw that two drives had faulted. After a reboot, the server has a connection to the UPS again. It's all very frustrating.

I believe a FW fix would do the trick as it seems the controller and/or FreeBSD driver is too quick to fault the drive instead of giving it a second or two to reply properly. FW downgrade is just a shot in the dark that I will try some day.

I'm getting the no-UPS-connection warnings too, about one every day or two.

Chris Moore said:
Here is the thing. I used 6 of the very similar Toshiba DT01ACA200 drives to replace the drives in one of the vdevs in one of my storage pools and after less than 6 months 3 of them had failed. These drives are desktop computer drives and they are NOT suitable to use in a server.
They are not rated for the service and the drives that are showing as failed are really failed. Buy actual NAS drives.

Sorry, but that is kind of BS go-to default reasoning. These drives get trashed far more in my desktop builds, in worse conditions, and nothing about the NAS is special enough to cause problems for them. This is purely an HBA FW and/or FreeBSD/FreeNAS driver issue. As reported in this and related threads, none of the affected drives are actually failing, and they work without problems when moved off the problematic HBAs.

Chris Moore said:
No. Totally different company and even when it was the same company it was a different design at a different plant.

It's reported as being a rebranded Hitachi HDS723030BLE640, but I don't know the details.

Chris Moore · Apr 27, 2018

Touche said:
Sorry, but that is kind of BS go-to default reasoning.

Don't give me that. I use Seagate Desktop drives in my server with no problems, and I bought 8 of these Toshiba drives and tried to use them. What I told you is based on my personal observations and experience with Toshiba drives in my system. These drives just don't work. I had 6 of them in my system and two as spares. When I bought them, I planned to use them for 3 to 5 years. After less than six months, I had so many failures that I replaced them with Seagate Desktop drives. I am not giving you a "BS go-to default".

Touche said:
This is purely an HBA FW and/or FreeBSD/FreeNAS driver issue.

No, it isn't. The same system (exactly the same system, my system) that the Toshiba drives failed in ran Seagate Desktop drives for five years before the Toshiba drives went in, and since I replaced the Toshiba drives with Seagate drives, it has been working for another 6 months with absolutely no errors. The fault is the drive, not the controller, the driver, or any other component.

Touche · Apr 27, 2018

Chris Moore said:
Don't give me that. I use Seagate Desktop drives in my server with no problems, and I bought 8 of these Toshiba drives and tried to use them. What I told you is based on my personal observations and experience with Toshiba drives in my system. These drives just don't work. I had 6 of them in my system and two as spares. When I bought them, I planned to use them for 3 to 5 years. After less than six months, I had so many failures that I replaced them with Seagate Desktop drives. I am not giving you a "BS go-to default".

This is not a drive failure. The drive is working fine. I would have no problem accepting it if the drives were actually failing.

Chris Moore said:

No, it isn't. The same system (exactly the same system, my system) that the Toshiba drives failed in ran Seagate Desktop drives for five years before the Toshiba drives went in, and since I replaced the Toshiba drives with Seagate drives, it has been working for another 6 months with absolutely no errors. The fault is the drive, not the controller, the driver, or any other component.

How do you explain the following?
a) the same problem is occurring with Seagate IronWolf drives
b) the same problem was occurring with WD Reds on the LSI 3008 and was resolved by the HBA FW update
c) the same problem was occurring with Seagate Constellation enterprise drives on the LSI 3008 after upgrading to FreeNAS 11, but was resolved in the latest FreeNAS versions
d) the problem is not occurring with the Intel SATA controller

Chris Moore · Apr 27, 2018

Touche said:
This is not a drive failure. The drive is working fine. I would have no problem accepting it if the drives were actually failing.

The system is reporting the drive as faulted. It isn't responding quickly enough. Probably a TLER type issue. If you did drive burn-in testing on these drives you would probably find bad sectors. The drives I had failed with bad sectors. The last one I pulled had over 500 bad sectors. Having 3 of 8 drives fail within six months of service was enough to convince me.

Touche said:
How do you explain that:
a) the same problem is occurring with Seagate IronWolf drives

I bet that if the problems were examined closely enough, it would be determined that they are similar but not exactly the same, and it probably boils down to how quickly the drive gives up on a read and reports back to the controller. The SAS controller is likely defaulted to expect a response within a certain number of milliseconds, and the drive is not responding quickly enough because it is retrying. The timeout may be adjustable in the controller.
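To make the timeout argument concrete, here is a toy model in Python. It is not how the LSI firmware or the FreeBSD driver actually decides to fault a disk; `read_outcome` is a hypothetical helper, and the 7 s error-recovery (TLER/ERC) cap and 30 s host timeout are illustrative assumptions, not documented defaults. It only shows the mechanism being argued: a drive that keeps retrying past the host's timeout looks unresponsive, while a drive with a short recovery cap reports a read error that a redundant pool can repair.

```python
from typing import Optional, Tuple


def read_outcome(
    time_to_recover_s: float,
    erc_limit_s: Optional[float],
    host_timeout_s: float,
) -> Tuple[str, float]:
    """Toy model of one read of a marginal sector (hypothetical, illustrative only).

    time_to_recover_s: how long the drive would retry before the sector reads
    erc_limit_s: the drive's error-recovery cap (TLER/ERC); None means no cap
    host_timeout_s: how long the controller/driver waits before giving up
    Returns (outcome, seconds until the host sees that outcome).
    """
    if erc_limit_s is not None and time_to_recover_s > erc_limit_s:
        # Drive stops retrying at its cap and reports a read error;
        # a redundant pool can rebuild the block from parity or a mirror.
        response_s, outcome = erc_limit_s, "read_error"
    else:
        # Drive keeps retrying until the sector finally reads.
        response_s, outcome = time_to_recover_s, "ok"

    if response_s > host_timeout_s:
        # Host gives up first: this is where a controller may reset or drop the disk.
        return "host_timeout", host_timeout_s
    return outcome, response_s


if __name__ == "__main__":
    host_timeout = 30.0  # assumed value for illustration only
    for label, erc in (("ERC capped at 7 s", 7.0), ("no ERC cap", None)):
        for recover in (2.0, 20.0, 45.0):
            print(label, recover, read_outcome(recover, erc, host_timeout))
```

With these assumed numbers, only the uncapped drive facing a 45 s recovery hits the host timeout; the capped drive returns a read error at 7 s instead.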

Touche said:
d) the problem is not occurring with the Intel SATA controller

Different timeout defaults in a SATA controller than in a SAS controller. Simple.

I am not discussing it further. Do what you like.

Touche · Apr 27, 2018

Chris Moore said:
The system is reporting the drive as faulted. It isn't responding quickly enough. Probably a TLER type issue. If you did drive burn-in testing on these drives you would probably find bad sectors.

I have, with badblocks and SMART tests. They are fine. 6 of 6 would be quite a failure rate.

Chris Moore said:

I bet that if the problems were examined closely enough, it would be determined that they are similar but not exactly the same, and it probably boils down to how quickly the drive gives up on a read and reports back to the controller. The SAS controller is likely defaulted to expect a response within a certain number of milliseconds, and the drive is not responding quickly enough because it is retrying. The timeout may be adjustable in the controller.

Different timeout defaults in a SATA controller than in a SAS controller. Simple.

I am not discussing it further. Do what you like.

It's simple if you ignore b) and c). I would like LSI or FreeBSD to fix their fault.

DGenerateKane · Apr 27, 2018

Chris Moore said:
You are hijacking the thread. Why don't you post your own?

Because a search for the issue I'm having brought me to this thread with the same problem? Why would I post my own topic with the same exact issue?

Chris Moore said:
The system is reporting the drive as faulted. It isn't responding quickly enough. Probably a TLER type issue. If you did drive burn-in testing on these drives you would probably find bad sectors.

So since I already established I am having the same problem, I must have bad drives too? I did proper burn-in tests when I got them in July last year, and guess what, all eight drives passed every test just fine, including not finding a single bad block on a single drive. Are you really trying to tell me I have 8 NAS drives less than a year old that have all failed? Really? P.S. They were sourced from two different retailers with various manufacture dates. Oh, did I mention a replacement drive from Seagate failed within days? Yeah, all 9 drives are bad. That MUST be the problem.

Touche said:
I believe a FW fix would do the trick as it seems the controller and/or FreeBSD driver is too quick to fault the drive instead of giving it a second or two to reply properly. FW downgrade is just a shot in the dark that I will try some day.

I'm getting the no-UPS-connection warnings too, about one every day or two.

I'd say that's further proof we are dealing with the same problem, and it sure does not look like a drive issue. I even asked Seagate support if there were any issues with this line, and they said no, there are not any problems. I'd say it is safe to conclude the drives alone are not the problem. I would certainly like to not have to monitor my drives on a basically hourly basis at this point. I can't do that while I'm asleep, which is typically when it goes down because I didn't reboot it in time. Sometimes it faults so fast I don't even get a warning email before my whole server is offline.

Chris Moore · Apr 27, 2018

DGenerateKane said:
Because a search for the issue I'm having brought me to this thread with the same problem? Why would I post my own topic with the same exact issue?

So since I already established I am having the same problem, I must have bad drives too? I did proper burn-in tests when I got them in July last year, and guess what, all eight drives passed every test just fine, including not finding a single bad block on a single drive. Are you really trying to tell me I have 8 NAS drives less than a year old that have all failed? Really? P.S. They were sourced from two different retailers with various manufacture dates. Oh, did I mention a replacement drive from Seagate failed within days? Yeah, all 9 drives are bad. That MUST be the problem.

I'd say that's further proof we are dealing with the same problem, and it sure does not look like a drive issue. I even asked Seagate support if there were any issues with this line, and they said no, there are not any problems. I'd say it is safe to conclude the drives alone are not the problem. I would certainly like to not have to monitor my drives on a basically hourly basis at this point. I can't do that while I'm asleep, which is typically when it goes down because I didn't reboot it in time. Sometimes it faults so fast I don't even get a warning email before my whole server is offline.

No, your problem is different. Seagate is clueless.

sggigante · May 24, 2018

Hi,

I have the same problem, but the errors move between disks; it keeps changing which disk is affected. I tested the disks and they don't show errors. A memory test didn't show errors either. I changed the backplane and the behavior is the same. I'm also getting another strange error (maybe just a coincidence): when I log in via SSH, sometimes the server's host identification has changed.

FreeNAS 11.1-U4
Chassis - Supermicro CSE-826E16-R500LPB
Processor - Intel Xeon E3-1230 v6 3.50 GHz
Motherboard - Supermicro X11SSL-CF
Memory - 16 GB Samsung ECC
Boot Device - Supermicro 16 GB DOM
Disks - 12x Toshiba MG05ACA800E

I'd appreciate some help
Best regards

Chris Moore · May 24, 2018

sggigante said:
I have the same problem,

No, you have a different problem.

sggigante said:
when I log in via SSH, sometimes the server's host identification has changed.

Strange problem, but absolutely different. Please submit a bug report.

Ericloewe · May 24, 2018

I'm going to close this thread.

If you have a problem, start a new thread and properly explain your problem with as much detail as possible. "I have the same problem" is rarely useful, unless it's a very specific set of circumstances.
