Failing HDD or SAS card?

Status
Not open for further replies.

petr

Contributor · Joined Jun 13, 2013 · Messages: 142
I am starting to see errors from one of my drives (please see the screenshot).

As this is the second drive failure this month, I am wondering: could these errors point to a failing SAS card instead? What error messages does each kind of failure usually produce?
 

Attachments

  • Screen Shot 2015-08-14 at 17.22.25.png (390.7 KB)

Ericloewe

Server Wrangler · Moderator · Joined Feb 15, 2014 · Messages: 20,194
We'll need more information (smartctl -a /dev/adaWhatever).

Possibilities:
  • Bad drives
  • SAS2 driver/firmware mismatch
  • Bad cables
  • Bad backplane
  • Bad controller
  • Bad expander
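As a rough sketch of how `smartctl -a` output can be read against the list above, the Python below pulls raw values out of the SMART attribute table and applies a common rule of thumb: non-zero reallocated, pending, or offline-uncorrectable sector counts point at the drive itself, while a non-zero `UDMA_CRC_Error_Count` points at the link between drive and host (cable, backplane, or controller). The attribute groupings, the regex, the function names, and the sample output are all assumptions for illustration; real smartctl output varies by drive model and version, so treat the result as a hint, not a diagnosis.

```python
import re

# Rule-of-thumb groupings (an assumption, not an exhaustive list):
# these counters usually reflect problems on the disk itself...
MEDIA_ATTRS = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}
# ...while CRC errors are transfer errors between drive and host, which
# usually implicate the cable, backplane or controller instead.
LINK_ATTRS = {"UDMA_CRC_Error_Count"}

# One row of the smartctl attribute table:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(?:\S+\s+){6}(\d+)")


def parse_raw_values(smartctl_text):
    """Return {attribute_name: leading integer of RAW_VALUE}."""
    values = {}
    for line in smartctl_text.splitlines():
        m = ROW.match(line)
        if m:
            values[m.group(2)] = int(m.group(3))
    return values


def classify(values):
    """Return (hint, attribute, raw) tuples for non-zero counters of interest."""
    hints = []
    for name, raw in sorted(values.items()):
        if raw == 0:
            continue
        if name in MEDIA_ATTRS:
            hints.append(("drive media", name, raw))
        elif name in LINK_ATTRS:
            hints.append(("cable/backplane/controller link", name, raw))
    return hints


# Hypothetical excerpt of `smartctl -a /dev/da0` output (values made up).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       42
"""

if __name__ == "__main__":
    for hint, name, raw in classify(parse_raw_values(SAMPLE)):
        print(f"{name} = {raw}: suspect {hint}")
```

With this made-up sample, only the CRC counter is non-zero, so the sketch would point at the link rather than the disk surface.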
 

petr

Contributor · Joined Jun 13, 2013 · Messages: 142
Yes, that is my thinking as well. The drives are spread across different cables and enclosures (3 failures so far). The firmware on the card was OK, and no expander is in use: the cables run directly from the M1015, and I have also replaced them.

I actually had a spare M1015 lying around, which I have now flashed, and I am trying to see whether the card was the problem.

I've now attached the SMART output for the 3 drives that previously had problems. Please have a look, as I am starting to think they may actually be OK! That would save some money!

The only place where I can see anything significant is the UDMA_CRC_Error_Count.
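On most drives the raw `UDMA_CRC_Error_Count` is a lifetime counter that does not reset, so a non-zero value on its own may only reflect past cabling trouble. What tends to matter is whether it keeps growing after a cable or controller swap. The minimal sketch below, assuming you record the raw value per drive (for example with `smartctl -a`) before and after the change, reports only the drives whose counter increased. The function name and the readings are hypothetical.

```python
def crc_increases(before, after):
    """Return {drive: delta} for drives whose UDMA_CRC_Error_Count grew.

    `before` and `after` map a drive name (e.g. "da0") to the raw
    UDMA_CRC_Error_Count read with smartctl at two points in time, for
    example before and after swapping a cable or controller. Because the
    counter is normally cumulative, only growth is interesting here.
    Drives with no earlier reading are skipped.
    """
    grown = {}
    for drive, count in after.items():
        if drive not in before:
            continue
        delta = count - before[drive]
        if delta > 0:
            grown[drive] = delta
    return grown


if __name__ == "__main__":
    # Hypothetical readings, not taken from the attached files.
    before = {"da0": 12, "da1": 0, "da2": 3}
    after = {"da0": 12, "da1": 5, "da2": 3}
    print(crc_increases(before, after))  # {'da1': 5}
```

If a drive's counter still climbs on a new card and new cable, the remaining shared parts (slot, backplane, power) become more suspicious than the drive.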
 

Attachments

  • da0.txt (4.6 KB)
  • da1.txt (6.1 KB)
  • da2.txt (9.2 KB)

petr

Contributor
Joined
Jun 13, 2013
Messages
142
OK, this is weird: with the new M1015 card, I am seeing a very similar problem (please see the log file). I created a mirror from those 3 drives, then started writing some junk data (with compression disabled so that something actually gets written to the drives). When I checked the pool status, all but 1 drive were offline or had too many errors.
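An alternative to disabling compression (swapped in here, not what was done above) is to write random bytes, which do not compress, so they reach the disks whether or not compression is enabled. The sketch below writes a file of a given size in chunks and calls `fsync` at the end so the data is pushed to the pool rather than left in cache. The function name and the target path are hypothetical, and `os.urandom` may be slower than the disks on some systems.

```python
import os


def write_random_file(path, total_bytes, chunk_bytes=1 << 20):
    """Write `total_bytes` of random (incompressible) data to `path`.

    Data is written in chunks of at most `chunk_bytes` (1 MiB by default),
    then flushed and fsync'ed so it is actually handed to the storage.
    Returns the number of bytes written.
    """
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())
    return total_bytes
```

For example, `write_random_file("/mnt/testpool/junk.bin", 10 * 1024**3)` would write about 10 GiB to a dataset mounted at the (hypothetical) path `/mnt/testpool`.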

What else could be the problem? I am now tempted to move the card to another PCIe slot; otherwise I may start suspecting the motherboard or something else.

Could a bad drive throw off the whole system?

EDIT: I moved the card one slot up, and all 3 drives came back online. I started a few dd runs from /dev/zero to /dev/null to keep the CPU busy, started a scrub, and began generating more junk data. Let's see. I also plugged all of the other drives back in to simulate the heat/load environment (though they are not mounted).
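While a stress test like this runs, it helps to check the pool repeatedly rather than only at the end. The sketch below scans the config table of `zpool status` output and returns every row that is not ONLINE or has a non-zero READ, WRITE, or CKSUM counter. It assumes the usual `NAME STATE READ WRITE CKSUM` table layout, which can differ slightly between ZFS versions; the function name, pool name, and sample output are hypothetical.

```python
HEADER = ["NAME", "STATE", "READ", "WRITE", "CKSUM"]


def problem_devices(zpool_status_text):
    """Return (name, state, read, write, cksum) for config-table rows that are
    not ONLINE or show any non-zero error counter.

    Counters are kept as strings because zpool can abbreviate large values
    (e.g. "1.2K"). Pool and vdev rows are included, not just disks.
    """
    rows = []
    in_table = False
    for line in zpool_status_text.splitlines():
        fields = line.split()
        if fields[:5] == HEADER:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:  # a blank line ends the config table
            break
        if len(fields) < 5:  # section labels like "logs", or rows without counters
            continue
        name, state, read, write, cksum = fields[:5]
        if state != "ONLINE" or (read, write, cksum) != ("0", "0", "0"):
            rows.append((name, state, read, write, cksum))
    return rows


# Hypothetical `zpool status testpool` output for a 3-way mirror.
SAMPLE_STATUS = """\
  pool: testpool
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     FAULTED      3   105     0  too many errors
            da2     ONLINE       0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    for row in problem_devices(SAMPLE_STATUS):
        print(row)
```

On the live system the text could come from `subprocess.run(["zpool", "status", "testpool"], capture_output=True, text=True).stdout`, polled every few minutes during the scrub.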
 

Attachments

  • freenas_log.txt (25.8 KB)