(RANT) Disappointed in WD Red reliability.

Status
Not open for further replies.

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
Just a rant here. So I have two FreeNAS systems in production that relate to this post. Between the two of them they have 26x WD Red 6 TB drives in total; one system has 14x and the other 12x. These servers are in climate-controlled server rooms, and temps on the drives never touch 40°C. I know it's officially not recommended to use this many WD Reds in one enclosure, probably due to vibration among other reasons, so that's on me for not following their guidelines.

Regardless of that, I know it's only my opinion, but I really will never buy WD Red drives again. I've had similar server setups with similar drive counts and configs using other brands that never had failure rates like these WD Reds do. I've had literally 8 die in total over the last 12 months in various systems. Four of them died within the last two weeks in two separate systems, with three of the four purchased at different times from different places, so they can't be from the same batch. The RMA process from WD is easy, so there's at least that. Just voicing my opinion. Here are screenshots of both systems that had drives fail within a two-week span.

System 1: 3 drives all dead within two weeks. Thankfully it hasn't taken down a vdev (yet) and destroyed the whole pool!


System 2: 1 drive dead. This system has had 4 drives fail in total in the last 12 months.
 

Attachments

  • 2016-11-16 12_45_55-mnm-fn-backup1 - FreeNAS-9.10.1-U2 (f045a8b).png
  • 2016-11-16 15_32_46-et-fn - FreeNAS-9.10.1-U4 (ec9a7d3).png

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I know it's officially not recommended to use this many WD Reds in one enclosure
It's pure marketing crap; there isn't a shred of evidence to back up what they claim.
I've had literally 8 die in total over the last 12 months in various systems.
Those are Seagate 3TB numbers. You really should investigate other possibilities, such as bad power, because WD Reds are not known for such failure rates.
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
It's pure marketing crap; there isn't a shred of evidence to back up what they claim.

The only thing I can think of is, like I said, that they're not rated for this much vibration, but don't get me wrong, I agree.

Those are Seagate 3TB numbers. You really should investigate other possibilities, such as bad power, because WD Reds are not known for such failure rates.

I was really surprised, too. These are both Supermicro servers with proper backplanes, connected to fully supported LSI 2008 or 3008 controllers. Power is from redundant Supermicro PSUs and each system has an APC UPS. I really don't know what else I could be doing wrong. Both of these servers are in different geographical locations.
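
For what it's worth, here's a quick sanity check that the disks really are hanging off the LSI HBAs and that the drivers probed cleanly. This is just a rough sketch for FreeBSD/FreeNAS; the grep pattern assumes the stock mps/mpr driver names.

Code:
# List every disk and the bus/controller it is attached to
camcontrol devlist

# LSI 2008 uses the mps(4) driver, LSI 3008 uses mpr(4); check they probed without errors
dmesg | grep -E '^(mps|mpr)[0-9]'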
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Maybe they're not screwed in tightly? Or not enough screws? I dunno, I'm grasping at straws here.
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
Maybe they're not screwed in tightly? Or not enough screws? I dunno, I'm grasping at straws here.

No, I hear ya. I appreciate the help. They're all in there tightly, with all the screws in the caddies. For what it's worth, to the touch the servers don't really vibrate any more than any other servers.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
What are their temps and LCC (Load_Cycle_Count)?

Sent from my Nexus 5X using Tapatalk
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
What are their temps and LCC (Load_Cycle_Count)?

Sent from my Nexus 5X using Tapatalk

I took out the bad drives to RMA them today, but here are readings from a few of the remaining WD Red 6 TB drives.

Code:
[root@freenas] ~# smartctl -a /dev/da0 | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'
32C
[root@freenas] ~# smartctl -a /dev/da1 | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'
33C
[root@freenas] ~# smartctl -a /dev/da2 | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'
33C
[root@freenas] ~# smartctl -a /dev/da3 | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'
32C
[root@freenas] ~# 

Code:
da0 = 193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   232
da1 = 193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   200
da2 = 193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   212
da3 = 193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   156
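
If it helps anyone else pull the same numbers, here's a minimal loop over all the drives. It assumes the disks show up as /dev/da0 through /dev/da9 and that smartmontools is installed.

Code:
# Print temperature and Load_Cycle_Count for each da* disk in one pass
for d in /dev/da?; do
    printf '%s: ' "$d"
    smartctl -A "$d" | awk '/Temperature_Celsius/{t=$10} /Load_Cycle_Count/{l=$10} END{printf "%sC, LCC %s\n", t, l}'
done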
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
This rate of failure certainly suggests something else is afoot. As others have said, some kind of power bus problem? I don't know what to tell you, other than that we have literally hundreds of thousands of WD Red drives deployed across the userbase's FreeNAS systems, and this rate of failure is absolutely unheard of.
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
This rate of failure certainly suggests something else is afoot. As others have said, some kind of power bus problem? I don't know what to tell you, other than that we have literally hundreds of thousands of WD Red drives deployed across the userbase's FreeNAS systems, and this rate of failure is absolutely unheard of.

I debated even posting this because I didn't want to cause a crapstorm, but this is my real and true data. As I said before, these are both Supermicro servers, located in different geographical locations, both on APC UPS systems, both in server rooms with climate control kept at around 70°F. I really have no idea what it could be, but I'm more than happy to entertain ideas and discussion as to what could be causing them to fail at such a crazy rate.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What kind of failures do they develop, anyway? Mechanical, surface defects or electronic?

And when were they manufactured, approximately?
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
this rate of failure is absolutely unheard of
My WD Red 2TB failure rate is 25%, but at 1 out of 4 that's meaningless. However, I note that the 6TB model does worse than most in the Backblaze environment.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I debated even posting this because I didn't want to cause a crapstorm
I certainly wouldn't say you're inducing a crapstorm. I think we are all legitimately academically curious about what is happening to your drives.
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
What kind of failures do they develop, anyway? Mechanical, surface defects or electronic?

And when were they manufactured, approximately?

To be honest, they just dropped out of the array and I pulled them for RMA, so I can't tell you unless it's in the FreeNAS logs somewhere.

Manufacture dates are as follows:

10 JUN 15
10 JUN 15
05 DEC 15
21 SEP 15
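
In case it's still recoverable, the usual places a drop-out leaves traces on FreeNAS are the pool status and the system log. A rough sketch, with the device name and log path assumed to be the FreeBSD defaults:

Code:
# Which devices faulted and any read/write/checksum error counters
zpool status -v

# Kernel and HBA driver messages around the time the drives dropped
grep -iE 'da[0-9]+|mps|mpr' /var/log/messages | tail -n 50

# Full SMART error and self-test history on a surviving drive
smartctl -x /dev/da0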
 

Borja Marcos

Contributor
Joined
Nov 24, 2014
Messages
125
I've got 9 WD Reds. One has given some intermittent problems; the rest are flawless.

One of them is a 2 TB model installed inside an old Time Capsule, running strong for 2+ years. The other 8 are installed in a pair of HP Microserver Gen8 servers.

In case of mechanical problems, have you purchased all of them from the same dealer? At the same time? Back in the mid-'90s I had a terrifying experience with poorly handled hard disks. At that time I designed voice recording systems, which are supposed to be left alone and just work for a reasonable amount of time.

One day we began to suffer disk failures about two months after installation at the customer's location, which really sucked. No data was lost (the system was designed as hierarchical storage management and the hard disk was just a cache; everything went to tape immediately), but rebuilding required a plane ticket and some time at the customer's premises.

After investigating the issue, I found out that our parts supplier had developed the healthy habit of just tossing the hard disks inside the box with the rest of the parts. No cushioning, nothing.

So I gave an order to the guy who received the deliveries: open the box, and if there is a single hard disk without proper packaging, close the box and return the order. No matter if it's a 10,000 € order with a single hard disk among other parts: return it and tell the financial department it's DOA, do not pay.

The first order arrived and was immediately returned. The next day, of course, someone from the dealer called me, quite surprised. I told him that poorly packed goods in an order were not acceptable and that I could quote him a reasonable engineering fee for parts testing and certification (reasonable as in 100+ euro/hour for that work). He told me that they had new staff in logistics who didn't know how to pack them... of course, that was none of my business.

Three large orders rejected later, they learned (hooray!) to wrap the disks in some bubble wrap. Know what? The mysterious disk failures didn't happen again.

The moral of this story: traditionally it's been considered good advice to avoid using disks from the same batch/supplier when setting up storage systems. You don't know whether a box full of disks has been mistreated in transit to your supplier, for example. And I have seen cases of buying a disk at a store, returning it after a failure, only to see the replacement fail in the same way...
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
I became convinced long ago that rough treatment of hard disk drives during shipment has a very serious effect on failure rates from all manufacturers, with vibration issues coming in a close second.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Between my friends and me, we have around 50 WD Reds. After burn-in they usually never fail unexpectedly. Most are 2-3 years old now.

Sent from my Nexus 5X using Tapatalk
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@soulburn

One thing to check is the power connections to the disk backplanes. Some disk backplanes have multiple power connectors. Using 4 out of 5 may not be enough at high current loads (all disks seeking at once).

Further, if you have all the power connections populated, check the power supplies' ratings for EACH of the backplane's power cables. It's possible that one or more delivers reduced current compared to the others.

That said, it's unlikely to be your problem. But this is a more straightforward investigation and doesn't require too much time.
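
One cheap way to rule out a sagging supply (it won't catch a single loose backplane connector, though) is to read the rail voltages from the BMC. A sketch, assuming Supermicro IPMI and ipmitool on the host:

Code:
# Read all voltage sensors from the local BMC
ipmitool sdr type Voltage

# Or grep the full sensor table for the main rails
ipmitool sensor | grep -iE '12v|5v|3\.3v'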
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
@soulburn

One thing to check is the power connections to the disk backplanes. Some disk backplanes have multiple power connectors. Using 4 out of 5 may not be enough at high current loads (all disks seeking at once).

Further, if you have all the power connections populated, check the power supplies' ratings for EACH of the backplane's power cables. It's possible that one or more delivers reduced current compared to the others.

That said, it's unlikely to be your problem. But this is a more straightforward investigation and doesn't require too much time.

Thanks for the ideas. I checked the Molex connectors on both servers when I pulled the drives, and they're all good. As for the PSUs, these are both Supermicro servers with redundant 1000 and 1400 watt PSUs designed for their respective chassis. I'm all out of ideas. I'll keep everyone updated if I figure anything out.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Well, for what it's worth, I've had 60 3TB WD Red drives in production for just over 3 years now and have had to RMA 4 of them due to failed SMART tests. Did you burn in the drives before putting the arrays into production?
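
For reference, a typical burn-in pass before production looks something like the following. It is destructive, so only run it on an empty drive; the device name is just an example and badblocks is assumed to be available on the system.

Code:
# Four-pattern destructive write/read of the entire disk (a day or more on a 6 TB drive)
badblocks -ws -b 4096 /dev/da0

# Then kick off an extended SMART self-test and review the attributes afterwards
smartctl -t long /dev/da0
smartctl -a /dev/da0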
 