(RANT) Disappointed in WD Red reliability.

Status
Not open for further replies.

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
Just a rant here. So I have two FreeNAS systems in production that relate to this post. Between the two of them they have 26x WD Red 6 TB drives in total; one system has 14x and the other 12x. These servers are in climate-controlled server rooms, and temps on the drives never touch 40°C. I know it's officially not recommended to use this many WD Reds in one enclosure, probably due to vibration among other reasons, so that's on me for not following their guidelines.

Regardless of that, I know it's only my opinion, but I really will never buy WD Red drives again. I've had similar server setups with similar drive counts and configs using other brands that never had failure rates like these WD Reds do. I've had literally 8 die in total over the last 12 months in various systems. Four of them died within the last two weeks in two separate systems, with three of the four purchased at different times from different places, so they can't be from the same batch. The RMA process from WD is easy, so there's at least that. Just voicing my opinion. Here are screenshots of both systems that had drives fail within a two-week span.

System 1: 3 drives all dead within two weeks. Thankfully it hasn't taken down a vdev (yet) and destroyed the whole pool!


System 2: 1 drive dead. This system has had 4 drives fail in total in the last 12 months.
 

Attachments

  • 2016-11-16 12_45_55-mnm-fn-backup1 - FreeNAS-9.10.1-U2 (f045a8b).png
  • 2016-11-16 15_32_46-et-fn - FreeNAS-9.10.1-U4 (ec9a7d3).png

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I know it's officially not recommended to use this many WD Reds in one enclosure
It's pure marketing crap; there isn't a shred of evidence to back up what they claim.
I've had literally 8 die in total over the last 12 months in various systems.
Those are Seagate 3TB numbers. You really should investigate other possibilities, such as bad power, because WD Reds are not known for such failure rates.
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
It's pure marketing crap; there isn't a shred of evidence to back up what they claim.

The only thing I can think of is, like I said, that they're not rated for this much vibration, but don't get me wrong, I agree.

Those are Seagate 3TB numbers. You really should investigate other possibilities, such as bad power, because WD Reds are not known for such failure rates.

I was really surprised, too. These are both Supermicro servers with proper backplanes, connected to fully supported LSI 2008 or 3008 controllers. Power is from redundant Supermicro PSUs and each system has an APC UPS. I really don't know what else I could be doing wrong. Both of these servers are in different geographical locations.
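
For what it's worth, here's a quick sanity check that the disks really are hanging off the LSI HBAs and that the drivers probed cleanly. This is just a rough sketch for FreeBSD/FreeNAS; the grep pattern assumes the stock mps/mpr driver names.

Code:
# List every disk and the bus/controller it is attached to
camcontrol devlist

# LSI 2008 uses the mps(4) driver, LSI 3008 uses mpr(4); check they probed without errors
dmesg | grep -E '^(mps|mpr)[0-9]'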
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Maybe they're not screwed in tightly? Or not enough screws? I dunno, I'm grasping at straws here.
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
Maybe they're not screwed in tightly? Or not enough screws? I dunno, I'm grasping at straws here.

No, I hear ya. I appreciate the help. They're all in there tightly, with all the screws in the caddies. For what it's worth, to the touch the servers don't really vibrate any more than any other servers.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
What are their temps and LCC (Load_Cycle_Count)?

Sent from my Nexus 5X using Tapatalk
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
What are their temps and LCC (Load_Cycle_Count)?

Sent from my Nexus 5X using Tapatalk

I took out the bad drives to RMA them today, but here are readings from a few of the remaining WD Red 6 TB drives.

Code:
[root@freenas] ~# smartctl -a /dev/da0 | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'
32C
[root@freenas] ~# smartctl -a /dev/da1 | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'
33C
[root@freenas] ~# smartctl -a /dev/da2 | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'
33C
[root@freenas] ~# smartctl -a /dev/da3 | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'
32C
[root@freenas] ~# 

Code:
da0 = 193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   232
da1 = 193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   200
da2 = 193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   212
da3 = 193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   156
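
If it helps anyone else pull the same numbers, here's a minimal loop over all the drives. It assumes the disks show up as /dev/da0 through /dev/da9 and that smartmontools is installed.

Code:
# Print temperature and Load_Cycle_Count for each da* disk in one pass
for d in /dev/da?; do
    printf '%s: ' "$d"
    smartctl -A "$d" | awk '/Temperature_Celsius/{t=$10} /Load_Cycle_Count/{l=$10} END{printf "%sC, LCC %s\n", t, l}'
done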
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
This rate of failure certainly suggests something else is afoot. As others have said, some kind of power bus problem? I don't know what to tell you, other than that we have literally hundreds of thousands of WD Red drives deployed across the userbase's FreeNAS systems, and this rate of failure is absolutely unheard of.
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
This rate of failure certainly suggests something else is afoot. As others have said, some kind of power bus problem? I don't know what to tell you, other than that we have literally hundreds of thousands of WD Red drives deployed across the userbase's FreeNAS systems, and this rate of failure is absolutely unheard of.

I debated even posting this because I didn't want to cause a crapstorm, but this is my real and true data. As I said before, these are both Supermicro servers, located in different geographical locations, both on APC UPS systems, both in server rooms with climate control kept at around 70°F. I really have no idea what it could be, but I'm more than happy to entertain ideas and discussion as to what could be causing them to fail at such a crazy rate.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What kind of failures do they develop, anyway? Mechanical, surface defects or electronic?

And when were they manufactured, approximately?
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
this rate of failure is absolutely unheard of
My WD Red 2TB failure rate is 25%, but at 1 out of 4 that's meaningless. However, I note that the 6TB model does worse than most in the Backblaze environment.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I debated even posting this because I didn't want to cause a crapstorm
I certainly wouldn't say you're inducing a crapstorm. I think we are all legitimately academically curious about what is happening to your drives.
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
What kind of failures do they develop, anyway? Mechanical, surface defects or electronic?

And when were they manufactured, approximately?

To be honest, they just dropped out of the array and I pulled them for RMA, so I can't tell you unless it's in the FreeNAS logs somewhere.

Manufacture dates are as follows:

10 JUN 15
10 JUN 15
05 DEC 15
21 SEP 15
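
In case it's still recoverable, the usual places a drop-out leaves traces on FreeNAS are the pool status and the system log. A rough sketch, with the device name and log path assumed to be the FreeBSD defaults:

Code:
# Which devices faulted and any read/write/checksum error counters
zpool status -v

# Kernel and HBA driver messages around the time the drives dropped
grep -iE 'da[0-9]+|mps|mpr' /var/log/messages | tail -n 50

# Full SMART error and self-test history on a surviving drive
smartctl -x /dev/da0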
 

Borja Marcos

Contributor
Joined
Nov 24, 2014
Messages
125
I've got 9 WD Reds. One has given some intermittent problems; the rest are flawless.

One of them is a 2 TB model installed inside an old Time Capsule, running strong for 2+ years. The other 8 are installed in a pair of HP Microserver Gen8 servers.

In case of mechanical problems, have you purchased all of them from the same dealer? At the same time? Back in the mid-'90s I had a terrifying experience with poorly handled hard disks. At that time I designed voice recording systems, which are supposed to be left alone and just work for a reasonable amount of time.

One day we began to suffer disk failures about two months after installation at the customer's location, which really sucked. No data was lost (the system was designed as hierarchical storage management and the hard disk was just a cache; everything went to tape immediately), but rebuilding required a plane ticket and some time at the customer's premises.

After investigating the issue, I found out that our parts supplier had developed the healthy habit of just tossing the hard disks inside the box with the rest of the parts. No cushioning, nothing.

So I gave an order to the guy who received the deliveries: open the box, and if there is a single hard disk without proper packaging, close the box and return the order. No matter if it's a 10,000 € order with a single hard disk among other parts: return it and tell the financial department it's DOA, do not pay.

The first order arrived and was immediately returned. The next day, of course, someone from the dealer called me, quite surprised. I told him that poorly packed goods in an order were not acceptable and that I could quote him a reasonable engineering fee for parts testing and certification (reasonable as in 100+ euro/hour for that work). He told me that they had new staff in logistics who didn't know how to pack them... of course, that was none of my business.

Three large orders rejected later, they learned (hooray!) to wrap the disks in some bubble wrap. Know what? The mysterious disk failures didn't happen again.

The moral of this story: traditionally it's been considered good advice to avoid using disks from the same batch/supplier when setting up storage systems. You don't know whether a box full of disks has been mistreated in transit to your supplier, for example. And I have seen cases of buying a disk at a store, returning it after a failure, only to see the replacement fail in the same way...
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
I became convinced long ago that rough treatment of hard disk drives during shipment has a very serious effect on failure rates from all manufacturers, with vibration issues coming in a close second.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Between my friends and me, we have around 50 WD Reds. After burn-in they usually never fail unexpectedly. Most are 2-3 years old now.

Sent from my Nexus 5X using Tapatalk
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@soulburn

One thing to check is the power connections to the disk backplanes. Some disk backplanes have multiple power connectors. Using 4 out of 5 may not be enough at high current loads (all disks seeking at once).

Further, if you have all the power connections populated, check the power supplies' ratings for EACH of the backplane's power cables. It's possible that one or more delivers reduced current compared to the others.

That said, it's unlikely to be your problem. But this is a more straightforward investigation and doesn't require too much time.
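
One cheap way to rule out a sagging supply (it won't catch a single loose backplane connector, though) is to read the rail voltages from the BMC. A sketch, assuming Supermicro IPMI and ipmitool on the host:

Code:
# Read all voltage sensors from the local BMC
ipmitool sdr type Voltage

# Or grep the full sensor table for the main rails
ipmitool sensor | grep -iE '12v|5v|3\.3v'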
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
@soulburn

One thing to check is the power connections to the disk backplanes. Some disk backplanes have multiple power connectors. Using 4 out of 5 may not be enough at high current loads (all disks seeking at once).

Further, if you have all the power connections populated, check the power supplies' ratings for EACH of the backplane's power cables. It's possible that one or more delivers reduced current compared to the others.

That said, it's unlikely to be your problem. But this is a more straightforward investigation and doesn't require too much time.

Thanks for the ideas. I checked the Molex connectors on both servers when I pulled the drives, and they're all good. As for the PSUs, these are both Supermicro servers with redundant 1000 and 1400 watt PSUs designed for their respective chassis. I'm all out of ideas. I'll keep everyone updated if I figure anything out.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Well, for what it's worth, I've had 60 3TB WD Red drives in production for just over 3 years now and have had to RMA 4 of them due to failed SMART tests. Did you burn in the drives before putting the arrays into production?
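
For reference, a typical burn-in pass before production looks something like the following. It is destructive, so only run it on an empty drive; the device name is just an example and badblocks is assumed to be available on the system.

Code:
# Four-pattern destructive write/read of the entire disk (a day or more on a 6 TB drive)
badblocks -ws -b 4096 /dev/da0

# Then kick off an extended SMART self-test and review the attributes afterwards
smartctl -t long /dev/da0
smartctl -a /dev/da0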
 