WD Red drive failing?

Status: Not open for further replies.

IanWor · Dabbler · Joined Dec 7, 2013 · Messages: 30

Hi all, wanted to get the collective intelligence on this one before I go spend more $$$ on drive(s).

I have a WD 3TB drive that is passing short and long tests but has utilization running in the 90% range, while read and write stay in the normal 3-15% range, and the system seems to be running slow.

System is 11.1-U6 with 32GiB RAM and two pools: pool1 is 6x2TB in RAIDZ2, pool2 is 3x3TB in RAIDZ1.

I have seen this status for the past week, and I am wondering if there are any tests I can or should run before making a final decision.

Thanks all

Ian
 

Attachments

  • wd-drive-status.jpg

kdragon75 · Wizard · Joined Aug 7, 2016 · Messages: 2,457

How full is your pool?
 

Chris Moore · Hall of Famer · Joined May 2, 2015 · Messages: 10,080

I have seen drives run slow like that before in my system and it usually means the drive is about to die.
What is the brand / model of the drive and what does the SMART status show?

Edit - Sorry. I see where you said it is a WD Red.
 

IanWor · Dabbler · Joined Dec 7, 2013 · Messages: 30

Hi Chris.

Yeah, it's a WD Red NAS 3TB drive. SMART in the CLI is not giving an error; that is what is confusing me.

I manually ran a SMART test on the drive from the CLI with the command: sudo smartctl -t long /dev/ada0
And here are some of the errors I am seeing in the output of: sudo smartctl -a /dev/ada0

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error  00%        45224            -
# 2  Extended offline    Completed: read failure  10%        45144            1324893600
# 3  Short offline       Completed without error  00%        45019            -
# 4  Short offline       Completed without error  00%        44851            -
# 5  Extended offline    Completed: read failure  10%        44762            1324893592
# 6  Short offline       Completed without error  00%        44683            -
# 7  Short offline       Completed without error  00%        44611            -
# 8  Short offline       Completed without error  00%        44443            -
# 9  Extended offline    Completed without error  00%        44379            -
#10  Short offline       Completed without error  00%        44108            -
#11  Short offline       Completed without error  00%        44108            -
#12  Extended offline    Completed without error  00%        44019            -
#13  Short offline       Completed without error  00%        43940            -
#14  Short offline       Completed without error  00%        43940            -
#15  Short offline       Completed without error  00%        43892            -
#16  Short offline       Completed without error  00%        43892            -
#17  Short offline       Completed without error  00%        43725            -
#18  Short offline       Completed without error  00%        43724            -
#19  Extended offline    Completed without error  00%        43660            -
#20  Short offline       Completed without error  00%        43557            -
#21  Short offline       Completed without error  00%        43389            -

As you can see, there are some read failures. I have ordered a replacement drive, but I would still like to know what is going on here.

Hum...
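A minimal Python sketch of reading the self-test log above programmatically, assuming the text is the self-test section of smartctl -a output (smartctl -l selftest prints the same log on its own) pasted into a string. The names parse_selftest_log and SelfTest are hypothetical helpers for illustration, not part of smartmontools. It pulls out the failed extended tests and their first-error LBAs; the byte figure assumes 512 bytes per LBA, which is an assumption about this drive.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One row of the smartctl self-test log, e.g.
# "# 2  Extended offline    Completed: read failure  10%  45144  1324893600"
# Also tolerates single spaces between columns, as in a flattened paste.
ROW = re.compile(
    r"^#\s*(\d+)\s+(\w+ (?:offline|captive))\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)$"
)


@dataclass
class SelfTest:
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the log shows "-"


def parse_selftest_log(text: str) -> List[SelfTest]:
    """Parse the rows of a smartctl self-test log; other lines are skipped."""
    tests = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # header or unrelated line
        num, desc, status, remaining, hours, lba = m.groups()
        tests.append(SelfTest(int(num), desc, status, int(remaining),
                              int(hours), None if lba == "-" else int(lba)))
    return tests


# A few rows copied from the log posted in this thread.
LOG = """\
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error  00%        45224            -
# 2  Extended offline    Completed: read failure  10%        45144            1324893600
# 3  Short offline       Completed without error  00%        45019            -
# 5  Extended offline    Completed: read failure  10%        44762            1324893592
# 9  Extended offline    Completed without error  00%        44379            -
"""

tests = parse_selftest_log(LOG)
failures = [t for t in tests if t.status != "Completed without error"]
lbas = [t.first_error_lba for t in failures if t.first_error_lba is not None]
spread = max(lbas) - min(lbas)  # distance between the first-error LBAs

for t in failures:
    print(f"#{t.num} {t.description}: {t.status} at {t.lifetime_hours} h, LBA {t.first_error_lba}")
# 512 bytes per LBA is an assumption about the drive's logical sector size.
print(f"First-error LBAs differ by {spread} sectors ({spread * 512} bytes at 512 B/LBA)")
```

On the rows from this thread, both failed extended tests stopped within 8 LBAs of each other, which fits the "fails in the same area" reading of the log.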
 

Chris Moore · Hall of Famer · Joined May 2, 2015 · Messages: 10,080

IanWor said:
I would still like to know what is going on here.

Hum...

I can't say for sure, but my guess is that the drive is repeatedly trying to read data from bad sectors.
If you look at the portion of the report that looks similar to this:
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   138   138   054    Pre-fail  Offline      -       100
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       768
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       748
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       748
194 Temperature_Celsius     0x0002   146   146   000    Old_age   Always       -       41 (Min/Max 25/47)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
You might find that there are some reallocated sectors (ID 5), current pending sectors (ID 197), or offline uncorrectable sectors (ID 198)...

I don't know why, but drives will often claim:
Code:
SMART overall-health self-assessment test result: PASSED
even when they have all sorts of problems that keep them from functioning fully.

Your NAS should have a big flashing red spot in the GUI about this. The performance of the system should not have been the first indicator that there was a problem. When a drive fails a long SMART test, it is time to replace it for sure. I usually replace a drive at the first bad sector.
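A minimal Python sketch of the check described above, assuming the input is the attribute table from smartctl -a (smartctl -A prints it alone). It flags any nonzero raw count for attributes 5, 197 and 198 and applies the "replace at the first bad sector" rule of thumb. The helper names are hypothetical, and the parser only reads the leading integer of RAW_VALUE.

```python
import re
from typing import Dict

# Attributes worth watching for bad sectors: any nonzero raw count is a warning.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# ID, name, flag, then six columns (VALUE WORST THRESH TYPE UPDATED WHEN_FAILED),
# then the leading integer of RAW_VALUE (e.g. "41" from "41 (Min/Max 25/47)").
ATTR_ROW = re.compile(r"^\s*(\d+)\s+\S+\s+0x[0-9a-fA-F]+(?:\s+\S+){6}\s+(\d+)")


def raw_values(text: str) -> Dict[int, int]:
    """Map SMART attribute ID -> leading integer of its RAW_VALUE."""
    values = {}
    for line in text.splitlines():
        m = ATTR_ROW.match(line)
        if m:
            values[int(m.group(1))] = int(m.group(2))
    return values


def bad_sector_counts(text: str) -> Dict[str, int]:
    """Return the watched attributes whose raw count is above zero."""
    raw = raw_values(text)
    return {name: raw[i] for i, name in WATCHED.items() if raw.get(i, 0) > 0}


# Rows from the example table above (a healthy drive, not Ian's).
ATTRS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   146   146   000    Old_age   Always       -       41 (Min/Max 25/47)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

bad = bad_sector_counts(ATTRS)
# Rule of thumb from this thread: replace the drive at the first bad sector,
# whatever the overall-health line says.
replace_drive = bool(bad)
print(bad or "no reallocated, pending, or offline-uncorrectable sectors")
```

The point of checking raw counts directly is that the overall-health line can still say PASSED while these counters are climbing.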
 

IanWor · Dabbler · Joined Dec 7, 2013 · Messages: 30

Chris Moore said:
When a drive fails a long SMART test, it is time to replace it for sure. I usually replace a drive at the first bad sector.

That would be the best way forward, but expensive!

Anyhow, I will leave it in for now. The new drive should be here Sunday, so I will pull the old one and replace it then. Resilvering, here I come!
 

DaveY · Contributor · Joined Dec 1, 2014 · Messages: 141

Check your load cycle count (Load_Cycle_Count, attribute 193) in SMART. You could have gotten one of the Reds that still has the old idle-bug firmware.
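A minimal Python sketch of the load-cycle check, assuming you read the raw values of Power_On_Hours (9) and Load_Cycle_Count (193) from smartctl. If the drive has the idle-timer behaviour mentioned above, the count would climb fast relative to power-on hours. The helper names are hypothetical, and RATED_CYCLES is a placeholder, not a WD specification: take the real figure from your drive's datasheet.

```python
def load_cycles_per_hour(power_on_hours: int, load_cycle_count: int) -> float:
    """Average head load/unload cycles per power-on hour (attribute 193 / attribute 9)."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return load_cycle_count / power_on_hours


def hours_until_rating(power_on_hours: int, load_cycle_count: int,
                       rated_cycles: int) -> float:
    """Hours left before the count reaches rated_cycles at the current average rate."""
    rate = load_cycles_per_hour(power_on_hours, load_cycle_count)
    if rate == 0:
        return float("inf")  # no cycles so far, so no projected limit
    return max(rated_cycles - load_cycle_count, 0) / rate


# Raw values from the example attribute table posted earlier in the thread:
# 9 Power_On_Hours = 768, 193 Load_Cycle_Count = 748.
rate = load_cycles_per_hour(768, 748)
# Placeholder rating (hypothetical); check your drive's datasheet.
RATED_CYCLES = 300_000
remaining = hours_until_rating(768, 748, RATED_CYCLES)
print(f"{rate:.2f} load cycles/hour, ~{remaining / 24 / 365:.1f} years to {RATED_CYCLES:,} cycles")
```

A rate taken over the whole power-on life hides recent changes, so comparing two readings taken a day apart gives a better picture of the current behaviour.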
 

joeschmuck · Old Man · Moderator · Joined May 28, 2011 · Messages: 10,994

Um, what is the question? You clearly have a failing drive, as shown by the fact that the extended test will not pass and fails in the same area each time. Replace the hard drive. Look at my hard drive troubleshooting guide link below for further information on how to read the SMART data.
 

danb35 · Hall of Famer · Joined Aug 16, 2011 · Messages: 15,504

IanWor said:
That would be the best way forward, but expensive!

Losing data is more expensive. Your disk is consistently failing SMART self-tests. Stick a fork in it, it's done.
 

joeschmuck · Old Man · Moderator · Joined May 28, 2011 · Messages: 10,994

danb35 said:
Losing data is more expensive. Your disk is consistently failing SMART self-tests. Stick a fork in it, it's done.

And that drive has over 5 years of runtime on it. Unless it has a long warranty, I'd say you got your money's worth out of it.

EDIT: Please don't think we are picking on you. We all just want you to replace the drive so you can have a properly operating system again.
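A quick Python check of the "over 5 years" figure, assuming the LifeTime(hours) value of the most recent self-test in Ian's log (45224) reflects the drive's power-on hours and using an average 365.25-day year. The helper name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365.25  # average year, including leap days


def power_on_years(hours: int) -> float:
    """Convert a SMART LifeTime / Power_On_Hours value to years."""
    return hours / HOURS_PER_YEAR


# LifeTime(hours) of self-test # 1 in the log posted above.
years = power_on_years(45224)
print(f"{years:.2f} years")
```

This works out to about 5.16 years of spinning, consistent with the estimate above.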
 

IanWor · Dabbler · Joined Dec 7, 2013 · Messages: 30

Thanks to all!

And let's see...

Yep, got my money's worth!
Yep, the drive is done. I'm changing it out: a new one is ordered, and a spare is going through some tests now.
Nope, don't think you are picking on me! Appreciate all of the advice given.

Thanks again all!

Ian
 