WD Red is getting toasty

Status
Not open for further replies.

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
Interesting that one, and only one, of my HDDs is getting hot (da14 below). That is forcing the server to run at the heavy I/O or full fan setting.

Code:
[2018-09-17 09:05:00] monitor_hdd_temp: Start
[2018-09-17 09:05:00] monitor_hdd_temp: Getting list of drives
[2018-09-17 09:05:02] monitor_hdd_temp: Processing: /dev/da2: WDC WD40EFRX-68WT0N0 - WD-WCC4E0NUJ0C4 - temp 33C
[2018-09-17 09:05:02] monitor_hdd_temp: Processing: /dev/da2: WDC WD40EFRX-68WT0N0 - WD-WCC4E0NUJ0C4 - temp 33C
[2018-09-17 09:05:02] monitor_hdd_temp: Processing: /dev/da3: WDC WD40EFRX-68WT0N0 - WD-WCC4E0ARJR2R - temp 32C
[2018-09-17 09:05:02] monitor_hdd_temp: Processing: /dev/da3: WDC WD40EFRX-68WT0N0 - WD-WCC4E0ARJR2R - temp 32C
[2018-09-17 09:05:02] monitor_hdd_temp: Processing: /dev/da4: WDC WD40EFRX-68WT0N0 - WD-WCC4E4FR61KF - temp 33C
[2018-09-17 09:05:02] monitor_hdd_temp: Processing: /dev/da4: WDC WD40EFRX-68WT0N0 - WD-WCC4E4FR61KF - temp 33C
[2018-09-17 09:05:02] monitor_hdd_temp: Processing: /dev/da5: WDC WD40EFRX-68WT0N0 - WD-WCC4E6SEFY6R - temp 34C
[2018-09-17 09:05:02] monitor_hdd_temp: Processing: /dev/da5: WDC WD40EFRX-68WT0N0 - WD-WCC4E6SEFY6R - temp 34C
[2018-09-17 09:05:02] monitor_hdd_temp: Processing: /dev/da6: WDC WD40EFRX-68WT0N0 - WD-WCC4E6UR30JF - temp 34C
[2018-09-17 09:05:03] monitor_hdd_temp: Processing: /dev/da6: WDC WD40EFRX-68WT0N0 - WD-WCC4E6UR30JF - temp 34C
[2018-09-17 09:05:03] monitor_hdd_temp: Processing: /dev/da7: WDC WD40EFRX-68WT0N0 - WD-WCC4E4PHJH37 - temp 33C
[2018-09-17 09:05:03] monitor_hdd_temp: Processing: /dev/da7: WDC WD40EFRX-68WT0N0 - WD-WCC4E4PHJH37 - temp 33C
[2018-09-17 09:05:04] monitor_hdd_temp: Processing: /dev/da8: WDC WD40EFRX-68WT0N0 - WD-WCC4E2XH9F5L - temp 34C
[2018-09-17 09:05:05] monitor_hdd_temp: Processing: /dev/da8: WDC WD40EFRX-68WT0N0 - WD-WCC4E2XH9F5L - temp 34C
[2018-09-17 09:05:05] monitor_hdd_temp: Processing: /dev/da9: WDC WD40EFRX-68N32N0 - WD-WCC7K3TTVV06 - temp 34C
[2018-09-17 09:05:06] monitor_hdd_temp: Processing: /dev/da9: WDC WD40EFRX-68N32N0 - WD-WCC7K3TTVV06 - temp 34C
[2018-09-17 09:05:06] monitor_hdd_temp: Processing: /dev/da10: WDC WD40EFRX-68WT0N0 - WD-WCC4E2DD5135 - temp 35C
[2018-09-17 09:05:06] monitor_hdd_temp: Processing: /dev/da10: WDC WD40EFRX-68WT0N0 - WD-WCC4E2DD5135 - temp 35C
[2018-09-17 09:05:07] monitor_hdd_temp: Processing: /dev/da11: WDC WD40EFRX-68WT0N0 - WD-WCC4E5RAAT4P - temp 34C
[2018-09-17 09:05:07] monitor_hdd_temp: Processing: /dev/da11: WDC WD40EFRX-68WT0N0 - WD-WCC4E5RAAT4P - temp 34C
[2018-09-17 09:05:08] monitor_hdd_temp: Processing: /dev/da12: WDC WD40EFRX-68WT0N0 - WD-WCC4E4DH1DNA - temp 32C
[2018-09-17 09:05:08] monitor_hdd_temp: Processing: /dev/da12: WDC WD40EFRX-68WT0N0 - WD-WCC4E4DH1DNA - temp 32C
[2018-09-17 09:05:08] monitor_hdd_temp: Processing: /dev/da13: WDC WD40EFRX-68N32N0 - WD-WCC7K5XZJTS7 - temp 32C
[2018-09-17 09:05:08] monitor_hdd_temp: Processing: /dev/da13: WDC WD40EFRX-68N32N0 - WD-WCC7K5XZJTS7 - temp 32C
[2018-09-17 09:05:09] monitor_hdd_temp: Processing: /dev/da14: WDC WD4002FFWX-68TZ4N0 - NHG3962K - temp 38C
[2018-09-17 09:05:09] monitor_hdd_temp: Drive /dev/da14 current temperature (38C), exceeded alert temperature (35C).
[2018-09-17 09:05:09] monitor_hdd_temp: Fancontrol: Keeping same fan setting >> HEAVY I/O


It is at 38C, but I have the script set to go to full at 39C, so it is about to make noise :)
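
For anyone following along, the thresholds described above boil down to something like the sketch below. It is only an illustration, not the actual `monitor_hdd_temp` script: the function name `fan_setting` and the `FULL` and `NORMAL` labels are placeholders, while the 35C alert temperature, the 39C full-fan point, and the `HEAVY I/O` label come from the log and the post.

```python
ALERT_TEMP_C = 35  # alert temperature shown in the log above
FULL_TEMP_C = 39   # temperature at which the fans go to full, per the post


def fan_setting(drive_temps_c):
    """Pick a fan setting from the hottest drive temperature (in C)."""
    hottest = max(drive_temps_c)
    if hottest >= FULL_TEMP_C:
        return "FULL"  # placeholder label for the full-speed setting
    if hottest > ALERT_TEMP_C:  # the log says "exceeded", so strictly above
        return "HEAVY I/O"
    return "NORMAL"  # placeholder label for the baseline setting


# Temperatures from the log above: twelve cooler drives plus da14 at 38C.
print(fan_setting([33, 32, 33, 34, 34, 33, 34, 34, 35, 34, 32, 32, 38]))
```

Because the decision is driven by the single hottest drive, one warm disk is enough to hold the whole chassis at the higher fan setting.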

Code:
=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red Pro
Device Model:	 WDC WD4002FFWX-68TZ4N0
Serial Number:	NHG3962K
LU WWN Device Id: 5 000cca 243c17fa2
Firmware Version: 83.H0A83
User Capacity:	4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	7200 rpm
Form Factor:	  3.5 inches
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Mon Sep 17 09:12:35 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x000b   100   100   016	Pre-fail  Always	   -	   0
  2 Throughput_Performance  0x0005   134   134   054	Pre-fail  Offline	  -	   116
  3 Spin_Up_Time			0x0007   136   136   024	Pre-fail  Always	   -	   484 (Average 484)
  4 Start_Stop_Count		0x0012   100   100   000	Old_age   Always	   -	   99
  5 Reallocated_Sector_Ct   0x0033   100   100   005	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x000b   100   100   067	Pre-fail  Always	   -	   0
  8 Seek_Time_Performance   0x0005   128   128   020	Pre-fail  Offline	  -	   18
  9 Power_On_Hours		  0x0012   099   099   000	Old_age   Always	   -	   9299
 10 Spin_Retry_Count		0x0013   100   100   060	Pre-fail  Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   93
192 Power-Off_Retract_Count 0x0032   097   097   000	Old_age   Always	   -	   3875
193 Load_Cycle_Count		0x0012   097   097   000	Old_age   Always	   -	   3875
194 Temperature_Celsius	 0x0002   153   153   000	Old_age   Always	   -	   39 (Min/Max 15/47)
196 Reallocated_Event_Count 0x0032   100   100   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0022   100   100   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0008   100   100   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x000a   200   200   000	Old_age   Always	   -	   0

SMART Error Log Version: 1
No Errors Logged


Resilver in process.

I just can't find any explanation for it besides a manufacturing defect or maybe a faulty sensor, but as the drive doesn't show any errors, I can't even RMA it.
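
As a rough sketch of how a reading like the one above can be pulled out of the `smartctl` attribute table, the snippet below looks for attribute 194 and returns the current temperature plus the Min/Max values when they are present. The function name `parse_temperature` is made up for illustration, and it assumes the attribute line has the same column layout as the output posted above; other drives may report temperature differently.

```python
import re


def parse_temperature(smart_output):
    """Return (current, minimum, maximum) in C from attribute 194, or None."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw value (the raw value may have trailing extras).
        if len(fields) >= 10 and fields[0] == "194":
            current = int(fields[9])
            match = re.search(r"Min/Max\s+(\d+)/(\d+)", line)
            if match:
                return current, int(match.group(1)), int(match.group(2))
            return current, None, None
    return None


# Attribute line copied from the smartctl output above.
sample = "194 Temperature_Celsius  0x0002  153  153  000  Old_age  Always  -  39 (Min/Max 15/47)"
print(parse_temperature(sample))
```

The Min/Max part is worth a look here: a lifetime maximum of 47C means this drive has already run well above the 39C fan threshold at some point.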
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Resilver in process.
Does this mean that you replaced the drive in your system?
It may not be throwing errors yet, but I had a WD Red (or Red Pro, I can't recall) cook off in one of my servers at work. It kept getting hotter until it stopped working entirely. It happened over the weekend and if I recall correctly, the last logged temperature was around 130°C. After that, the two neighboring drives started having bad sectors too. I think the heat affected them. In that system, I think the bearing must have been bad, probably from the factory.
If it is under warranty, I would try to get WD to replace it. I have had good results, no failures so far, from the ones I have been provided under warranty.
The 4TB replacements I have gotten were all white labeled, but they still say WD Red (or Red Pro), just no fancy red label. We have servers at work that came with WD Red drives and others of the same model that came with WD Red Pro drives.
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
Does this mean that you replaced the drive in your system?

Yes ... 97% and going strong. That drive is the only one getting hot, so the script is running the fans at full just because of it. I just checked and all are at 34C/35C and that one at 40C with the fans in full.

It may not be throwing errors yet,

Not going to wait until it fails and impacts other drives, like yours. Taking it out of the server in 30 mins or so ;)

Obvious answer: da14 is a WD Red Pro and spinning at 7200rpm, all of your others are 5400rpm Reds.

hmmm ... good point. But I'm taking it out of the server and installing it in a desktop somewhere.

Edit: And if it fails:

Code:
Serial Number  Status               Model Number  Description  Expiration Date
NHG3962K       In Limited Warranty  WD4002FFWX    WD Red Pro   07/19/2021
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Not going to wait until it fails and impacts other drives, like yours. Taking it out of the server in 30 mins or so ;)
Don't do it
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Yes ... 97% and going strong. That drive is the only one getting hot, so the script is running the fans at full just because of it. I just checked and all are at 34C/35C and that one at 40C with the fans in full.
I didn't understand before. This drive is under high write workload because it is a replacement drive being resilvered into the array?
If that is correct, the temperature is fine. Wait and see how it behaves once the resilver is complete.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Obvious answer: da14 is a WD Red Pro and spinning at 7200rpm, all of your others are 5400rpm Reds.
@melloa
The 7200 RPM WD drives do run hotter on average than other drives. They are rated to around 65°C if I recall.
Also, look at this:
Code:
+------+---------------+----+-----+-----+-----+-------+-------+--------+------+----------+------+-------+----+
|Device|Serial         |Temp|Power|Start|Spin |ReAlloc|Current|Offline |Seek  |Total     |High  |Command|Last|
|      |Number         |    |On   |Stop |Retry|Sectors|Pending|Uncorrec|Errors|Seeks     |Fly   |Timeout|Test|
|      |               |    |Hours|Count|Count|       |Sectors|Sectors |      |          |Writes|Count  |Age |
+------+---------------+----+-----+-----+-----+-------+-------+--------+------+----------+------+-------+----+
|da4   |Z307           | 34 | 9266|   57|    0|      0|      0|       0|     0|  35274575|     0|      0|   0|
|da5   |Z307           | 35 |11844|  100|    0|      0|      0|       0|     0|  37920600|     0|      0|   0|
|da6   |Z307           | 34 |11485|   59|    0|      0|      0|       0|     0|  38341343|     0|      0|   0|
|da7   |Z305           | 35 | 9154|   28|    0|      0|      0|       0|     0|  42243312|     0|      0|   0|
|da8   |Z307           | 35 |12766|   92|    0|      0|      0|       0|     0|  40012808|     0|      0|   0|
|da9   |Z305           | 35 | 9139|   26|    0|      0|      0|       0|     0|  41453492|     0|      0|   0|
|da10  |Z4Z3           | 34 |15716|   40|    0|      0|      0|       0|     0|  52006450|     2|      0|   0|
|da11  |Z4Z2           | 35 |12268|  116|    0|      0|      0|       0|     0|  76243133|     7|      0|   0|
|da12  |W4Z2           | 34 |15719|   44|    0|      0|      0|       0|     0|  52514647|     3|      0|   0|
|da13  |W4Z2           | 35 |17901|   89|    0|      0|      0|       0|     0|  81398477|     2|      0|   0|
|da14  |Z4Z3           | 35 |15924|   42|    0|      0|      0|       0|     1|  54317367|     2|      0|   0|
|da15  |Z4Z3           | 36 |15718|   42|    0|      0|      0|       0|     1|  53484792|     0|      0|   0|
|da16  |K1JX           | 40 | 1368|    3|    0|      0|      0|       0|   N/A|       N/A|   N/A|    N/A|   0|
|da17  |K1JX           | 40 | 1368|    3|    0|      0|      0|       0|   N/A|       N/A|   N/A|    N/A|   0|
|da18  |K1JX           | 39 | 1368|    3|    0|      0|      0|       0|   N/A|       N/A|   N/A|    N/A|   0|
|da19  |K1JX           | 38 | 1368|    3|    0|      0|      0|       0|   N/A|       N/A|   N/A|    N/A|   0|
|da20  |Z301           | 31 | 2071|   11|    0|      0|      0|       0|     0|   4313909|     7|      0|   0|
|da21  |S300           | 31 | 2347|   36|    0|      0|      0|       0|     0|   4391741|     7|      0|   0|
|da22  |Z301           | 29 | 2071|   11|    0|      0|      0|       0|     0|   4430370|     8|      0|   0|
|da23  |Z301           | 29 | 2088|   23|    0|      0|      0|       0|     0|   4442934|     8|      0|   0|
|da24  |Z301           | 30 | 2088|   18|    0|      0|      0|       0|     0|   4540040|     9|      0|   0|
|da25  |S300           | 30 | 2347|   35|    0|      0|      0|       0|     0|   4425299|     0|      0|   0|
|da26  |Z301           | 30 | 1856|   12|    0|      0|      0|       0|     0|   4510065|     8|      0|   0|
|da27  |W300           | 29 | 2347|   36|    0|      0|      0|       0|     0|   4507604|    11|      1|   0|
|da28  |Z301           | 29 | 2087|   25|    0|      0|      0|       0|     0|   4433842|     7|      0|   0|
|da29  |Z300           | 29 | 2347|   35|    0|      0|      0|       0|     0|   4485386|    21|      1|   0|
|da30  |Z301           | 30 | 2071|   15|    0|      0|      0|       0|     0|   4375462|     1|      0|   0|
|da31  |Z301           | 31 | 2087|   20|    0|      0|      0|       0|     0|   4566242|     2|      0|   0|

This is part of my daily report from my server. See the hot drives? They are all WD and the others are all Seagate.
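
If you want to pick the hot drives out of a report like this automatically, something along these lines might do. It is a sketch only: the function name `hot_drives` and the 38C cut-off are arbitrary choices for illustration, and it assumes the pipe-delimited layout shown above with the device in the first column and the temperature in the third. The sample rows are a few lines copied from the report.

```python
def hot_drives(report, limit_c):
    """Return (device, temp) pairs for rows at or above limit_c."""
    hot = []
    for line in report.splitlines():
        # Drop the outer pipes, then split the row into trimmed cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip border and header lines; data rows start with a "da" device.
        if len(cells) >= 3 and cells[0].startswith("da") and cells[2].isdigit():
            temp = int(cells[2])
            if temp >= limit_c:
                hot.append((cells[0], temp))
    return hot


sample_report = """
+------+---------------+----+-----+
|Device|Serial         |Temp|Power|
+------+---------------+----+-----+
|da15  |Z4Z3           | 36 |15718|
|da16  |K1JX           | 40 | 1368|
|da18  |K1JX           | 39 | 1368|
|da20  |Z301           | 31 | 2071|
"""

print(hot_drives(sample_report, 38))
```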
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
I didn't understand before. This drive is under high write workload because it is a replacement drive being resilvered into the array? If that is correct, the temperature is fine. Wait and see how it behaves once the resilver is complete.

It has been affecting the logic in my script: even at idle it spins faster, gets hotter, and turns my fans on :)

They are rated to around 65°C if I recall.

I run my pool with a 39C limit, so any drive that reaches that temperature triggers the fan change.

See the hot drives? They are all WD and the others are all Seagate.

Those da16 and da17 would spin up my fans ;)
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Those da16 and da17 would spin up my fans ;)
My fans don't switch to medium until 45°C and go to high at 55°C. The server went to high one time, when the air conditioning was out and the ambient temperature in the room hit about 85°F.
What kind of drives are you using?
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
My fans don't switch to medium until 45°C and go to high at 55°C.

I could set it to that, but I prefer to spin up the fans after 39C. I like to keep the disks under 40C.

Now they are all running 36C - 38C (37C is normal body temp), and the server has not switched to full since I removed that drive. It might be a good drive, but it won't go back into the pool unless I need a temporary solution. The volume is a raidz3, so that is very unlikely to happen.

What kind of drives are you using?

WD Red.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
You can see the temperature of the drives in my NAS; it stays like that the majority of the time. They edge up a little while it is doing the sync with the backup pool.
Now that I am thinking about it, I think I was wrong about the set points.
I will look at it.

 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
My Chenbro chassis has temperature sensors in the drive backplanes. The fans connect to the backplanes and there is a set of dip switches where you can set the temperature limit. It automatically throttles the fans to maintain the set temperature. I have been very happy with that for keeping the drives cool. The only problem has been with keeping the boot drives cool where they are mounted in the back of the chassis.

http://www.chenbro.com/en-global/products/RackmountChassis/4U_Chassis/NR40700

All the Supermicro chassis I have (one 24 bay and two 15 bay) are older and their fans were also attached to the backplanes, but they ran full blast (5000 RPM) all the time. To control the noise level, I put speed reducers on them to make them run about half speed and that is quiet enough for me while keeping the drives cool enough to be healthy, as long as I don't use the 7200 RPM monster drives.
 