Steven Sedory
I have a production box that I use as the SAN for a 2012 R2 Hyper-V cluster. It is made up of fifteen 15K 600GB SAS drives (Seagate Cheetah ST3600057SS). I bought manufacturer-certified refurbished drives, as I could get them for almost $100 less. It may simply be that I got what I paid for.
Anyhow, over the 10 months that the box has been in production, seven or eight of the drives have failed, all of which I've RMA'd. The last two reported the SMART "FAILURE PREDICTION THRESHOLD EXCEEDED" error, so I RMA'd those today.
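To put that failure count in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes all 15 slots were populated for the full 10 months and that each RMA counts as one failure; `annualized_failure_rate` is just an illustrative helper name, not something from any tool.

```python
def annualized_failure_rate(failures, drive_count, months_in_service):
    """Failures per drive-year, assuming every slot was populated the whole time."""
    drive_years = drive_count * months_in_service / 12.0
    return failures / drive_years

# Figures from this post: 15 drives, 10 months, seven or eight failures.
for failures in (7, 8):
    rate = annualized_failure_rate(failures, drive_count=15, months_in_service=10)
    print(f"{failures} failures -> {rate:.0%} per drive-year")
```

Under those assumptions the observed rate works out to more than half the pool per year, which can then be compared against the rated AFR in the product manual.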
The drives seem to be running at a drive temperature of 47 C, with the ambient room temperature at 25 C. This is apparently within the healthy limits; see the excerpt below from the product manual (http://www.seagate.com/staticfiles/support/disc/manuals/enterprise/cheetah/15K.7/SAS/100516226d.pdf):
"The maximum allowable continuous or sustained HDA case temperature for the rated Annualized Failure Rate (AFR) is 122°F (50°C). The maximum allowable HDA case temperature is 60°C. Occasional excursions of HDA case temperatures above 122°F (50°C) or below 41°F (5°C) may occur without impact to the specified AFR. Continual or sustained operation at HDA case temperatures outside these limits may degrade AFR."
The last two drives that I RMA'd were at 41 and 42 C.
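As a quick sanity check of those readings against the manual's limits, here is a minimal Python sketch. Note the assumption: the manual's limits are for HDA case temperature, while smartctl reports the drive's own temperature sensor, so treating the two as comparable is only an approximation. `classify_temp` is a hypothetical helper name.

```python
# Limits taken from the Cheetah 15K.7 manual excerpt quoted above.
SUSTAINED_MIN_C = 5    # lower bound for sustained operation at the rated AFR
SUSTAINED_MAX_C = 50   # upper bound for sustained operation at the rated AFR
ABSOLUTE_MAX_C = 60    # maximum allowable HDA case temperature

def classify_temp(temp_c):
    """Label one temperature reading relative to the quoted limits."""
    if temp_c > ABSOLUTE_MAX_C:
        return "over absolute maximum"
    if temp_c > SUSTAINED_MAX_C or temp_c < SUSTAINED_MIN_C:
        return "outside sustained range"
    return "within sustained range"

# Readings from this post: a typical drive, then the two RMA'd drives.
for temp in (47, 41, 42):
    margin = SUSTAINED_MAX_C - temp
    print(f"{temp} C: {classify_temp(temp)} (margin to 50 C: {margin} C)")
```

All three readings fall inside the sustained range, although 47 C leaves only a 3 C margin.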
Below is the SMART data for the two drives before I removed them:
[root@san1] ~# smartctl -a /dev/da2
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST3600057SS
Revision: 000B
User Capacity: 600,127,266,816 bytes [600 GB]
Logical block size: 512 bytes
Rotation Rate: 15000 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c5005d58270b
Serial number: 6SL1XJ4F0000N2027PB4
Device type: disk
Transport protocol: SAS
Local Time is: Tue Nov 18 14:24:19 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]
Current Drive Temperature: 41 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 164
Vendor (Seagate) cache information
Blocks sent to initiator = 495170415
Blocks received from initiator = 1215535082
Blocks read from cache and sent to initiator = 105525624
Number of read and write commands whose size <= segment size = 11127391
Number of read and write commands whose size > segment size = 12
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 5242.23
number of minutes until next internal SMART test = 52
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    7754326        0         0   7754326    7754327        253.527           1
write:         0        0         0         0          0        625.068           0
verify:      585       45         0       630        784          0.000          21
Non-medium error count: 2
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background short Completed - 5241 - [- - -]
# 2 Background short Completed - 5240 - [- - -]
# 3 Background short Completed - 5239 - [- - -]
# 4 Background short Completed - 5238 - [- - -]
# 5 Background short Completed - 5237 - [- - -]
# 6 Background short Completed - 5236 - [- - -]
# 7 Background short Completed - 5235 - [- - -]
# 8 Background short Completed - 5234 - [- - -]
# 9 Background short Completed - 5233 - [- - -]
#10 Background short Completed - 5232 - [- - -]
#11 Background short Completed - 5231 - [- - -]
#12 Background short Completed - 5230 - [- - -]
#13 Background short Completed - 5229 - [- - -]
#14 Background short Completed - 5228 - [- - -]
#15 Background short Completed - 5227 - [- - -]
#16 Background short Completed - 5226 - [- - -]
#17 Background short Completed - 5225 - [- - -]
#18 Background short Completed - 5224 - [- - -]
#19 Background short Completed - 5223 - [- - -]
#20 Background short Completed - 5222 - [- - -]
Long (extended) Self Test duration: 6400 seconds [106.7 minutes]
[root@san1] ~# smartctl -a /dev/da4
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST3600057SS
Revision: 000B
User Capacity: 600,127,266,816 bytes [600 GB]
Logical block size: 512 bytes
Rotation Rate: 15000 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c5005d586f27
Serial number: 3SL1AMXH00009108K8KE
Device type: disk
Transport protocol: SAS
Local Time is: Tue Nov 18 14:24:24 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]
Current Drive Temperature: 42 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 2037
Vendor (Seagate) cache information
Blocks sent to initiator = 9449
Blocks received from initiator = 202
Blocks read from cache and sent to initiator = 5947
Number of read and write commands whose size <= segment size = 62
Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 5242.70
number of minutes until next internal SMART test = 52
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:        104        0         0       104        104          0.005           0
write:         0        0         0         0          0          0.000           0
Non-medium error count: 0
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
No self-tests have been logged
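For comparing many drives at once, the fields that matter in the output above can be pulled out with a short script. This is a minimal Python sketch, assuming the `smartctl -a` text for a SAS drive has already been captured as a string in the layout shown here; `summarize_smart`, `FIELDS`, and `SAMPLE` are hypothetical names, and the sample is a trimmed excerpt of the da2 output.

```python
import re

# Field name -> (regex for the smartctl SAS output line, converter).
FIELDS = {
    "serial": (r"Serial number:\s*(\S+)", str),
    "health": (r"SMART Health Status:\s*([^\[\n]+)", str),
    "temp_c": (r"Current Drive Temperature:\s*(\d+)\s*C", int),
    "grown_defects": (r"Elements in grown defect list:\s*(\d+)", int),
    "power_on_hours": (r"number of hours powered up\s*=\s*([\d.]+)", float),
}

def summarize_smart(text):
    """Return the selected fields as a dict; missing fields map to None."""
    summary = {}
    for name, (pattern, convert) in FIELDS.items():
        match = re.search(pattern, text)
        summary[name] = convert(match.group(1).strip()) if match else None
    return summary

# Trimmed excerpt of the /dev/da2 output from this post.
SAMPLE = """\
Serial number: 6SL1XJ4F0000N2027PB4
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]
Current Drive Temperature: 41 C
Elements in grown defect list: 164
number of hours powered up = 5242.23
"""

if __name__ == "__main__":
    print(summarize_smart(SAMPLE))
```

Running the same function over the output for every disk in the pool would make it easier to see whether grown defect counts are climbing on drives that have not yet tripped the SMART threshold.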
Any ideas?