mgb
Dabbler
- Joined
 - Aug 10, 2015
 
- Messages
 - 10
 
I'm in the process of burn-in testing a new server and there's 1 SAS HDD (out of 10) that's reporting quite a high number of "Errors Corrected by ECC" compared to the others.
The system:
Chassis: Supermicro CSE-826BE26-R920LPB (Dual Expanders Backplane)
Motherboard: Supermicro X10DRi-T
HBA: 2x LSI 9207-8i (only 1 is currently connected)
HDDs: 10x HGST 2TB 7K4000 SAS2
The procedure:
After running memtest86 and memtest86+ on the server, I followed the [How To] Hard Drive Burn-In Testing guide (thanks @qwertymodo). You'll notice ~8TB of data processed, which is the result of badblocks' default 4 passes over each 2TB drive.
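For anyone who wants to sanity-check the ~8TB figure, here is a quick sketch of the arithmetic. It assumes badblocks was run in its destructive write mode with the default four test patterns, each written once and read back once, and it uses the capacity that smartctl reports for this drive. The variable names are just for illustration.

```python
# Capacity in bytes, as reported by smartctl for this drive.
capacity_bytes = 2_000_398_934_016

# Assumption: badblocks write-mode test with its default four patterns,
# each pattern written across the whole drive and then read back.
passes = 4

# smartctl reports "Gigabytes processed" in units of 10^9 bytes.
written_gb = capacity_bytes * passes / 1e9
print(f"{written_gb:.3f} GB written over {passes} passes")
```

The result lines up with the "Gigabytes processed" value on the write row of the error counter log, so the counters shown cover the full burn-in and nothing else.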
The suspicious HDD:
Code:
root@sysresccd /root % smartctl -q noserial -a /dev/sdh
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-3.14.50-std460-amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUS724020ALS640
Revision:             A280
Compliance:           SPC-4
User Capacity:        2,000,398,934,016 bytes [2.00 TB]
Logical block size:   512 bytes
LB provisioning type: unreported, LBPME=0, LBPRZ=0
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Oct  7 14:32:51 2015 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature:     32 C
Drive Trip Temperature:        85 C
Manufactured in week 14 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  4
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  6
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 2048408550375424
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:       5304      354         0      5658       8029       8001.691           0
write:         0        0         0         0          6       8001.596           0
verify:     1535      176         0      1711      26283          0.064           0
Non-medium error count:        0
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -      46                 - [-   -    -]
# 2  Background long   Completed                   -       6                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]
Long (extended) Self Test duration: 22650 seconds [377.5 minutes]
Out of the other 9 HDDs, 2 had 0 errors, 2 had 1 error, 3 had fewer than 10 errors, and 2 had fewer than 30 errors, whereas this one shows over 5000. That seems extremely high compared to the others.
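To compare drives on an equal footing, it may help to normalize the corrected-error count by the amount of data read. Below is a minimal sketch that parses the three rows of the error counter log and computes corrected errors per terabyte. The function names and field names are my own, and the sample rows are copied from the output for this drive; the parser assumes the seven-column layout that smartctl printed here.

```python
import re

# Rows copied from the error counter log of the suspicious drive.
SAMPLE = """\
read:       5304      354         0      5658       8029       8001.691           0
write:         0        0         0         0          6       8001.596           0
verify:     1535      176         0      1711      26283          0.064           0
"""

# Column names are illustrative labels for the seven columns smartctl prints.
FIELDS = ("ecc_fast", "ecc_delayed", "rereads_rewrites", "total_corrected",
          "invocations", "gigabytes", "uncorrected")


def parse_error_counters(text):
    """Return {operation: {field: value}} for the read/write/verify rows."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(read|write|verify):\s+(.*)", line)
        if not match:
            continue
        values = match.group(2).split()
        if len(values) != len(FIELDS):
            continue  # skip rows that do not match the expected layout
        counters[match.group(1)] = {
            name: float(value) for name, value in zip(FIELDS, values)
        }
    return counters


def corrected_per_tb(row):
    """Corrected errors per 10^12 bytes processed, or None if nothing was processed."""
    if row["gigabytes"] == 0:
        return None
    return row["total_corrected"] / row["gigabytes"] * 1000


counters = parse_error_counters(SAMPLE)
read_rate = corrected_per_tb(counters["read"])
print(f"read: {read_rate:.1f} corrected errors per TB")
```

Feeding the same rows from each of the other nine drives through `parse_error_counters` would give a per-TB rate for every drive, which makes the outlier easier to quantify than raw totals.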
This is definitely something I should be worried about, right?
Any suggestions on how to more thoroughly test this drive?
I'm thinking I should RMA the drive.
Thanks in advance!
--mgb