Drives fail SMART self-test but pass short SMART test

Status
Not open for further replies.

zlittell

Cadet
Joined
Feb 24, 2014
Messages
8
I have 4x WD Red 3TB drives in a MediaSonic ProBox enclosure. These all run over eSATA into an eSATA card in the PCI-X slot of a DL380 G5 running 9.2.2.1 off a USB stick. I get email alerts and console errors stating that ada2 and ada3 have failed the SMART self-test. These drives are fresh out of the box as replacements for DOAs. Granted, I could have two more bad drives, but is my luck really this bad? When I run a short test it comes back with nothing. The overall SMART status still passes and isn't set to failed; the drives just fail the self-test on the 30-minute smartd daemon check.

It might just be me, but it seems that with 0 plugins installed these errors never occur; once I install a plugin, the errors start occurring. This could be completely in my head though.

What could cause this? Also, I still have TLER enabled on the drives. Could that be a cause of this issue?

Edit:
One thing to note is that the two with no issue are from when re4 first released. Could this be an idle park time issue? Surely that wouldn't cause the self check to fail, would it?
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Could be head parking.. The drives could be truly dying.. Do you test them before putting them in?
 

zlittell

Cadet
Joined
Feb 24, 2014
Messages
8
This is a brand new build. You could even consider it a test build, since it will probably be wiped and rebuilt from scratch before going "production". So all testing is really occurring now. Should I just do an extended test on these things? Is that the best way to test them?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Testing is like religion, everyone has their own requirements. When I test a new disk I do:

1. Short SMART test
2. Long SMART test
3. badblocks test with random data
4. Short SMART test
5. Long SMART test

Others do FAR FAR more, but I find this has been good enough for "me".
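
For anyone who wants to write that sequence down, here is a minimal sketch in Python that only builds the command lines and prints them; it does not run anything. The function name `burn_in_commands` and the device path `/dev/ada2` are made up for illustration. The badblocks options (`-w` write mode, `-s` progress, `-t random`, `-b 4096`) are an assumption about how "random data" would be done on a 3TB disk with 4K physical sectors, not something stated above. Note that `badblocks -w` destroys everything on the disk.

```python
def burn_in_commands(device):
    """Return the five-step sequence from the list above as argv lists.

    Nothing is executed here. badblocks in -w mode overwrites the whole
    device, so only use it on a disk with no data you care about.
    """
    short_test = ["smartctl", "-t", "short", device]
    long_test = ["smartctl", "-t", "long", device]
    # -w: destructive write test, -s: show progress,
    # -t random: random test pattern, -b 4096: assumed block size.
    badblocks = ["badblocks", "-ws", "-t", "random", "-b", "4096", device]
    return [short_test, long_test, badblocks, short_test, long_test]


if __name__ == "__main__":
    # /dev/ada2 is a placeholder device path.
    for argv in burn_in_commands("/dev/ada2"):
        print(" ".join(argv))
```

Keep in mind that `smartctl -t` only starts a self-test and returns right away, so each test has to be allowed to finish (check with `smartctl -l selftest`) before moving on to the next step.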
 

zlittell

Cadet
Joined
Feb 24, 2014
Messages
8
I will run a couple of long tests on each drive when I get home. I already ran 2 or 3 short tests the other day, all passing.
 

zlittell

Cadet
Joined
Feb 24, 2014
Messages
8
Alright lots of output to give you guys here:

ADA2 output after extended test:
Code:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p3 amd64] (local build)                                                         
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org                                                       
                                                                                                                                   
=== START OF INFORMATION SECTION ===                                                                                               
Model Family:    Western Digital Red (AF)                                                                                         
Device Model:    WDC WD30EFRX-68AX9N0                                                                                             
Serial Number:    WD-WCC1T0886065                                                                                                 
LU WWN Device Id: 5 0014ee 25dc04394                                                                                               
Firmware Version: 80.00A80                                                                                                         
User Capacity:    3,000,592,982,016 bytes [3.00 TB]                                                                               
Sector Sizes:    512 bytes logical, 4096 bytes physical                                                                           
Device is:        In smartctl database [for details use: -P show]                                                                 
ATA Version is:  ACS-2 (minor revision not indicated)                                                                             
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)                                                                           
Local Time is:    Tue Feb 25 18:35:29 2014 CST                                                                                     
SMART support is: Available - device has SMART capability.                                                                         
SMART support is: Enabled                                                                                                         
                                                                                                                                   
=== START OF READ SMART DATA SECTION ===                                                                                           
SMART overall-health self-assessment test result: PASSED                                                                           
                                                                                                                                   
General SMART Values:                                                                                                             
Offline data collection status:  (0x00) Offline data collection activity                                                           
                                        was never started.                                                                         
                                        Auto Offline Data Collection: Disabled.                                                   
Self-test execution status:      (  0) The previous self-test routine completed                                                   
                                        without error or no self-test has ever                                                     
                                        been run.                                                                                 
Total time to complete Offline                                                                                                     
data collection:                (38460) seconds.                                                                                   
Offline data collection                                                                                                           
capabilities:                    (0x7b) SMART execute Offline immediate.                                                           
                                        Auto Offline data collection on/off support.                                               
                                        Suspend Offline collection upon new                                                       
                                        command.                                                                                   
                                        Offline surface scan supported.                                                           
                                        Self-test supported.                                                                       
                                        Conveyance Self-test supported.                                                           
                                        Selective Self-test supported.                                                             
SMART capabilities:            (0x0003) Saves SMART data before entering                                                           
                                        power-saving mode.                                                                         
                                        Supports SMART auto save timer.                                                           
Error logging capability:        (0x01) Error logging supported.                                                                   
                                        General Purpose Logging supported.                                                         
Short self-test routine                                                                                                           
recommended polling time:        (  2) minutes.                                                                                   
Extended self-test routine                                                                                                         
recommended polling time:        ( 386) minutes.                                                                                   
Conveyance self-test routine       
recommended polling time:        (  5) minutes.                                                                                   
SCT capabilities:              (0x70bd) SCT Status supported.                                                                     
                                        SCT Error Recovery Control supported.                                                     
                                        SCT Feature Control supported.                                                             
                                        SCT Data Table supported.                                                                 
                                                                                                                                   
SMART Attributes Data Structure revision number: 16                                                                               
Vendor Specific SMART Attributes with Thresholds:                                                                                 
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   180   180   021    Pre-fail  Always       -       5991
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       49
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       263
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       29
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       24
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       24
194 Temperature_Celsius     0x0022   118   114   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
                                                                                                                                   
SMART Error Log Version: 1                                                                                                         
No Errors Logged                                                                                                                   
                                                                                                                                   
SMART Self-test log structure revision number 1                                                                                   
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Completed without error       00%       256         -
# 2  Conveyance offline  Completed without error       00%       256         -
# 3  Short offline       Completed without error       00%       255         -
# 4  Short offline       Completed without error       00%       255         -
# 5  Extended offline    Completed without error       00%       252         -
# 6  Conveyance offline  Completed without error       00%       242         -
# 7  Short offline       Completed without error       00%       240         -
# 8  Extended offline    Aborted by host               90%       240         -
                                                                                                                                   
SMART Selective self-test log data structure revision number 1                                                                     
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS                                                                                       
    1        0        0  Not_testing                                                                                               
    2        0        0  Not_testing                                                                                               
    3        0        0  Not_testing                                                                                               
    4        0        0  Not_testing                                                                                               
    5        0        0  Not_testing   
Selective self-test flags (0x0):                                                                                                   
  After scanning selected spans, do NOT read-scan remainder of disk.                                                               
If Selective self-test is pending on power-up, resume after 0 minute delay.                                                                            
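
To read a self-test log like the one above without eyeballing every row, here is a minimal Python sketch that parses the rows and lists any entry whose status is not "Completed without error". The names `parse_selftest_log` and `not_clean` are made up for illustration, and the regular expression only covers the three test types that appear in this output (short, extended, conveyance); other smartctl versions or test types may need adjusting.

```python
import re

# One row of the smartctl self-test log, e.g.
# "# 8  Extended offline    Aborted by host    90%    240    -"
ROW = re.compile(
    r"^#\s*(\d+)\s+"
    r"(Short offline|Extended offline|Conveyance offline)\s+"
    r"(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in the given text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            num, desc, status, remaining, hours, lba = match.groups()
            rows.append({
                "num": int(num),
                "test": desc,
                "status": status,
                "remaining_pct": int(remaining),
                "lifetime_hours": int(hours),
                "lba_of_first_error": lba,
            })
    return rows


def not_clean(rows):
    """Rows whose status is anything other than a clean completion."""
    return [r for r in rows if r["status"] != "Completed without error"]


if __name__ == "__main__":
    sample = """
    # 5  Extended offline    Completed without error      00%      252        -
    # 7  Short offline      Completed without error      00%      240        -
    # 8  Extended offline    Aborted by host              90%      240        -
    """
    for row in not_clean(parse_selftest_log(sample)):
        print(row["num"], row["test"], "->", row["status"])
```

Run against the ada2 log, the only row this flags is #8, and "Aborted by host" means the test was interrupted, not that the drive reported a read failure.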
 

zlittell

Cadet
Joined
Feb 24, 2014
Messages
8
ADA3 output after extended test:
Code:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p3 amd64] (local build)                                                         
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org                                                       
                                                                                                                                   
=== START OF INFORMATION SECTION ===                                                                                               
Model Family:    Western Digital Red (AF)                                                                                         
Device Model:    WDC WD30EFRX-68EUZN0                                                                                             
Serial Number:    WD-WMC4N1459780                                                                                                 
LU WWN Device Id: 5 0014ee 60418d132                                                                                               
Firmware Version: 80.00A80                                                                                                         
User Capacity:    3,000,592,982,016 bytes [3.00 TB]                                                                               
Sector Sizes:    512 bytes logical, 4096 bytes physical                                                                           
Rotation Rate:    5400 rpm                                                                                                         
Device is:        In smartctl database [for details use: -P show]                                                                 
ATA Version is:  ACS-2 (minor revision not indicated)                                                                             
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)                                                                           
Local Time is:    Tue Feb 25 18:37:50 2014 CST                                                                                     
SMART support is: Available - device has SMART capability.                                                                         
SMART support is: Enabled                                                                                                         
                                                                                                                                   
=== START OF READ SMART DATA SECTION ===                                                                                           
Error SMART Status command failed                                                                                                 
Please get assistance from                                                                                                         
http://smartmontools.sourceforge.net/                                                                                             
Register values returned from SMART Status command are:                                                                           
CMD=0xb0                                                                                                                           
FR =0xda                                                                                                                           
NS =0xffff                                                                                                                         
SC =0xff                                                                                                                           
CL =0xff                                                                                                                           
CH =0xff                                                                                                                           
RETURN =0x0000                                                                                                                     
SMART overall-health self-assessment test result: FAILED!                                                                         
No failed Attributes found.                                                                                                       
                                                                                                                                   
General SMART Values:                                                                                                             
Offline data collection status:  (0x00) Offline data collection activity                                                           
                                        was never started.                                                                         
                                        Auto Offline Data Collection: Disabled.                                                   
Self-test execution status:      (  0) The previous self-test routine completed                                                   
                                        without error or no self-test has ever                                                     
                                        been run.                                                                                 
Total time to complete Offline                                                                                                     
data collection:                (41520) seconds.                                                                                   
Offline data collection                                                                                                           
capabilities:                    (0x7b) SMART execute Offline immediate.                                                           
                                        Auto Offline data collection on/off support.                                               
                                        Suspend Offline collection upon new                                                       
                                        command.                                                                                   
                                        Offline surface scan supported.                                                         
                                        Self-test supported.
                                        Conveyance Self-test supported.                                                           
                                        Selective Self-test supported.                                                             
SMART capabilities:            (0x0003) Saves SMART data before entering                                                           
                                        power-saving mode.                                                                         
                                        Supports SMART auto save timer.                                                           
Error logging capability:        (0x01) Error logging supported.                                                                   
                                        General Purpose Logging supported.                                                         
Short self-test routine                                                                                                           
recommended polling time:        (  2) minutes.                                                                                   
Extended self-test routine                                                                                                         
recommended polling time:        ( 417) minutes.                                                                                   
Conveyance self-test routine                                                                                                       
recommended polling time:        (  5) minutes.                                                                                   
SCT capabilities:              (0x703d) SCT Status supported.                                                                     
                                        SCT Error Recovery Control supported.                                                     
                                        SCT Feature Control supported.                                                             
                                        SCT Data Table supported.                                                                 
                                                                                                                                   
SMART Attributes Data Structure revision number: 16                                                                               
Vendor Specific SMART Attributes with Thresholds:                                                                                 
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   177   177   021    Pre-fail  Always       -       6108
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       64
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       16
194 Temperature_Celsius     0x0022   118   114   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
                                                                                                                                   
SMART Error Log Version: 1                                                                                                         
No Errors Logged                                                                                                                   
                                                                                                                                   
SMART Self-test log structure revision number 1                                                                                   
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Completed without error       00%        57         -
# 2  Conveyance offline  Completed without error       00%        57         -
# 3  Short offline       Completed without error       00%        56         -
# 4  Short offline       Completed without error       00%        55         -
# 5  Extended offline    Completed without error       00%        53         -
# 6  Conveyance offline  Completed without error       00%        42         -
                                                                                                                                   
SMART Selective self-test log data structure revision number 1                                                                     
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS                                                                                       
    1        0        0  Not_testing                                                                                               
    2        0        0  Not_testing                                                                                               
    3        0        0  Not_testing                                                                                               
    4        0        0  Not_testing                                                                                               
    5        0        0  Not_testing                                                                                               
Selective self-test flags (0x0):                                                                                                   
  After scanning selected spans, do NOT read-scan remainder of disk.                                                               
If Selective self-test is pending on power-up, resume after 0 minute delay.                                                                                                                                                                                      
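
The register dump in the ada3 output can be checked against the ATA convention for the SMART RETURN STATUS command, where the drive reports its verdict in the CL/CH registers: 0x4f/0xc2 means no threshold exceeded, and 0xf4/0x2c means a threshold was exceeded. Below is a minimal Python sketch of that check. The function names are made up for illustration, and labelling any other value pair as an "invalid reply" is an interpretation, not something smartctl prints.

```python
import re


def parse_registers(text):
    """Collect 'NAME =0x..' pairs (CMD, FR, CL, CH, ...) from the dump."""
    regs = {}
    pattern = re.compile(
        r"^[ \t]*([A-Z]+)[ \t]*=[ \t]*0x([0-9a-fA-F]+)[ \t]*$", re.M
    )
    for match in pattern.finditer(text):
        regs[match.group(1)] = int(match.group(2), 16)
    return regs


def classify_status(cl, ch):
    """Map the CL/CH register pair to a SMART RETURN STATUS verdict."""
    if (cl, ch) == (0x4F, 0xC2):
        return "passed"
    if (cl, ch) == (0xF4, 0x2C):
        return "threshold exceeded"
    # Neither defined pair came back, so the reply itself is suspect.
    return "invalid reply"


if __name__ == "__main__":
    dump = """CMD=0xb0
FR =0xda
NS =0xffff
SC =0xff
CL =0xff
CH =0xff
RETURN =0x0000
"""
    regs = parse_registers(dump)
    print(classify_status(regs["CL"], regs["CH"]))
```

With CL and CH both 0xff, the reply matches neither defined pair, which would explain why smartctl prints FAILED! and then "No failed Attributes found." That pattern would be consistent with the command not getting through the controller or enclosure cleanly, though that is an inference from the register values and not a confirmed diagnosis.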
 

zlittell

Cadet
Joined
Feb 24, 2014
Messages
8
I have attached a file containing the console messages from the day these tests were running.
I find it strange that it is only popping up on ada3. It started fine. I installed a plugin and it gave the message about ada2. I reformatted the USB stick and tried again. During the entire setup it didn't make a peep; then I installed a plugin and it complained about both ada2 and ada3. Now I reformatted the USB stick again and set everything up, again with no issue, then installed LOTS of plugins and it complained about both ada2 and ada3, and now just ada3. Hmmmmmm
 

Attachments

  • Console debug messages.txt
    42.4 KB · Views: 299

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Run something else.. Not within FreeNAS.. I keep Hiren's on USB for aiding in issues like these.. Check BIOS settings etc..
 

zlittell

Cadet
Joined
Feb 24, 2014
Messages
8
I think I am going to go ahead and remove the drives from the enclosure and put them in an old PC. This will allow me to test the drives. I received temporary drive offline alerts at 7pm the other night, which is when I was getting outputs to post here. I am now reading online about people not liking eSATA port multiplier (PM) setups. I am thinking this may be a card issue. Since there was originally no intention of using FreeNAS, no compatibility testing was done. I am wondering if I can somehow use the 200i card with a really long SFF breakout cable.
 

zlittell

Cadet
Joined
Feb 24, 2014
Messages
8
thinking of going this route:
LSI SAS 3801E
http://www.newegg.com/Product/Product.aspx?Item=N82E16816118076
into a small PC box or rackmount chassis filled with the drives (no motherboard, CPU, etc., just drives and PSU), with this in the back:
Koutech IO-SFF480 4-port SAS/SATA 6G to external SFF-8088 mini-SAS
http://www.newegg.com/Product/Product.aspx?Item=N82E16816104028

Eliminating the cheap eSATA card and port multiplier box and replacing them with this should solve my issues. I will run tests on the drives just to make sure. I just find it strange that, while I was working on the machine, every drive went inaccessible and came back without me noticing, and the only indication was lots of email alerts.

Has anyone used the LSI SAS 3801E? It was recommended as a top card for an OpenSolaris build with ZFS.
At first I was going to get the IBM ServeRAID M1015 and cross-flash it, but then I would have to get the internal SFF connection to the other enclosure with either a cable snaked out the back or another PCI slot converter. I may still go this route, just for the fact that it is a proven card with FreeNAS and they are both roughly the same price.
 