SMART long test will not complete

Status
Not open for further replies.

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Hello,

I recently got a new Seagate 4TB NAS HDD, which I am testing to verify that it is a good disk. A short test completes fine with no errors, but the long (extended offline) test is not completing. Here is the short test I run that completes OK:
Code:
smartctl -t short /dev/sdc -c


The long test, which is not working, is:
Code:
smartctl -t long /dev/sdc -c

...and I get the following output after waiting the time specified:
Code:
<snip>
Self-test execution status:      (  41) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
<snip>
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   006    Pre-fail  Always       -       18760
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       13
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       919
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       26
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       4
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   253   000    Old_age   Always       -       4295032833
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   073   069   045    Old_age   Always       -       27 (Min/Max 22/28)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       13
194 Temperature_Celsius     0x0022   027   040   000    Old_age   Always       -       27 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   253   000    Old_age   Always       -       0
 
SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
Error 1 occurred at disk power-on lifetime: 1 hours (0 days + 1 hours)
  When the command that caused the error occurred, the device was in an unknown state.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 00 00 00 00  Error: ABRT
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 00 00 00 00 00 ff      00:12:29.109  NOP [Abort queued commands]
  b0 d4 00 82 4f c2 a0 00      00:12:19.098  SMART EXECUTE OFF-LINE IMMEDIATE
  b0 d0 01 00 4f c2 a0 00      00:12:18.995  SMART READ DATA
  ec 00 01 00 00 00 a0 00      00:12:18.982  IDENTIFY DEVICE
  b0 d5 01 e0 4f c2 a0 00      00:02:26.334  SMART READ LOG
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      00%        25         -
# 2  Extended offline    Aborted by host               90%         3         -
# 3  Extended captive    Interrupted (host reset)      90%         1         -
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


The strange things I'm seeing are the "The self-test routine was interrupted by the host with a hard or soft reset" message and the "Interrupted (host reset)" status of the last test.

Any ideas what might be happening here? I am suspecting a bum cable or a bad controller.

Thanks for your help.
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Forgot to mention: I am running Windows 8 Pro x64. I disabled hard disk sleep as well, although I suspect that only affects spindown.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The extended offline test takes many hours to run (even longer if the drive is busy). Part of your snipped area will tell you how many minutes the test should take. Normally I recommend people simply start the test and leave the machine on overnight.

You did have an error at 1 power-on hour. But you were also running a test, so the two may be related and it may not be a sign of something being wrong. In any case, if you do a short and long test and have no errors then your disk is probably okay. Either way, both tests passing means you do not qualify for an RMA anyway.
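For anyone following along, here is a minimal sketch of pulling the extended test's recommended polling time out of the capabilities output, so you know roughly how long to leave the machine alone. The helper name `extended_polling_minutes` is made up for illustration, and it assumes the output is laid out like the `smartctl -c` output posted in this thread (an "Extended self-test routine" line followed by a "recommended polling time" line).

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): reads `smartctl -c`
# output on stdin and prints the extended self-test polling time in minutes.
extended_polling_minutes() {
    awk '
        # The value sits on the line after the "Extended self-test routine" label.
        /^Extended self-test routine/ { want = 1; next }
        want && /recommended polling time:/ {
            gsub(/[^0-9]/, "")   # keep only the digits, e.g. "( 528) minutes." -> "528"
            print
            exit
        }
    '
}

# Usage (device name is the one from this thread; adjust for your system):
#   smartctl -c /dev/sdc | extended_polling_minutes
#   smartctl -t long /dev/sdc        # start the extended test
#   smartctl -l selftest /dev/sdc    # check the self-test log afterwards
```

The number printed is only the drive's own estimate, so treat it as a minimum wait rather than a guarantee.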
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Thanks Cyberjock. I did let the long test run twice for the recommended time which was about 9 hours.
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Welp, either I didn't let it run long enough before (which means that the recommended time was wrong) or it was the cable/controller. I ran it again and waited much longer this time after changing the cable (and maybe controller, I'm not sure which controller it was going to) and it completed successfully. The drive wasn't busy last time so I know that wasn't a factor. Anyway, thanks for the help.

The cable that I used was one that came with a consumer motherboard. Maybe I am running into the implications of all of the warnings on this forum about using consumer grade components.

Now I'm going to read up on a post I recently found about properly thrashing this drive before installing it permanently. :D

Here are my results, in case anyone cares to peruse them and spot anything I should be more alarmed about than I am:
Code:
smartctl 6.2 2013-07-26 r3841 [i686-w64-mingw32-win8.1(64)] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:    ST4000VN000-1H4168
Serial Number:    <snip>
LU WWN Device Id: 5 000c50 064ccbeac
Firmware Version: SC43
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:    512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Jan 07 18:37:54 2014 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  107) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  1) minutes.
Extended self-test routine
recommended polling time:        ( 528) minutes.
Conveyance self-test routine
recommended polling time:        (  2) minutes.
SCT capabilities:              (0x10bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   006    Pre-fail  Always       -       19272
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       13
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       1997
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       46
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       4
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   253   000    Old_age   Always       -       4295032833
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   073   068   045    Old_age   Always       -       27 (Min/Max 22/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       13
194 Temperature_Celsius     0x0022   027   040   000    Old_age   Always       -       27 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   253   000    Old_age   Always       -       0
 
SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
Error 1 occurred at disk power-on lifetime: 1 hours (0 days + 1 hours)
  When the command that caused the error occurred, the device was in an unknown state.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 00 00 00 00  Error: ABRT
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 00 00 00 00 00 ff      00:12:29.109  NOP [Abort queued commands]
  b0 d4 00 82 4f c2 a0 00      00:12:19.098  SMART EXECUTE OFF-LINE IMMEDIATE
  b0 d0 01 00 4f c2 a0 00      00:12:18.995  SMART READ DATA
  ec 00 01 00 00 00 a0 00      00:12:18.982  IDENTIFY DEVICE
  b0 d5 01 e0 4f c2 a0 00      00:02:26.334  SMART READ LOG
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        35         -
# 2  Short offline       Completed without error       00%        26         -
# 3  Extended offline    Interrupted (host reset)      00%        25         -
# 4  Extended offline    Aborted by host               90%         3         -
# 5  Extended captive    Interrupted (host reset)      90%         1         -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
WOW! The recommended time for a long test is 528 minutes! That's 8.8 hours! That's the longest I've ever seen!

Remember, that 8.8 hours is a relatively good number if you have zero disk usage. Any disk usage temporarily halts the test, meaning you'll have to wait even longer...
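The minutes-to-hours conversion is simple enough to do in your head, but a small sketch makes it reusable for other drives. The function name `minutes_to_hours` is hypothetical, and the result is only the idle-disk estimate described above, not a measured duration.

```shell
#!/bin/sh
# Hypothetical helper: convert a polling time in minutes to hours,
# rounded to one decimal place.
minutes_to_hours() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 60 }'
}

# The extended test's recommended polling time from the output in this thread.
minutes_to_hours 528
```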

But your drive passed both a short and a long test (with a 9-hour difference between them). So the 8.8 hours quoted seems to be about how long you actually had to wait...

Code:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        35         -
# 2  Short offline       Completed without error       00%        26         -
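The same comparison can be sketched as a small script that reads the self-test log and subtracts the LifeTime(hours) stamps of the two newest entries. Both function names are made up for illustration, and the parsing assumes the log is laid out like the `smartctl` output in this thread. Note that the stamps are power-on hours recorded with each log entry, so the difference only approximates the test duration when the second test was started right after the first one finished.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and prints
# the LifeTime(hours) value of each logged test, newest first.
selftest_lifetimes() {
    awk '/^#/ {
        # The status text has a variable number of words, so locate the
        # "Remaining" column (NN%) and take the field right after it.
        for (i = 1; i < NF; i++)
            if ($i ~ /^[0-9]+%$/) { print $(i + 1); break }
    }'
}

# Hours between the two newest entries (assumes at least two are logged).
hours_between_newest() {
    selftest_lifetimes | awk 'NR == 1 { a = $1 } NR == 2 { print a - $1; exit }'
}

# Usage (device name is the one from this thread):
#   smartctl -l selftest /dev/sdc | hours_between_newest
```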
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Good forensics, there, finding that it completed in 9 hours! Maybe that is telling me that the initial problem was the cable or controller.

This is the Seagate 4TB NAS drive. Maybe that's why it took so long, because it's 4TB. Or, because it's only 5900 RPM.

The WD Red drives are slower at 5400 RPM!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Some manufacturers do aggressive testing and basically run the drive test as fast as they can. Others are more leisurely and do so many reads per minute and that's it. My Greens are 3TB and take about 350 minutes or so.
 

Satam

Dabbler
Joined
Jan 23, 2014
Messages
40
cyberjock said:
Remember, that 8.8 hours is a relatively good number if you have zero disk usage. Any disk usage temporarily halts the test, meaning you'll have to wait even longer...

So, I figure it is not recommended to take a disk offline before doing a SMART long test, since it's not necessary? I have a zpool made of one RAIDZ2 vdev, and one of the disks is failing.

Unfortunately, this is now the second of my two Seagate Barracuda 7200.14 (ST3000DM001) drives that has decided to go into retirement and be sent back to its maker. I had the first one replaced with a "repaired" disk even before I set up the FreeNAS server. I had both disks for a while before I bought an extra four 3TB WD Reds. I hope the repaired ones live at least as long as the ones they are replacing.
 