Device is causing slow I/O on pool and some sector problem as well

Joined
Jan 15, 2021
Messages
8
I recently installed an old drive in my TrueNAS system and use it as the Transmission download folder.
Since then I get a lot of errors every day.
They all come from the newly installed drive: slow I/O and bad-sector errors.
The disk status page still shows the drive as healthy.
When I run the command below to display the results, there is a delay of about 60 seconds that I do not see with the other disks.

I ran a long SMART test, and here is the output of

Code:
smartctl -a /dev/ada4


Code:
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1E6166
Serial Number:    W1F4CHVT
LU WWN Device Id: 5 000c50 03d2be4bc
Firmware Version: SC48
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Dec  8 16:12:04 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (  592) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 359) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3081) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   091   006    Pre-fail  Always       -       89721040
  3 Spin_Up_Time            0x0003   091   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   085   085   020    Old_age   Always       -       15623
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       656
  7 Seek_Error_Rate         0x000f   060   051   030    Pre-fail  Always       -       648691413103
  9 Power_On_Hours          0x0032   037   037   000    Old_age   Always       -       55315
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       344
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       3761
188 Command_Timeout         0x0032   100   084   000    Old_age   Always       -       1382 1383 1383
189 High_Fly_Writes         0x003a   001   001   000    Old_age   Always       -       236
190 Airflow_Temperature_Cel 0x0022   066   053   045    Old_age   Always       -       34 (Min/Max 23/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       130
193 Load_Cycle_Count        0x0032   010   010   000    Old_age   Always       -       181259
194 Temperature_Celsius     0x0022   034   047   000    Old_age   Always       -       34 (0 9 0 0 0)
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       28144
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       28144
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13959h+06m+30.228s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       52074124230
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       63767549312

SMART Error Log Version: 1
ATA Error Count: 7436 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 7436 occurred at disk power-on lifetime: 41145 hours (1714 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 ff ff ff 4f 00      00:03:07.786  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.761  READ DMA EXT
  25 00 10 ff ff ff 4f 00      00:03:07.746  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.745  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.711  READ DMA EXT

Error 7435 occurred at disk power-on lifetime: 41145 hours (1714 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 ff ff ff 4f 00      00:03:07.786  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.761  READ DMA EXT
  25 00 10 ff ff ff 4f 00      00:03:07.746  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.745  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.711  READ DMA EXT

Error 7434 occurred at disk power-on lifetime: 41145 hours (1714 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 ff ff ff 4f 00      00:03:07.761  READ DMA EXT
  25 00 10 ff ff ff 4f 00      00:03:07.746  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.745  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.711  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.678  READ DMA EXT

Error 7433 occurred at disk power-on lifetime: 41145 hours (1714 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 ff ff ff 4f 00      00:03:07.761  READ DMA EXT
  25 00 10 ff ff ff 4f 00      00:03:07.746  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.745  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.711  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.678  READ DMA EXT

Error 7432 occurred at disk power-on lifetime: 41145 hours (1714 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 ff ff ff 4f 00      00:03:07.678  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.644  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.611  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.578  READ DMA EXT
  25 00 80 ff ff ff 4f 00      00:03:07.544  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     55310         4068066632
# 2  Extended offline    Interrupted (host reset)      90%     55306         -
# 3  Conveyance offline  Completed without error       00%     47737
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
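
For readers who want to check a report like the one above without reading every row, here is a minimal Python sketch that scans smartctl attribute rows and reports the media-related counters with a non-zero raw value. The watch list, the `flag_attributes` helper, and the `SAMPLE` excerpt (five rows copied from the table above) are illustrative assumptions, not an official threshold table or part of smartmontools.

```python
import re

# Attributes whose non-zero raw value commonly points at media damage.
# This watch list is an assumption made for illustration only.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def flag_attributes(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in smart_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # not an attribute row
        name = match.group(2)
        raw = int(match.group(9))
        if name in WATCHED and raw > 0:
            flagged[name] = raw
    return flagged


# Rows copied from the smartctl output in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       656
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       3761
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       28144
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       28144
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

for name, raw in flag_attributes(SAMPLE).items():
    print(f"{name}: raw value {raw}")
```

On this drive all four watched counters are non-zero, while the CRC counter (usually a cabling indicator) is zero, so the problem looks like the disk surface rather than the SATA link.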


I think this is an SMR drive, and the errors are probably due to cache flushes?
I am thinking of doing an ATA secure erase on the drive. Would that help?
Is there a way to instruct the drive to write to the platters directly without using the cache?
Is there a way to disable resilvering just for this disk?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Your last SMART test failed at 55310 power-on hours.
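
To put the numbers from that self-test log entry in context, here is a small arithmetic sketch in Python. It assumes the reported LBA counts 512-byte logical sectors (the "Sector Sizes" line in the report); the constant names are made up for this example, and the values are copied from the smartctl output.

```python
# Values copied from the smartctl report in the first post.
LOGICAL_SECTOR_BYTES = 512            # "Sector Sizes: 512 bytes logical"
CAPACITY_BYTES = 3_000_592_982_016    # "User Capacity"
FIRST_ERROR_LBA = 4_068_066_632       # self-test log entry # 1
TEST_HOURS = 55_310                   # LifeTime(hours) of the failed test
CURRENT_HOURS = 55_315                # Power_On_Hours raw value


def lba_to_byte_offset(lba, sector_bytes=LOGICAL_SECTOR_BYTES):
    """Convert an LBA to a byte offset, assuming logical-sector addressing."""
    return lba * sector_bytes


offset = lba_to_byte_offset(FIRST_ERROR_LBA)
fraction = offset / CAPACITY_BYTES
hours_since_test = CURRENT_HOURS - TEST_HOURS

print(f"first read failure at byte offset {offset:,}")
print(f"that is {fraction:.1%} of the way into the disk")
print(f"the failed test ran {hours_since_test} power-on hours before the report")
```

Under that assumption, the first unreadable sector sits roughly two thirds of the way into the disk, and the failed test ran only 5 power-on hours before the report was taken.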
I would (though I am not hopeful) move the disk to another machine, erase it, and then throw every test you can at it, expecting it to fail. You want to run the equivalent of badblocks on it for several days, and use it only if it passes.
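
As a rough illustration of what a badblocks-style destructive test does, here is a minimal Python sketch of the write, read back, and compare loop. It is not a replacement for badblocks: the `pattern_test` helper is hypothetical, it runs against a scratch file here, and because it reads through the operating system's cache it would not reliably exercise the platters of a real device. The four byte patterns are the ones `badblocks -w` writes by default.

```python
import os
import tempfile


def pattern_test(path, size, block_size=1 << 20,
                 patterns=(0xAA, 0x55, 0xFF, 0x00)):
    """Overwrite the first `size` bytes of `path` with each pattern,
    read them back, and return a list of (pattern, offset) mismatches.
    DESTRUCTIVE: everything in that range is overwritten."""
    mismatches = []
    for pattern in patterns:
        # Write pass: fill the range with the current pattern.
        with open(path, "r+b") as f:
            offset = 0
            while offset < size:
                n = min(block_size, size - offset)
                f.write(bytes([pattern]) * n)
                offset += n
            f.flush()
            os.fsync(f.fileno())
        # Read pass: compare every block against the expected pattern.
        with open(path, "rb") as f:
            offset = 0
            while offset < size:
                n = min(block_size, size - offset)
                if f.read(n) != bytes([pattern]) * n:
                    mismatches.append((pattern, offset))
                offset += n
    return mismatches


# Demonstration on a 4 MiB scratch file, never on a disk that holds data.
SCRATCH_SIZE = 4 * 1024 * 1024
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(SCRATCH_SIZE)
    scratch = tmp.name
try:
    result = pattern_test(scratch, SCRATCH_SIZE)
finally:
    os.remove(scratch)

print("mismatching blocks:", result)
```

A healthy target yields an empty list. On a real drive, use badblocks itself (or a comparable burn-in tool) so the reads actually hit the medium.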

Oh, and if it is SMR, then throw it in the bin (before running the tests).


Actually, having done some research: this drive model has a very high failure rate and is generally considered a piece of crap, so throw it away and get something different (not SMR).
 
Joined
Jan 15, 2021
Messages
8
Thanks for the information.
I tried to rsync the files from the bad disk to another disk, but it gave me a huge number of errors.
The transfer rate was about 17 kB/s. At this point, I think my data is lost.
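
To show why 17 kB/s effectively means the data is out of reach, here is a small Python estimate. It assumes 1 kB = 1000 bytes and a constant rate; the amount of data on the disk is not stated in this thread, so the script reports the time per gigabyte and the time for the drive's full 3 TB capacity as an upper bound.

```python
RATE_BYTES_PER_SEC = 17_000           # 17 kB/s, assuming 1 kB = 1000 bytes
CAPACITY_BYTES = 3_000_592_982_016    # "User Capacity" from the smartctl report
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes, rate=RATE_BYTES_PER_SEC):
    """Days needed to copy num_bytes at a constant rate in bytes per second."""
    return num_bytes / rate / SECONDS_PER_DAY


hours_per_gb = transfer_days(10**9) * 24
days_full_disk = transfer_days(CAPACITY_BYTES)

print(f"about {hours_per_gb:.1f} hours per GB")
print(f"about {days_full_disk:.0f} days for the full capacity")
```

That works out to roughly 16 hours per gigabyte and more than five years for the whole disk, before counting the time lost to read retries.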
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Yup, I think so too.
It was a single-disk pool, so hopefully it contained nothing too important.
 