Brand new NAS drive with sector errors.

shadex

Dabbler
Joined
Aug 31, 2022
Messages
10
I just bought a WD NAS HDD on Newegg. The SMART readings indicate that the HDD has 3100 unreadable sectors. It's brand new :\. Should I send it back? See pics below:

 

shadex

This seems to answer my question:
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
The drive may be fine, but it seems unusual. What is the model number of the drive?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Run a long SMART test, then paste the output of smartctl -a /dev/ada1 and smartctl -a /dev/ada2 between [CODE][/CODE] tags. However, it seems strange that both disks are experiencing the same error. Something is fishy.
Also, this could be useful.

Anyway, a bad sector should be enough for an RMA.
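
For anyone following along, here is a minimal Python sketch of the steps requested above. It assumes smartctl is installed and run as root, and it uses the device names from this thread. The function names (smartctl_commands, fetch_report, wrap_for_forum) are made up for illustration and are not part of smartmontools.

```python
import subprocess


def smartctl_commands(devices):
    """Build the smartctl invocations: start a long self-test, then a full report."""
    start = [["smartctl", "-t", "long", dev] for dev in devices]
    report = [["smartctl", "-a", dev] for dev in devices]
    return start, report


def fetch_report(device):
    """Run `smartctl -a` on one device and return its text output.

    check=False on purpose: smartctl uses a non-zero exit status to signal
    disk problems, and we still want the report in that case.
    """
    result = subprocess.run(
        ["smartctl", "-a", device], capture_output=True, text=True, check=False
    )
    return result.stdout


def wrap_for_forum(text):
    """Wrap command output in [CODE] tags so the forum keeps the column layout."""
    return "[CODE]\n" + text.rstrip("\n") + "\n[/CODE]"


if __name__ == "__main__":
    # Dry run: only print the commands, nothing is executed here.
    start, report = smartctl_commands(["/dev/ada1", "/dev/ada2"])
    for cmd in start + report:
        print(" ".join(cmd))
```

The long test runs inside the drive and can take hours, so the report is only worth pasting once the self-test log shows the test as completed.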
 

Redcoat

Shoot - I didn't notice there were two drives with similar reports. I was wondering if the OP has an SMR drive and was going to get that issue out of the way before suggesting that he test it and run badblocks ...
 

shadex

Standard Red (SMR drive, I guess).
 

shadex

Yeah. Doing that right now. Sigh. I guess going SSD is even more attractive than HDD.
 

Redcoat

Well, SMR drives are known to be problematic and are not recommended for use with TrueNAS, so I suggest that you return that drive regardless and get a CMR drive. If you want to stay with WD Reds, get a Red Plus, which will be CMR. See the list in the resource at https://www.truenas.com/community/resources/list-of-known-smr-drives.141/ for confirmation.

Anyway, please report the results of your long tests here in code tags as suggested by Davvo. Also, tell us about the rest of your hardware so that you can get some suggestions about what may be going on overall.
 

Davvo

SSDs often die without crying for help, HDDs warn you. Plus, they are cheaper per TB.
You were just unlucky: you didn't notice the warnings in the forum and picked the wrong type.
It happens :smile:
 

shadex

The WD NAS drive (standard Red) looks fine:
Code:
root@truenas[~]# smartctl -a /dev/ada1
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red (SMR)
Device Model:     WDC WD40EFAX-68JH4N1
Serial Number:    WD-WX92D428JTLF
LU WWN Device Id: 5 0014ee 26a986163
Firmware Version: 83.00A83
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Sep 13 17:19:56 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (18780) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  42) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3039) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       5
194 Temperature_Celsius     0x0022   117   112   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         2         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing



HGST: PASSED, but with read errors.

Code:
root@truenas[~]# smartctl -a /dev/ada2
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi/HGST Ultrastar 7K4000
Device Model:     Hitachi HUS724040ALE641
Serial Number:    PBKP4XGT
LU WWN Device Id: 5 000cca 23df3e4c4
Firmware Version: MJAOA5F0
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Sep 13 17:20:55 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (   24) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  39) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       84
  3 Spin_Up_Time            0x0007   127   127   024    Pre-fail  Always       -       572 (Average 642)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       666
  5 Reallocated_Sector_Ct   0x0033   032   032   005    Pre-fail  Always       -       993
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   119   119   020    Pre-fail  Offline      -       35
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1102
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       126
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1711
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1711
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 23/49)
196 Reallocated_Event_Count 0x0032   054   054   000    Old_age   Always       -       1007
197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always       -       3072
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 350 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 350 occurred at disk power-on lifetime: 3 hours (0 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 80 e8 02 c2 0f  Error: UNC at LBA = 0x0fc202e8 = 264372968

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 68 02 c2 40 00      03:10:31.251  READ FPDMA QUEUED
  60 00 38 e8 00 c2 40 00      03:10:31.250  READ FPDMA QUEUED
  60 00 30 e8 ff c1 40 00      03:10:31.249  READ FPDMA QUEUED
  60 00 28 e8 fe c1 40 00      03:10:31.249  READ FPDMA QUEUED
  60 00 20 e8 fd c1 40 00      03:10:31.243  READ FPDMA QUEUED

Error 349 occurred at disk power-on lifetime: 3 hours (0 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 f8 a8 9c c1 0f  Error: WP at LBA = 0x0fc19ca8 = 264346792

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 a8 80 dc 5a 40 00      03:10:14.436  WRITE FPDMA QUEUED
  60 08 a0 78 c7 ee 40 00      03:10:14.243  READ FPDMA QUEUED
  60 c8 98 a0 9d c1 40 00      03:10:14.163  READ FPDMA QUEUED
  60 00 90 a0 9c c1 40 00      03:10:14.163  READ FPDMA QUEUED
  60 00 88 a0 9b c1 40 00      03:10:14.162  READ FPDMA QUEUED

Error 348 occurred at disk power-on lifetime: 3 hours (0 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 c0 a8 9c c1 0f  Error: UNC at LBA = 0x0fc19ca8 = 264346792

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 f8 68 9d c1 40 00      03:10:10.559  READ FPDMA QUEUED
  60 00 f0 68 9c c1 40 00      03:10:10.559  READ FPDMA QUEUED
  60 00 e8 68 9b c1 40 00      03:10:10.559  READ FPDMA QUEUED
  60 00 e0 68 9a c1 40 00      03:10:10.559  READ FPDMA QUEUED
  60 00 d8 68 99 c1 40 00      03:10:10.559  READ FPDMA QUEUED

Error 347 occurred at disk power-on lifetime: 3 hours (0 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 e0 a8 9c c1 0f  Error: UNC at LBA = 0x0fc19ca8 = 264346792

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 88 99 c1 40 00      03:10:06.972  READ FPDMA QUEUED
  60 00 f8 88 98 c1 40 00      03:10:06.972  READ FPDMA QUEUED
  60 00 f0 88 97 c1 40 00      03:10:06.971  READ FPDMA QUEUED
  60 e0 e8 88 9d c1 40 00      03:10:06.971  READ FPDMA QUEUED
  60 00 e0 88 9c c1 40 00      03:10:06.971  READ FPDMA QUEUED

Error 346 occurred at disk power-on lifetime: 3 hours (0 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 60 86 c1 0f  Error: UNC at LBA = 0x0fc18660 = 264341088

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 b8 68 85 c1 40 00      03:10:00.220  READ FPDMA QUEUED
  60 00 b0 68 84 c1 40 00      03:10:00.220  READ FPDMA QUEUED
  60 00 a8 68 83 c1 40 00      03:10:00.220  READ FPDMA QUEUED
  60 00 a0 68 82 c1 40 00      03:10:00.209  READ FPDMA QUEUED
  60 60 98 68 86 c1 40 00      03:10:00.209  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1100         525257536
# 2  Short offline       Completed without error       00%      1098         -
# 3  Short offline       Completed without error       00%      1085         -
# 4  Short offline       Completed without error       00%      1061         -
# 5  Short offline       Completed without error       00%      1037         -
# 6  Short offline       Completed without error       00%      1013         -
# 7  Short offline       Completed without error       00%       989         -
# 8  Short offline       Completed without error       00%       965         -
# 9  Short offline       Completed without error       00%       941         -
#10  Short offline       Completed without error       00%       917         -
#11  Short offline       Completed without error       00%       893         -
#12  Extended offline    Completed: read failure       90%       887         525257536
#13  Short offline       Completed without error       00%       886         -
#14  Short offline       Completed without error       00%       870         -
#15  Short offline       Completed without error       00%       845         -
#16  Short offline       Completed without error       00%       821         -
#17  Short offline       Completed without error       00%       805         -
#18  Short offline       Completed without error       00%       791         -
#19  Short offline       Completed without error       00%       788         -
#20  Vendor (0xb0)       Completed without error       00%     57842         -
#21  Vendor (0x71)       Completed without error       00%     57842         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing



That's all I need to know. I'll send it back tmw.
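
A side note for readers of the error log above: the LBA printed on each "Error:" line can be rebuilt from the register dump on that same line. The sketch below assumes the usual 28-bit layout (low nibble of DH, then CH, CL, SN), which reproduces the values smartctl printed here. The function name is made up for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    Only the low nibble of DH carries address bits; CH, CL and SN supply
    the remaining three bytes, from most to least significant.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Error 350 in the log: registers "40 51 80 e8 02 c2 0f" -> SN=e8 CL=02 CH=c2 DH=0f
lba_350 = lba28_from_registers(0xE8, 0x02, 0xC2, 0x0F)
print(hex(lba_350), lba_350)  # 0xfc202e8 264372968

# Error 346 in the log: registers "40 51 08 60 86 c1 0f" -> SN=60 CL=86 CH=c1 DH=0f
lba_346 = lba28_from_registers(0x60, 0x86, 0xC1, 0x0F)
print(hex(lba_346), lba_346)  # 0xfc18660 264341088
```

Both results match the "UNC at LBA" values in the log, and the five logged errors all fall within a few tens of thousands of sectors of each other.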
 

shadex

The test I ran was the long test.
 

Redcoat

Yes, thanks. I read it.
So you’re returning the WD Red. Good move!
If the HGST were mine, I’d run badblocks on it. Perhaps someone with specific HGST experience will chip in with other advice here?
It’s not clear whether there’s something else going on with your system. Please give us a full description of your hardware and tell us your TrueNAS version.
 

Davvo

To me the HGST looks like it's dying, but I have no experience with them.
The WD seems fine, but as @Redcoat said you want to send it back and use a CMR if you can.
SMR will cause you nightmares while resilvering.
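
To make the "looks like it's dying" reading concrete, here is a rough Python sketch that pulls the sector-related attributes out of a smartctl -a report and flags any with a non-zero raw value. Treating any non-zero count as a warning is a rule of thumb assumed for this example, not an official threshold. The sample rows are copied from the two reports posted above, and the function name is made up.

```python
# Attribute IDs that count reallocated, pending, or uncorrectable sectors.
WATCHED_IDS = {5, 196, 197, 198}


def flag_sector_attributes(report):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in report.splitlines():
        parts = line.split()
        # Attribute rows have at least 10 columns, a numeric ID, and a 0x.... flag.
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        raw = parts[9]  # first token of RAW_VALUE
        if int(parts[0]) in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            flagged[parts[1]] = int(raw)
    return flagged


HGST_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   032   032   005    Pre-fail  Always       -       993
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 23/49)
196 Reallocated_Event_Count 0x0032   054   054   000    Old_age   Always       -       1007
197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always       -       3072
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

WD_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

print(flag_sector_attributes(HGST_ROWS))
print(flag_sector_attributes(WD_ROWS))
```

On these rows the HGST is flagged for reallocated sectors, reallocation events, and pending sectors, while the WD comes back empty, which matches the assessment above.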

Also, suggested reading:
 