HDD Failure - read / write errors

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
I got a report that a drive in my server was throwing read / write errors.

I swapped the drive out, and the new drive (84 hours old) started throwing errors too.

I don't think the drive is the problem. I think it might be the hardware connecting this drive to the motherboard. I am using a mini SAS (SFF-8643) cable to connect my 8 drives. Could I have a bad connection here? Wouldn't that throw a different error?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Ruff.Hi said: "Wouldn't that throw a different error?"
We have not seen any errors; you didn't share any details with us. Here are some resources that might help you:

Forum Guidelines
https://www.ixsystems.com/community/threads/forum-guidelines.45124/

Hard Drive Troubleshooting Guide (All Versions of FreeNAS)
https://www.ixsystems.com/community...bleshooting-guide-all-versions-of-freenas.17/

Building, Burn-In, and Testing your FreeNAS system
https://www.ixsystems.com/community/resources/building-burn-in-and-testing-your-freenas-system.38/

GitHub repository for FreeNAS scripts, including disk burnin
https://www.ixsystems.com/community...for-freenas-scripts-including-disk-burnin.28/
 

Ruff.Hi

Fair enough. Thanks for those links. I have read through most of them, and I do have a phalanx of scripts that run. Here is the zpool status report:

Code:
########## ZPool status report for DuffleBag ##########

  pool: DuffleBag
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: scrub repaired 0 in 0 days 04:08:18 with 0 errors on Tue Dec 15 02:32:56 2020
config:

    NAME                                            STATE     READ WRITE CKSUM
    DuffleBag                                       DEGRADED     0     0     0
      raidz2-0                                      DEGRADED     0     0     0
        gptid/e99f6068-12d3-11eb-a6fd-ac1f6ba054d6  ONLINE       0     0     0
        gptid/66b03e04-fc59-11e9-ad4a-0cc47aac270a  ONLINE       0     0     0
        gptid/68a8eef6-fc59-11e9-ad4a-0cc47aac270a  FAULTED      6   160     0  too many errors
        gptid/6a92d558-fc59-11e9-ad4a-0cc47aac270a  ONLINE       0     0     0
        gptid/6c98c37a-fc59-11e9-ad4a-0cc47aac270a  ONLINE       0     0     0
        gptid/28db87eb-0e11-11ea-8458-ac1f6ba054d6  ONLINE       0     0     0
        gptid/70c0ff97-fc59-11e9-ad4a-0cc47aac270a  ONLINE       0     0     0
        gptid/72e3ca68-fc59-11e9-ad4a-0cc47aac270a  ONLINE       0     0     0
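
For anyone who wants to pick the failing member out of a report like this automatically, here is a minimal Python sketch. It assumes the config table keeps the NAME / STATE / READ / WRITE / CKSUM column layout shown above and that the counters are plain integers (newer zpool versions may abbreviate large counts, which this pattern would skip). The names `ROW` and `devices_with_errors` are hypothetical helpers, not part of any FreeNAS script.

```python
import re

# Rows abbreviated from the report above.
SAMPLE = """\
    NAME                                            STATE     READ WRITE CKSUM
    DuffleBag                                       DEGRADED     0     0     0
      raidz2-0                                      DEGRADED     0     0     0
        gptid/66b03e04-fc59-11e9-ad4a-0cc47aac270a  ONLINE       0     0     0
        gptid/68a8eef6-fc59-11e9-ad4a-0cc47aac270a  FAULTED      6   160     0  too many errors
"""

# name, state, three integer counters, optional trailing note
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(.*))?$")


def devices_with_errors(status_text):
    """Return the rows whose READ, WRITE, or CKSUM counter is non-zero."""
    hits = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, blank lines, and status prose do not match
        name, state, read, write, cksum, note = match.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            hits.append({
                "name": name,
                "state": state,
                "read": counts[0],
                "write": counts[1],
                "cksum": counts[2],
                "note": note or "",
            })
    return hits


if __name__ == "__main__":
    for dev in devices_with_errors(SAMPLE):
        print(dev["name"], dev["state"], dev["read"], dev["write"], dev["cksum"], dev["note"])
```

The text passed in would normally be the saved output of `zpool status`; how that output is captured is left out here on purpose.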



And here is the SMART output for the HDD in question:
Code:
########## SMART status report for da4 drive (Western Digital Red: WD-WCC7K4ACHE3Z) ##########
Current Date / Time - Wednesday 16 2020 05:00:03
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p14 amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   181   165   021    Pre-fail  Always       -       5933
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       72
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       18732
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       72
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       69
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       11
194 Temperature_Celsius     0x0022   128   119   000    Old_age   Always       -       22
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

No Errors Logged

smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p14 amd64] (local build)

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     18727         -
# 2  Short offline       Completed without error       00%     18705         -
# 3  Short offline       Completed without error       00%     18679         -
# 4  Short offline       Completed without error       00%     18655         -
# 5  Short offline       Completed without error       00%     18631         -
# 6  Short offline       Completed without error       00%     18607         -
# 7  Short offline       Completed without error       00%     18583         -
# 8  Extended offline    Completed without error       00%     18576         -
# 9  Short offline       Completed without error       00%     18559         -
#10  Short offline       Completed without error       00%     18535         -
#11  Short offline       Completed without error       00%     18511         -
#12  Short offline       Completed without error       00%     18487         -
#13  Short offline       Completed without error       00%     18463         -
#14  Short offline       Completed without error       00%     18439         -
#15  Short offline       Completed without error       00%     18415         -
#16  Short offline       Completed without error       00%     18391         -
#17  Short offline       Completed without error       00%     18367         -
#18  Short offline       Completed without error       00%     18343         -
#19  Short offline       Completed without error       00%     18319         -
#20  Short offline       Completed without error       00%     18295         -
#21  Extended offline    Completed without error       00%     18288         -
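
A rough way to read an attribute table like the one above is to separate counters that usually point at the platters (reallocated, pending, uncorrectable sectors) from the one that usually points at the link (UDMA CRC errors). The Python sketch below does that under some assumptions: the ten-column smartctl attribute layout shown above, integer raw values, and a grouping of attribute IDs that is a common rule of thumb rather than a vendor specification. `parse_attributes` and `triage` are hypothetical helper names.

```python
# Assumed grouping of attribute IDs (a rule of thumb, not a specification).
MEDIA_IDS = {5, 196, 197, 198}  # reallocated / pending / uncorrectable sectors
LINK_IDS = {199}                # UDMA CRC errors, often cabling or connectors

# Rows copied from the report above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       18732
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for rows with an integer raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def triage(attrs):
    """Return 'media', 'link', or 'clean' based on non-zero raw counters."""
    if any(raw > 0 for i, (_, raw) in attrs.items() if i in MEDIA_IDS):
        return "media"
    if any(raw > 0 for i, (_, raw) in attrs.items() if i in LINK_IDS):
        return "link"
    return "clean"


if __name__ == "__main__":
    print(triage(parse_attributes(SAMPLE)))
```

A "clean" result only means the drive itself has not recorded these errors. It does not rule out a cable, backplane, or controller problem, since errors on that path are not necessarily reflected in the drive's SMART counters.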
 