Help with the following error message:

runevn

Explorer
Joined
Apr 4, 2019
Messages
63
Hi,

This morning I received this message from my FreeNAS. Any ideas on what is happening? Or how I should investigate and resolve the issue?

Code:
kernel log messages:
>       (da4:mps0:0:20:0): READ(16). CDB: 88 00 00 00 00 01 50 c1 7e 60 00 00 00 40 00 00 length 32768 SMID 248 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
>       (da4:mps0:0:20:0): READ(16). CDB: 88 00 00 00 00 01 50 c1 7e a0 00 00 00 40 00 00 length 32768 SMID 836 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
> (da4:mps0:0:20:0): READ(16). CDB: 88 00 00 00 00 01 50 c1 7e 60 00 00 00 40 00 00
> (da4:mps0:0:20:0): CAM status: CCB request completed with an error
> (da4:mps0:0:20:0): Retrying command
> (da4:mps0:0:20:0): READ(16). CDB: 88 00 00 00 00 01 50 c1 7e a0 00 00 00 40 00 00
> (da4:mps0:0:20:0): CAM status: CCB request completed with an error
> (da4:mps0:0:20:0): Retrying command
> (da4:mps0:0:20:0): READ(16). CDB: 88 00 00 00 00 01 50 c1 7e 20 00 00 00 40 00 00
> (da4:mps0:0:20:0): CAM status: SCSI Status Error
> (da4:mps0:0:20:0): SCSI status: Check Condition
> (da4:mps0:0:20:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da4:mps0:0:20:0): Info: 0x150c17e20
> (da4:mps0:0:20:0): Error 5, Unretryable error
> GEOM_ELI: g_eli_read_done() failed (error=5) gptid/4bbd87aa-d4ba-11e9-9716-000c29ee2069.eli[READ(offset=2890563010560, length=32768)]

-- End of security output --


I'm running a FreeNAS-11.2-U6 system with 6x 6TB WD Reds in one RAIDZ2 pool.

Please let me know if you need more information.

Thanks in advance.
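
For anyone who wants to see how the numbers in that kernel log relate to each other, here is a minimal arithmetic sketch. It assumes 512-byte logical sectors (which the smartctl output later in this thread reports for da4) and that the GEOM_ELI offset is counted from the start of the encrypted partition; the function and variable names are illustrative only.

```python
# Values copied from the kernel log above.
SENSE_INFO_LBA = 0x150C17E20        # "Info: 0x150c17e20" from the MEDIUM ERROR sense data
GELI_OFFSET_BYTES = 2890563010560   # offset reported by g_eli_read_done()
CDB_HEX = "88 00 00 00 00 01 50 c1 7e 20 00 00 00 40 00 00"

# Assumption: 512-byte logical sectors.
SECTOR_SIZE = 512


def decode_read16(cdb_hex):
    """Return (start_lba, block_count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length in blocks
    return lba, blocks


lba, blocks = decode_read16(CDB_HEX)
disk_offset = lba * SECTOR_SIZE                  # byte offset on the raw disk
partition_start = disk_offset - GELI_OFFSET_BYTES

print(lba == SENSE_INFO_LBA)            # True: the failing LBA is the first block of the request
print(blocks * SECTOR_SIZE)             # 32768, matching "length 32768" in the log
print(partition_start // SECTOR_SIZE)   # 4194432: implied first sector of the partition
```

Under those assumptions the failed read, the sense data and the GEOM_ELI message all point at the same spot on da4. The implied partition start of 4194432 sectors would be consistent with a data partition that follows a 2 GiB swap partition, but the thread does not show the partition table, so treat that as a guess.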
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Post the complete output of zpool status and smartctl -x /dev/da4 in code tags.
 

runevn

Sure, here is the output:
Code:
root@nas[~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:05 with 0 errors on Tue Nov 19 03:45:05 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors

  pool: storagepool
 state: ONLINE
  scan: resilvered 12K in 0 days 00:00:01 with 0 errors on Mon Nov  4 14:31:02 2019
config:

        NAME                                                STATE     READ WRITE CKSUM
        storagepool                                         ONLINE       0     0     0
          raidz2-0                                          ONLINE       0     0     0
            gptid/5613e975-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            gptid/5831da4a-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            gptid/5a699763-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            gptid/4bbd87aa-d4ba-11e9-9716-000c29ee2069.eli  ONLINE       0     0     0
            gptid/5eb3d5de-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            gptid/60b6bd83-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0

errors: No known data errors


Code:
root@nas[~]# smartctl -x /dev/da4
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68MYMN1
Serial Number:    WD-WX11DC4FKX7J
LU WWN Device Id: 5 0014ee 2b6406f5c
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Nov 20 10:54:09 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 4424) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 698) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   197   051    -    0
  3 Spin_Up_Time            POS--K   200   193   021    -    8958
  4 Start_Stop_Count        -O--CK   093   093   000    -    7255
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
  9 Power_On_Hours          -O--CK   048   048   000    -    38067
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    56
192 Power-Off_Retract_Count -O--CK   200   200   000    -    29
193 Load_Cycle_Count        -O--CK   198   198   000    -    7552
194 Temperature_Celsius     -O---K   125   109   000    -    27
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      40  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 136 (device log contains only the most recent 24 errors)
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 136 [15] occurred at disk power-on lifetime: 38059 hours (1585 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 50 c1 7e 20 40 00  Error: UNC at LBA = 0x150c17e20 = 5649825312

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 00 00 01 50 c1 7e 20 40 00  1d+10:10:24.299  READ FPDMA QUEUED
  60 00 40 00 08 00 01 50 c1 07 90 40 00  1d+10:10:23.488  READ FPDMA QUEUED
  60 00 40 00 00 00 01 50 c1 07 50 40 00  1d+10:10:23.488  READ FPDMA QUEUED
  60 00 40 00 08 00 01 50 c1 07 10 40 00  1d+10:10:23.485  READ FPDMA QUEUED
  60 00 40 00 00 00 01 50 c1 06 d0 40 00  1d+10:10:23.485  READ FPDMA QUEUED

Error 135 [14] occurred at disk power-on lifetime: 36411 hours (1517 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 7a f1 aa 38 40 00  Error: UNC at LBA = 0x17af1aa38 = 6357625400

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 d8 00 40 00 01 7a f1 b0 b0 40 00  2d+13:21:20.864  READ FPDMA QUEUED
  60 01 00 00 38 00 01 7a f1 af b0 40 00  2d+13:21:20.863  READ FPDMA QUEUED
  60 01 00 00 18 00 01 7a f1 ae b0 40 00  2d+13:21:20.863  READ FPDMA QUEUED
  60 01 00 00 30 00 01 7a f1 ad b0 40 00  2d+13:21:20.863  READ FPDMA QUEUED
  60 01 00 00 28 00 01 7a f1 ac b0 40 00  2d+13:21:20.863  READ FPDMA QUEUED

Error 134 [13] occurred at disk power-on lifetime: 33124 hours (1380 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 01 00 00 00 3d 77 a7 40 00  Error: UNC 1 sectors at LBA = 0x003d77a7 = 4028327

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 01 00 00 00 3d 77 a7 40 00     02:14:16.046  READ DMA EXT
  e1 00 00 00 0f 00 00 00 00 00 00 00 00     02:14:16.046  IDLE IMMEDIATE
  ef 00 02 00 00 00 00 00 00 00 00 00 00     02:14:16.046  SET FEATURES [Enablewrite cache]
  ef 00 90 00 06 00 00 00 00 00 00 00 00     02:14:16.046  SET FEATURES [Disable SATA feature]
  e1 00 00 00 02 00 00 00 00 00 00 00 00     02:14:16.046  IDLE IMMEDIATE

Error 133 [12] occurred at disk power-on lifetime: 33124 hours (1380 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 01 00 00 00 3d 77 a6 40 00  Error: UNC at LBA = 0x003d77a6 = 4028326

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 01 00 00 00 00 00 3d 77 a6 40 00     02:14:11.557  READ FPDMA QUEUED
  25 00 00 00 01 00 00 00 3d 77 a5 40 00     02:14:07.812  READ DMA EXT
  e1 00 00 00 0f 00 00 00 00 00 00 00 00     02:14:07.812  IDLE IMMEDIATE
  ef 00 02 00 00 00 00 00 00 00 00 00 00     02:14:07.811  SET FEATURES [Enablewrite cache]
  ef 00 90 00 06 00 00 00 00 00 00 00 00     02:14:07.811  SET FEATURES [Disable SATA feature]

Error 132 [11] occurred at disk power-on lifetime: 33124 hours (1380 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 01 00 00 00 3d 77 a5 40 00  Error: UNC 1 sectors at LBA = 0x003d77a5 = 4028325

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 01 00 00 00 3d 77 a5 40 00     02:14:07.812  READ DMA EXT
  e1 00 00 00 0f 00 00 00 00 00 00 00 00     02:14:07.812  IDLE IMMEDIATE
  ef 00 02 00 00 00 00 00 00 00 00 00 00     02:14:07.811  SET FEATURES [Enablewrite cache]
  ef 00 90 00 06 00 00 00 00 00 00 00 00     02:14:07.811  SET FEATURES [Disable SATA feature]
  e1 00 00 00 02 00 00 00 00 00 00 00 00     02:14:07.811  IDLE IMMEDIATE

Error 131 [10] occurred at disk power-on lifetime: 33124 hours (1380 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 01 00 00 00 3d 77 a4 40 00  Error: UNC at LBA = 0x003d77a4 = 4028324

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 01 00 00 00 00 00 3d 77 a4 40 00     02:14:03.355  READ FPDMA QUEUED
  25 00 00 00 01 00 00 00 3d 77 a3 40 00     02:13:59.614  READ DMA EXT
  e1 00 00 00 0f 00 00 00 00 00 00 00 00     02:13:59.614  IDLE IMMEDIATE
  ef 00 02 00 00 00 00 00 00 00 00 00 00     02:13:59.614  SET FEATURES [Enablewrite cache]
  ef 00 90 00 06 00 00 00 00 00 00 00 00     02:13:59.614  SET FEATURES [Disable SATA feature]

Error 130 [9] occurred at disk power-on lifetime: 33124 hours (1380 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 01 00 00 00 3d 77 a3 40 00  Error: UNC 1 sectors at LBA = 0x003d77a3 = 4028323

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 01 00 00 00 3d 77 a3 40 00     02:13:59.614  READ DMA EXT
  e1 00 00 00 0f 00 00 00 00 00 00 00 00     02:13:59.614  IDLE IMMEDIATE
  ef 00 02 00 00 00 00 00 00 00 00 00 00     02:13:59.614  SET FEATURES [Enablewrite cache]
  ef 00 90 00 06 00 00 00 00 00 00 00 00     02:13:59.614  SET FEATURES [Disable SATA feature]
  e1 00 00 00 02 00 00 00 00 00 00 00 00     02:13:59.614  IDLE IMMEDIATE

Error 129 [8] occurred at disk power-on lifetime: 33124 hours (1380 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 01 00 00 00 3d 77 a2 40 00  Error: UNC at LBA = 0x003d77a2 = 4028322

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 01 00 00 00 00 00 3d 77 a2 40 00     02:13:55.125  READ FPDMA QUEUED
  25 00 00 00 01 00 00 00 3d 77 a1 40 00     02:13:51.378  READ DMA EXT
  e1 00 00 00 0f 00 00 00 00 00 00 00 00     02:13:51.378  IDLE IMMEDIATE
  ef 00 02 00 00 00 00 00 00 00 00 00 00     02:13:51.378  SET FEATURES [Enablewrite cache]
  ef 00 90 00 06 00 00 00 00 00 00 00 00     02:13:51.378  SET FEATURES [Disable SATA feature]

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     38051         -
# 2  Short offline       Completed without error       00%     38027         -
# 3  Short offline       Completed without error       00%     38022         -
# 4  Short offline       Completed without error       00%     37998         -
# 5  Short offline       Completed without error       00%     37975         -
# 6  Short offline       Completed without error       00%     37951         -
# 7  Short offline       Completed without error       00%     37927         -
# 8  Short offline       Completed without error       00%     37903         -
# 9  Short offline       Completed without error       00%     37879         -
#10  Extended offline    Completed without error       00%     37863         -
#11  Short offline       Completed without error       00%     37831         -
#12  Short offline       Completed without error       00%     37807         -
#13  Short offline       Completed without error       00%     37791         -
#14  Short offline       Completed without error       00%     37766         -
#15  Short offline       Completed without error       00%     37741         -
#16  Short offline       Completed without error       00%     37717         -
#17  Short offline       Completed without error       00%     37693         -
#18  Short offline       Completed without error       00%     37669         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    27 Celsius
Power Cycle Min/Max Temperature:     20/28 Celsius
Lifetime    Min/Max Temperature:      2/43 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (335)

Index    Estimated Time   Temperature Celsius
 336    2019-11-20 02:57    28  *********
 ...    ..(175 skipped).    ..  *********
  34    2019-11-20 05:53    28  *********
  35    2019-11-20 05:54    27  ********
 ...    ..(137 skipped).    ..  ********
 173    2019-11-20 08:12    27  ********
 174    2019-11-20 08:13    28  *********
 ...    ..(160 skipped).    ..  *********
 335    2019-11-20 10:54    28  *********

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4       152835  Vendor specific
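
One way to read the error log above is to compare each error's power-on lifetime with the drive's current Power_On_Hours. A minimal sketch, with the hour values copied by hand from the output above (the list and function names are illustrative, not part of smartctl):

```python
# Raw value of attribute 9 (Power_On_Hours) from the output above.
CURRENT_POWER_ON_HOURS = 38067

# (error number, power-on lifetime in hours) for the eight errors listed above.
ERRORS = [(136, 38059), (135, 36411), (134, 33124), (133, 33124),
          (132, 33124), (131, 33124), (130, 33124), (129, 33124)]


def hours_ago(errors, now_hours):
    """Map each error number to how many power-on hours ago it was logged."""
    return {num: now_hours - at for num, at in errors}


def distinct_episodes(errors):
    """Group errors that share the same power-on hour into one episode."""
    episodes = {}
    for num, at in errors:
        episodes.setdefault(at, []).append(num)
    return episodes


ages = hours_ago(ERRORS, CURRENT_POWER_ON_HOURS)
episodes = distinct_episodes(ERRORS)
for at, nums in sorted(episodes.items(), reverse=True):
    print(f"{len(nums)} error(s) at {at} h, {CURRENT_POWER_ON_HOURS - at} h before this report")
```

Read this way, the eight listed errors fall into three separate episodes, and the newest one was logged only 8 power-on hours before this report was taken.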
 

Jailer

da4 isn't looking good. Run an extended offline test and post the results once it's finished.
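
For planning purposes: the smartctl output above lists a recommended polling time of 698 minutes for the extended self-test on this drive. The sketch below only builds the usual `smartctl -t long` command line and does the time arithmetic; it does not start anything. The start time is a made-up example and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# "Extended self-test routine recommended polling time" from the output above.
EXTENDED_TEST_MINUTES = 698


def long_test_command(device):
    """Build the argument list that starts an extended (long) self-test."""
    return ["smartctl", "-t", "long", device]


def estimated_finish(start, minutes=EXTENDED_TEST_MINUTES):
    """Estimate when the test should be done, based on the polling time."""
    return start + timedelta(minutes=minutes)


start = datetime(2019, 11, 20, 11, 0)  # hypothetical start time
print(" ".join(long_test_command("/dev/da4")))
print(estimated_finish(start))
```

The polling time is the drive's own estimate, so on a busy pool the test may well take longer than the 11 hours 38 minutes this works out to.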
 

runevn

Jailer said: "da4 isn't looking good. Run an extended offline test and post the results once it's finished."

@Jailer, thanks for your reply.

First of all, when you say "Run an extended offline test", do you mean that I should pull the disk out of the pool and run the test on another machine?

And second, so that I can debug this myself in the future: what data/information are you looking at when you determine that da4 isn't looking good?

Once again, thanks for your reply!
 

Jessep

Patron
Joined
Aug 19, 2018
Messages
379
"Device Error Count: 136 (device log contains only the most recent 24 errors)"

CAM status can be related to cabling.
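
To weigh "drive" against "cabling" with the numbers already posted, one option is to put the drive's own error count next to the CRC counters, which commonly (though not conclusively) rise when a cable or connector is at fault. A rough sketch that parses a few lines copied from the smartctl -x output above; the function names are made up for this example and the parsing is only as robust as this sample needs:

```python
import re

# Excerpt copied from the first smartctl -x output in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
Device Error Count: 136 (device log contains only the most recent 24 errors)
0x0001  2            0  Command failed due to ICRC error
0x000b  2            0  CRC errors within host-to-device FIS
"""


def device_error_count(text):
    """Return the drive's logged error count, or None if the line is absent."""
    match = re.search(r"^Device Error Count:\s+(\d+)", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def attribute_raw_values(text):
    """Map SMART attribute names to their raw values."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows: ID, name, flags, value, worst, thresh, fail, raw.
        if len(fields) == 8 and fields[0].isdigit():
            values[fields[1]] = int(fields[7])
    return values


def phy_crc_events(text):
    """Sum the SATA Phy event counters whose description mentions CRC."""
    total = 0
    for line in text.splitlines():
        fields = line.split()
        if fields and re.fullmatch(r"0x[0-9a-f]{4}", fields[0]) and "CRC" in line:
            total += int(fields[2])  # columns: ID, size, value, description
    return total


attrs = attribute_raw_values(SAMPLE)
drive_errors = device_error_count(SAMPLE)
link_errors = attrs["UDMA_CRC_Error_Count"] + phy_crc_events(SAMPLE)
print(f"drive-logged errors: {drive_errors}, CRC/link errors: {link_errors}")
```

With these values the drive has logged 136 errors of its own while the CRC counters stay at zero, and the kernel log reports a MEDIUM ERROR. That leans toward the disk rather than the cable, but it is not proof, so reseating or swapping the cable is still a cheap check.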
 

Jailer

runevn said: "First of all, when you say 'Run an extended offline test', do you mean that I should pull the disk out of the pool and run the test on another machine?"
No, run it on your FreeNAS machine: smartctl -t long /dev/da4

runevn said: "what data/information are you looking at when you determine that da4 isn't looking good?"
"Device Error Count: 136 (device log contains only the most recent 24 errors)"
 

runevn

After running smartctl -t long /dev/da4, smartctl -x /dev/da4 gives the following output:

Code:
root@nas[~]# smartctl -x /dev/da4
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68MYMN1
Serial Number:    WD-WX11DC4FKX7J
LU WWN Device Id: 5 0014ee 2b6406f5c
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Dec 28 11:20:51 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                ( 4424) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 698) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   197   051    -    9
  3 Spin_Up_Time            POS--K   210   193   021    -    8483
  4 Start_Stop_Count        -O--CK   093   093   000    -    7258
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   047   047   000    -    38979
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    58
192 Power-Off_Retract_Count -O--CK   200   200   000    -    31
193 Load_Cycle_Count        -O--CK   198   198   000    -    7567
194 Temperature_Celsius     -O---K   126   109   000    -    26
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      40  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 140 (device log contains only the most recent 24 errors)
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 140 [19] occurred at disk power-on lifetime: 38645 hours (1610 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 27 cd 27 78 40 00  Error: UNC at LBA = 0x27cd2778 = 667756408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 10 00 00 27 cd 27 c8 40 00 25d+20:01:56.555  READ FPDMA QUEUED
  60 00 40 00 08 00 00 27 cd 27 88 40 00 25d+20:01:56.555  READ FPDMA QUEUED
  60 00 40 00 00 00 00 27 cd 27 48 40 00 25d+20:01:56.554  READ FPDMA QUEUED
  60 00 10 00 00 00 02 66 cc 5a 48 40 00 25d+20:01:55.918  READ FPDMA QUEUED
  60 00 40 00 00 00 00 27 cd 17 c8 40 00 25d+20:01:55.802  READ FPDMA QUEUED

Error 139 [18] occurred at disk power-on lifetime: 38494 hours (1603 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 27 d2 a4 50 40 00  Error: UNC at LBA = 0x27d2a450 = 668116048

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 20 00 10 00 00 27 d2 a7 a8 40 00 19d+13:20:50.980  READ FPDMA QUEUED
  60 01 00 00 00 00 00 27 d2 a6 a8 40 00 19d+13:20:50.980  READ FPDMA QUEUED
  60 00 90 00 58 00 00 27 d2 a5 b8 40 00 19d+13:20:50.973  READ FPDMA QUEUED
  60 01 00 00 08 00 00 27 d2 a4 b8 40 00 19d+13:20:50.973  READ FPDMA QUEUED
  60 01 00 00 50 00 00 27 d2 a3 b8 40 00 19d+13:20:50.973  READ FPDMA QUEUED

Error 138 [17] occurred at disk power-on lifetime: 38494 hours (1603 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 27 d0 89 b8 40 00  Error: UNC at LBA = 0x27d089b8 = 667978168

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 00 00 00 27 d0 8c 48 40 00 19d+13:20:46.742  READ FPDMA QUEUED
  60 00 78 00 08 00 00 27 d0 89 68 40 00 19d+13:20:46.739  READ FPDMA QUEUED
  60 00 18 00 00 00 00 27 d0 86 38 40 00 19d+13:20:46.736  READ FPDMA QUEUED
  60 00 40 00 08 00 00 27 d0 81 38 40 00 19d+13:20:46.732  READ FPDMA QUEUED
  60 00 28 00 08 00 00 27 d0 5d 78 40 00 19d+13:20:46.731  READ FPDMA QUEUED

Error 137 [16] occurred at disk power-on lifetime: 38494 hours (1603 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 27 cf f0 30 40 00  Error: UNC at LBA = 0x27cff030 = 667938864

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 00 00 00 27 cf ef f8 40 00 19d+13:20:42.220  READ FPDMA QUEUED
  60 00 c0 00 08 00 00 27 cf ee b8 40 00 19d+13:20:42.218  READ FPDMA QUEUED
  60 01 00 00 00 00 00 27 cf ed b8 40 00 19d+13:20:42.218  READ FPDMA QUEUED
  60 00 c0 00 00 00 00 27 cf eb b8 40 00 19d+13:20:42.217  READ FPDMA QUEUED
  60 00 80 00 00 00 00 27 cf ea b8 40 00 19d+13:20:42.215  READ FPDMA QUEUED

Error 136 [15] occurred at disk power-on lifetime: 38059 hours (1585 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 50 c1 7e 20 40 00  Error: UNC at LBA = 0x150c17e20 = 5649825312

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 00 00 01 50 c1 7e 20 40 00  1d+10:10:24.299  READ FPDMA QUEUED
  60 00 40 00 08 00 01 50 c1 07 90 40 00  1d+10:10:23.488  READ FPDMA QUEUED
  60 00 40 00 00 00 01 50 c1 07 50 40 00  1d+10:10:23.488  READ FPDMA QUEUED
  60 00 40 00 08 00 01 50 c1 07 10 40 00  1d+10:10:23.485  READ FPDMA QUEUED
  60 00 40 00 00 00 01 50 c1 06 d0 40 00  1d+10:10:23.485  READ FPDMA QUEUED

Error 135 [14] occurred at disk power-on lifetime: 36411 hours (1517 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 7a f1 aa 38 40 00  Error: UNC at LBA = 0x17af1aa38 = 6357625400

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 d8 00 40 00 01 7a f1 b0 b0 40 00  2d+13:21:20.864  READ FPDMA QUEUED
  60 01 00 00 38 00 01 7a f1 af b0 40 00  2d+13:21:20.863  READ FPDMA QUEUED
  60 01 00 00 18 00 01 7a f1 ae b0 40 00  2d+13:21:20.863  READ FPDMA QUEUED
  60 01 00 00 30 00 01 7a f1 ad b0 40 00  2d+13:21:20.863  READ FPDMA QUEUED
  60 01 00 00 28 00 01 7a f1 ac b0 40 00  2d+13:21:20.863  READ FPDMA QUEUED

Error 134 [13] occurred at disk power-on lifetime: 33124 hours (1380 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 01 00 00 00 3d 77 a7 40 00  Error: UNC 1 sectors at LBA = 0x003d77a7 = 4028327

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 01 00 00 00 3d 77 a7 40 00     02:14:16.046  READ DMA EXT
  e1 00 00 00 0f 00 00 00 00 00 00 00 00     02:14:16.046  IDLE IMMEDIATE
  ef 00 02 00 00 00 00 00 00 00 00 00 00     02:14:16.046  SET FEATURES [Enable write cache]
  ef 00 90 00 06 00 00 00 00 00 00 00 00     02:14:16.046  SET FEATURES [Disable SATA feature]
  e1 00 00 00 02 00 00 00 00 00 00 00 00     02:14:16.046  IDLE IMMEDIATE

Error 133 [12] occurred at disk power-on lifetime: 33124 hours (1380 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 01 00 00 00 3d 77 a6 40 00  Error: UNC at LBA = 0x003d77a6 = 4028326

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 01 00 00 00 00 00 3d 77 a6 40 00     02:14:11.557  READ FPDMA QUEUED
  25 00 00 00 01 00 00 00 3d 77 a5 40 00     02:14:07.812  READ DMA EXT
  e1 00 00 00 0f 00 00 00 00 00 00 00 00     02:14:07.812  IDLE IMMEDIATE
  ef 00 02 00 00 00 00 00 00 00 00 00 00     02:14:07.811  SET FEATURES [Enable write cache]
  ef 00 90 00 06 00 00 00 00 00 00 00 00     02:14:07.811  SET FEATURES [Disable SATA feature]

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     38962         603552656
# 2  Short offline       Completed: read failure       90%     38938         603552656
# 3  Short offline       Completed: read failure       90%     38914         603552656
# 4  Short offline       Completed: read failure       90%     38890         603552656
# 5  Extended offline    Completed: read failure       90%     38882         603552656
# 6  Short offline       Completed: read failure       90%     38866         603552656
# 7  Short offline       Completed: read failure       90%     38842         603552656
# 8  Short offline       Completed: read failure       90%     38818         603552657
# 9  Short offline       Completed: read failure       90%     38794         603552656
#10  Short offline       Completed: read failure       90%     38770         603552656
#11  Short offline       Completed: read failure       90%     38746         603552656
#12  Short offline       Completed: read failure       90%     38722         667918776
#13  Short offline       Completed without error       00%     38698         -
#14  Short offline       Completed without error       00%     38674         -
#15  Short offline       Completed: read failure       90%     38650         667918777
#16  Short offline       Completed: read failure       90%     38626         667918776
#17  Short offline       Completed without error       00%     38602         -
#18  Short offline       Completed: read failure       90%     38578         667918776

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    26 Celsius
Power Cycle Min/Max Temperature:     24/26 Celsius
Lifetime    Min/Max Temperature:      2/43 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (51)

Index    Estimated Time   Temperature Celsius
  52    2019-12-28 03:23    26  *******
 ...    ..(476 skipped).    ..  *******
  51    2019-12-28 11:20    26  *******

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4        75016  Vendor specific
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
1 Raw_Read_Error_Rate POSR-K 200 197 051 - 9
197 Current_Pending_Sector -O--CK 200 200 000 - 1
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 1

You have read errors and a pending sector (one the drive could not read and will only remap or clear once it is written to again). That disk is no good. You should replace it before it causes problems.
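For anyone who wants to run the same check on their own drives, here is a minimal sketch in Python. It parses attribute lines in the layout quoted above and lists watched attributes whose raw value is non-zero. The `WATCHED_IDS` set and the function names are assumptions made for illustration, not part of smartctl or FreeNAS, and a non-zero raw value is only a prompt to look closer, not a verdict on its own.

```python
# Attribute lines copied from the smartctl -x output quoted in this post.
SAMPLE = """\
1 Raw_Read_Error_Rate POSR-K 200 197 051 - 9
197 Current_Pending_Sector -O--CK 200 200 000 - 1
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 1
"""

# Hypothetical watch list: attribute IDs where a non-zero raw value deserves attention.
WATCHED_IDS = {1, 5, 197, 198, 200}


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for each attribute line in text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        try:
            raw = int(fields[7])
        except ValueError:
            # Some drives report composite raw values; this sketch skips them.
            continue
        attrs[int(fields[0])] = (fields[1], raw)
    return attrs


def flag_attributes(attrs, watched=WATCHED_IDS):
    """Return sorted (id, name, raw) tuples for watched attributes with raw > 0."""
    return sorted(
        (attr_id, name, raw)
        for attr_id, (name, raw) in attrs.items()
        if attr_id in watched and raw > 0
    )


flagged = flag_attributes(parse_attributes(SAMPLE))
for attr_id, name, raw in flagged:
    print(f"{attr_id:>3} {name}: raw value {raw}")
```

With the three lines quoted above, all three attributes are flagged, which matches the read errors and pending sector already pointed out.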
 

runevn

Explorer
Joined
Apr 4, 2019
Messages
63
@sretalla thanks for your response. I will replace it then. Should I retire the disk completely or can I do something to bring it back into service?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Should I retire the disk completely or can I do something to bring it back into service?
Once a disk is going in that direction, it should not be relied upon for anything important to you.

You could risk using it as an additional backup or something that won't matter if it dies. Usually using an OS other than FreeNAS with a filesystem like ext4 will hide the problem (but risks your data integrity).
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
@sretalla thanks for your response. I will replace it then. Should I retire the disk completely or can I do something to bring it back into service?
It has failed several long and short SMART tests. The disk is toast and I wouldn't use it.
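A rough way to tally this from the self-test log is sketched below in Python. The sample rows are a subset copied from the "SMART Extended Self-test Log" posted earlier in the thread; the function name `summarize_self_tests` is hypothetical, and the parsing assumes the column layout shown in that log.

```python
# A subset of rows copied from the SMART Extended Self-test Log posted above.
SAMPLE_LOG = """\
# 1  Short offline       Completed: read failure       90%     38962         603552656
# 5  Extended offline    Completed: read failure       90%     38882         603552656
#12  Short offline       Completed: read failure       90%     38722         667918776
#13  Short offline       Completed without error       00%     38698         -
#17  Short offline       Completed without error       00%     38602         -
#18  Short offline       Completed: read failure       90%     38578         667918776
"""


def summarize_self_tests(text):
    """Return (failed_count, passed_count, sorted distinct first-error LBAs)."""
    failed = 0
    passed = 0
    lbas = set()
    for line in text.splitlines():
        # Log rows start with '#'; header and blank lines are ignored.
        if not line.startswith("#"):
            continue
        fields = line.split()
        if "read failure" in line:
            failed += 1
            # The last column is LBA_of_first_error, or '-' when there is none.
            if fields[-1].isdigit():
                lbas.add(int(fields[-1]))
        elif "Completed without error" in line:
            passed += 1
    return failed, passed, sorted(lbas)


failed, passed, lbas = summarize_self_tests(SAMPLE_LOG)
print(f"failed: {failed}, passed: {passed}, first-error LBAs: {lbas}")
```

Repeated failures at the same few LBAs, as in this log, suggest sectors that stay unreadable rather than one-off glitches.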
 