CDB Errors

swingarm

Dabbler
Joined
Apr 19, 2019
Messages
44
First off, my setup:

Case: U-Nas NSC-800
Motherboard: ASRock Rack C236 WSI LGA1151 Mini-ITX
CPU: Intel Core i3-7100T
Memory: 16GB(2x8) Crucial Ballistix DDR4 Non-ECC
Boot: 2x SanDisk 60GB SSDs
Storage1: 6x 4TB drives (3x WD Gold WD4002FYYZ, 2x WD Se WD4000F9YZ, and 1x WD Re WD4000FYYZ)
Storage2: WD 1TB USB External Drive externally powered
Card: LSI 9211-8i with P20 IT Mode
PSU: 320W 1U
FreeNAS: 11.2-U4.1

Error I keep getting:
Code:
Jun  1 16:53:17 Server (da1:mps0:0:7:0): Retrying command (per sense data)
Jun  1 16:53:18 Server (da1:mps0:0:7:0): WRITE(10). CDB: 2a 00 00 ce 71 20 00 01 00 00
Jun  1 16:53:18 Server (da1:mps0:0:7:0): CAM status: SCSI Status Error
Jun  1 16:53:18 Server (da1:mps0:0:7:0): SCSI status: Check Condition
Jun  1 16:53:18 Server (da1:mps0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
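
For anyone trying to read the log above: the `CDB` line is the raw SCSI command block that was retried. As a rough sketch (the helper name `decode_write10` is made up here, and it assumes the standard 10-byte READ(10)/WRITE(10) layout: opcode in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8), it can be decoded like this:

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as space-separated hex."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    return {
        "opcode": cdb[0],                           # 0x2a is WRITE(10)
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting logical block
        "blocks": int.from_bytes(cdb[7:9], "big"),  # number of blocks to write
    }


# CDB copied from the log line above
fields = decode_write10("2a 00 00 ce 71 20 00 01 00 00")
print(hex(fields["opcode"]), fields["lba"], fields["blocks"])
```

So this was an ordinary 256-block write (128 KiB with 512-byte logical sectors) starting at LBA 13529376. Nothing about the command itself looks unusual; the sense data on the last log line says the drive reported that a power-on or reset had occurred.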


The error is occurring on the 2 drives I have on the LSI card. No matter which 2 drives I use, I always get the above error on one of them, either da0 or da1. SMART says all of the drives are good. I have tried 2 LSI cards, a fan on both of the cards, 2 motherboards (the other one is an ASRock Z170 Gamers Mini-ITX board), 2 different SFF-8087 cables, and 2 power supplies (the original 320W and a 450W ATX). I have also tried a fresh FreeNAS installation followed by uploading the config, multiple times. All of the drive and CPU temps are ~40C or a little less. The drives work fine on the built-in SATA controller.

zpool status:

Code:
zpool status
  pool: Stuff
 state: ONLINE
  scan: resilvered 504K in 0 days 00:00:01 with 0 errors on Sat Jun  1 14:27:25 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        Stuff                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/0bb68bd6-622c-11e9-b921-d05099747709  ONLINE       0     0     0
            ada3p2                                      ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/0fc260e1-622c-11e9-b921-d05099747709  ONLINE       0     0     0
            gptid/11d42ffe-622c-11e9-b921-d05099747709  ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            gptid/1444461f-622c-11e9-b921-d05099747709  ONLINE       0     0     0
            gptid/171f74a0-622c-11e9-b921-d05099747709  ONLINE       0     0     0

errors: No known data errors



smartctl -a /dev/da0:

Code:
root@Server:~ # smartctl -a /dev/da0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Gold
Device Model:     WDC WD4002FYYZ-01B7CB0
Serial Number:    N8GYBAXY
LU WWN Device Id: 5 000cca 244cd5960
Firmware Version: 01.01M02
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun  1 18:40:13 2019 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  113) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 571) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       108
  3 Spin_Up_Time            0x0007   150   150   024    Pre-fail  Always       -       368 (Average 343)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       200
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       9738
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       196
192 Power-Off_Retract_Count 0x0032   093   093   000    Old_age   Always       -       8802
193 Load_Cycle_Count        0x0012   093   093   000    Old_age   Always       -       8802
194 Temperature_Celsius     0x0002   100   100   000    Old_age   Always       -       60 (Min/Max 17/60)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       203

SMART Error Log Version: 1
ATA Error Count: 203 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 203 occurred at disk power-on lifetime: 9735 hours (405 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 00 b8 51 54 40 00      00:03:23.598  WRITE FPDMA QUEUED
  61 00 28 b8 56 54 40 00      00:03:23.597  WRITE FPDMA QUEUED
  61 00 20 b8 55 54 40 00      00:03:23.597  WRITE FPDMA QUEUED
  61 00 18 b8 54 54 40 00      00:03:23.597  WRITE FPDMA QUEUED
  61 00 10 b8 53 54 40 00      00:03:23.597  WRITE FPDMA QUEUED

Error 202 occurred at disk power-on lifetime: 9735 hours (405 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 18 b8 51 54 40 00      00:03:23.349  WRITE FPDMA QUEUED
  61 00 40 b8 56 54 40 00      00:03:23.347  WRITE FPDMA QUEUED
  61 00 38 b8 55 54 40 00      00:03:23.347  WRITE FPDMA QUEUED
  61 00 30 b8 54 54 40 00      00:03:23.347  WRITE FPDMA QUEUED
  61 00 28 b8 53 54 40 00      00:03:23.347  WRITE FPDMA QUEUED

Error 201 occurred at disk power-on lifetime: 9735 hours (405 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 10 b8 50 54 40 00      00:03:23.098  WRITE FPDMA QUEUED
  61 00 48 b8 58 54 40 00      00:03:23.097  WRITE FPDMA QUEUED
  61 00 40 b8 56 54 40 00      00:03:23.097  WRITE FPDMA QUEUED
  61 00 38 b8 55 54 40 00      00:03:23.097  WRITE FPDMA QUEUED
  61 00 30 b8 54 54 40 00      00:03:23.097  WRITE FPDMA QUEUED

Error 200 occurred at disk power-on lifetime: 9735 hours (405 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 08 b8 50 54 40 00      00:03:22.348  WRITE FPDMA QUEUED
  61 00 48 b8 59 54 40 00      00:03:22.347  WRITE FPDMA QUEUED
  61 00 40 b8 58 54 40 00      00:03:22.347  WRITE FPDMA QUEUED
  61 00 38 b8 56 54 40 00      00:03:22.347  WRITE FPDMA QUEUED
  61 00 30 b8 55 54 40 00      00:03:22.347  WRITE FPDMA QUEUED

Error 199 occurred at disk power-on lifetime: 9735 hours (405 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 08 b8 58 54 40 00      00:03:22.030  WRITE FPDMA QUEUED
  61 00 58 b8 57 54 40 00      00:03:22.028  WRITE FPDMA QUEUED
  61 00 50 b8 56 54 40 00      00:03:22.028  WRITE FPDMA QUEUED
  61 00 48 b8 55 54 40 00      00:03:22.028  WRITE FPDMA QUEUED
  61 00 40 b8 54 54 40 00      00:03:22.028  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      9718         -
# 2  Short offline       Completed without error       00%      9694         -
# 3  Short offline       Completed without error       00%      9670         -
# 4  Short offline       Completed without error       00%      9646         -
# 5  Short offline       Completed without error       00%      9604         -
# 6  Short offline       Completed without error       00%      9558         -
# 7  Short offline       Completed without error       00%      9534         -
# 8  Short offline       Completed without error       00%      9510         -
# 9  Short offline       Completed without error       00%      9454         -
#10  Short offline       Completed without error       00%      9290         -
#11  Short offline       Completed without error       00%      9140         -
#12  Short offline       Completed without error       00%      8972         -
#13  Short offline       Completed without error       00%      8804         -
#14  Short captive       Completed without error       00%      8629         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



smartctl -a /dev/da1

Code:
root@Server:~ # smartctl -a /dev/da1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Se
Device Model:     WDC WD4000F9YZ-09N20L0
Serial Number:    WD-WCC5D0023623
LU WWN Device Id: 5 0014ee 208ea4f01
Firmware Version: 01.01A01
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun  1 18:42:37 2019 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (42300) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 457) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   158   149   021    Pre-fail  Always       -       11091
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1489
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   048   048   000    Old_age   Always       -       38513
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1160
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       836
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       652
194 Temperature_Celsius     0x0022   099   099   000    Old_age   Always       -       53
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   184   000    Old_age   Always       -       159
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 15969 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 15969 occurred at disk power-on lifetime: 16506 hours (687 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0c 00 00 00 00  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 0c 00 00 00 00 00      01:00:46.084  SET FEATURES [Set transfer mode]
  e5 00 00 00 00 00 00 00      01:00:46.084  CHECK POWER MODE
  ec 00 00 00 00 00 00 00      01:00:46.083  IDENTIFY DEVICE
  ef 03 0c 00 00 00 00 00      01:00:45.834  SET FEATURES [Set transfer mode]
  e5 00 00 00 00 00 00 00      01:00:45.834  CHECK POWER MODE

Error 15968 occurred at disk power-on lifetime: 16506 hours (687 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0c 00 00 00 00  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 0c 00 00 00 00 00      01:00:45.834  SET FEATURES [Set transfer mode]
  e5 00 00 00 00 00 00 00      01:00:45.834  CHECK POWER MODE
  ec 00 00 00 00 00 00 00      01:00:45.833  IDENTIFY DEVICE
  ef 03 0c 00 00 00 00 00      01:00:45.584  SET FEATURES [Set transfer mode]
  e5 00 00 00 00 00 00 00      01:00:45.584  CHECK POWER MODE

Error 15967 occurred at disk power-on lifetime: 16506 hours (687 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0c 00 00 00 00  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 0c 00 00 00 00 00      01:00:45.584  SET FEATURES [Set transfer mode]
  e5 00 00 00 00 00 00 00      01:00:45.584  CHECK POWER MODE
  ec 00 00 00 00 00 00 00      01:00:45.584  IDENTIFY DEVICE
  ef 03 0c 00 00 00 00 00      01:00:45.418  SET FEATURES [Set transfer mode]
  e5 00 00 00 00 00 00 00      01:00:45.418  CHECK POWER MODE

Error 15966 occurred at disk power-on lifetime: 16506 hours (687 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0c 00 00 00 00  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 0c 00 00 00 00 00      01:00:45.418  SET FEATURES [Set transfer mode]
  e5 00 00 00 00 00 00 00      01:00:45.418  CHECK POWER MODE
  ec 00 00 00 00 00 00 00      01:00:45.334  IDENTIFY DEVICE
  ef 03 0c 00 00 00 00 00      01:00:45.085  SET FEATURES [Set transfer mode]
  e5 00 00 00 00 00 00 00      01:00:45.085  CHECK POWER MODE

Error 15965 occurred at disk power-on lifetime: 16506 hours (687 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0c 00 00 00 00  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 0c 00 00 00 00 00      01:00:45.085  SET FEATURES [Set transfer mode]
  e5 00 00 00 00 00 00 00      01:00:45.085  CHECK POWER MODE
  ec 00 00 00 00 00 00 00      01:00:45.085  IDENTIFY DEVICE
  ef 03 0c 00 00 00 00 00      01:00:45.085  SET FEATURES [Set transfer mode]
  e5 00 00 00 00 00 00 00      01:00:45.085  CHECK POWER MODE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     38493         -
# 2  Short offline       Completed without error       00%     38469         -
# 3  Short offline       Completed without error       00%     38445         -
# 4  Short offline       Completed without error       00%     38421         -
# 5  Short offline       Completed without error       00%     38378         -
# 6  Short offline       Completed without error       00%     38332         -
# 7  Short offline       Completed without error       00%     38308         -
# 8  Short offline       Completed without error       00%     38284         -
# 9  Short offline       Completed without error       00%     38228         -
#10  Short offline       Completed without error       00%     38063         -
#11  Short offline       Completed without error       00%     37914         -
#12  Short offline       Completed without error       00%     37746         -
#13  Short offline       Completed without error       00%     37578         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
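
The `ER` and `ST` columns in the two error logs above are bit fields. A minimal sketch of decoding them, assuming the usual ATA bit assignments (the names `ERROR_BITS`, `STATUS_BITS` and `flags` are just for illustration, and only the commonly seen bits are listed):

```python
# Commonly seen bits of the ATA Error register (ER)
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}
# Commonly seen bits of the ATA Status register (ST)
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}


def flags(value: int, names: dict) -> list:
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & (1 << bit)]


# da0 log entries: ER=0x84, ST=0x41
print(flags(0x84, ERROR_BITS), flags(0x41, STATUS_BITS))
# da1 log entries: ER=0x04, ST=0x61
print(flags(0x04, ERROR_BITS), flags(0x61, STATUS_BITS))
```

The da0 entries (`84 41`) decode to ICRC plus ABRT, i.e. an interface CRC error on the link, while the da1 entries (`04 61`) are aborted commands with the device-fault bit set, which matches what smartctl prints next to them.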



I have spent about a month trying to get this card to play nice, which is odd because it seems to work for other people on here.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
swingarm said: "The error is occurring on the 2 drives I have on the LSI card."
Why do you only have two drives on the LSI card? What kind of cable are you using? This type of error is usually caused by using a reverse breakout cable when you should be using a forward breakout cable. They look the same but are wired differently.
 

Chris Moore

PS. Your SMART test logs only show short tests. Be sure to run a long test on all the drives.
 

swingarm

1. I'm only running 2 drives on the LSI card because FreeNAS was running fine on the built-in ports, so I just put the extra drives on the LSI. I never got as far as putting more drives on it because this problem cropped up first.

2. I didn't even know there were 2 types of breakout cables, good thing I finally decided to post my problem here. Guess I will be buying one.

3. Yeah, I thought about running a long test after posting and started it about 45 minutes ago.
 

Chris Moore


swingarm

All the pool drives passed the SMART long test. I get the forward breakout cable tomorrow.
 

swingarm

Well, I got my cable in and hooked up the two drives again, and after some testing I thought "hey, it's working". I then decided to add 2 more drives to it and started getting the errors again. I then tried the last pair of drives and still got the errors. I know the 2nd LSI card I tried could be bad, or the 3rd breakout cable I tried could be bad, or I could have multiple bad hard drives (each of which passed the SMART long test). I didn't try the other PSU yesterday because I frankly just lost all interest in trying to get this fixed, and I don't think it's a power issue anyway. To know me is to know I LOVE figuring things out to get them working right again, whether it makes sense or not; it takes A LOT to get me to lose interest and not want to work on something anymore.

I went with my plan B, which was to use all of the motherboard SATA ports for the main pool and run the OS off of an external USB SSD. I'm not too worried about it failing because I have a config file saved, and if there's one positive thing out of this it's total trust in that config file, because I've restored it onto a fresh install many times now.

Also, during all of this I had another problem with the new motherboard: on about 1/3 of cold boots and restarts the motherboard BIOS would skip the LSI BIOS, which caused problems about 3/4 of the way through the FreeNAS boot. I tried many BIOS settings but it kept doing it till the end.

In 2+ months I may try tinkering with it again, but for now it's on to other things.
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762

swingarm

Ok, I hooked up the 450W ATX PSU I had lying around, with 4 drives on the LSI card (the remaining 2 drives are on the motherboard SATA ports). The only difference this time is that I'm using a single external USB SSD as a boot drive rather than 2 internal ones.

Still getting the errors. Does this setup really need a 500W or larger PSU? This seems odd to me because I calculated my setup at ~400W according to that link.

If I wanted to I could get a 500W 1U Flex PSU for ~$120, but that's no guarantee it would fix the problem. I'm just going back to the plan B setup, which seems to be working well and is simpler.
 

Sphinxicus

Dabbler
Joined
Nov 3, 2016
Messages
32
For what it's worth, I have received the same error twice on my new build, which I am still running badblocks on. All drives have passed long SMART tests. When badblocks is done I will look to swap out cables. My PSU is definitely not underpowered as it's a 1000W unit, but there could be an issue with the PCI-E slot on your motherboard. Have you tried popping in a cheap PCI-E SATA card to see if you have problems with that too? I suspect that it would draw less power than the LSI card. Perhaps the motherboard isn't supplying enough power to the PCI-E slot, or there is just a general issue with it?
 

swingarm

I do have an Adaptec 1430SA card (4 SATA ports) that I could use, maybe I'll try it sometime in the next week or two.

A comment on your last sentence: I have tried 2 different ITX motherboards (both ASRock though), and the odds of both PCI-E slots being underpowered or otherwise faulty seem pretty low.
 

swingarm

Well, I had to do some other work on the FreeNAS server so I thought I would try the Adaptec card. It actually runs great in FreeNAS with no CDB errors, but the big gotcha is that it takes about 20-25 minutes to boot through the Adaptec 1430SA's BIOS. I tried different settings in the motherboard BIOS and the Adaptec BIOS but it still takes a looooooooooooong time to boot.

So I believe there's something going on between the motherboard and LSI card which is odd because I can't find anyone else with this combo having this problem.
 

Chris Moore


swingarm

I can't fix the 20-25 minute boot with the Adaptec installed, although on the plus side it was not getting CDB errors. I'm using the built-in 8-port SATA controller on the motherboard instead.

I'm just so bummed that the SAS2008 couldn't work for me. It's either a defective product (even though I've tried more than one of each item) or some sort of incompatibility.
 

Chris Moore

199 UDMA_CRC_Error_Count 0x0032 200 184 000 Old_age Always - 159
If it is these errors you are having, you have a cabling problem, not a problem with the controller or the drive.
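
Attribute 199 is a lifetime counter and, as far as I know, the raw value does not reset, so what matters is whether it keeps climbing after a cable swap. A small sketch for pulling the raw value out of `smartctl -A` output so it can be compared before and after (the function name `crc_error_count` is hypothetical, and it assumes the ten-column attribute table layout shown earlier in this thread):

```python
def crc_error_count(smart_output: str):
    """Return the raw value of attribute 199 (UDMA_CRC_Error_Count), or None."""
    for line in smart_output.splitlines():
        parts = line.split()
        # Attribute rows have ten columns; the raw value is the last one
        if len(parts) >= 10 and parts[0] == "199":
            return int(parts[9])
    return None


# Two attribute rows copied from the da1 output earlier in the thread
sample = """\
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   184   000    Old_age   Always       -       159
"""
count = crc_error_count(sample)
print(count)
```

Record the number, swap the cable, put some write load on the drive, and check again. If the count has not moved, the old cable (or its connector) was the likely culprit.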
 

swingarm

I did try some other cables, but only on 2 of the 8 HDD slots. Since they are cheap enough, I'll get some new ones and see what happens.
 

leonbusch

Cadet
Joined
Oct 24, 2019
Messages
8
Did you solve the problem or at least find out what it was?
I have had the same problem for years and have already switched cables and used multiple SATA adapters, both recommended and unrecommended ones. I recently bought and installed this card:

Fujitsu 9211-8i IT-mode P20 D2607-A21 LSISAS2008RAID controller card ZFS FreeNAS


but it still brings up those error messages, to the point that it completely freezes the system, even though this is one of the most recommended cards on this forum. It only freezes when a whole bunch of data gets written to the drives.

I have 7 drives installed:
1x boot ssd, no problem
2x dinosaur HD204UI, no problem
4x WD40EFRX, all the problems

No errors with any of the drives when they are connected directly to the motherboard.

I am using an ASRock motherboard with a J3160.

So two variables are the same:
ASRock motherboard
WDXXEFRX

It could be that the PCI-e slot is underpowered. Is it possible to power devices that draw more power from an independent 12V supply, and would it make any difference? It would be interesting to try in order to isolate the problem. Until now I have not seen any reports of similar issues with ASRock motherboards and WD drives.

I hope someone still comes by and has an idea on what to (economically) do.

cheers
leon
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
Long SAS Cables and SATA drives?
Apparently you are meant to only use 0.5m SAS cables with SATA drives due to signalling issues.

Just a thought
 