LSI HBA SCSI Status Error

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
For three months I've been dealing with an issue with my LSI 2308 onboard HBA. It sometimes records MILLIONS of checksum errors, sometimes has a CAM error and locks up the server install, and often takes the disks on my LSI controller offline.
I've highlighted this in multiple threads, which I'll reference, in case you're really bored. To summarize:
I've replaced SATA cables
I've tested the drives / pool on the primary controller and it generally worked - though CAM still locked up randomly
I sent the motherboard to Supermicro who tested and said the controller is fine.
I've updated to IT firmware 20.00.07.00

My first thread:
https://forums.freenas.org/index.php?threads/volume-disappeared-after-improper-shutdown.61567/

Questioning the disks:
https://forums.freenas.org/index.php?threads/failing-disk-or-sata-cable.61725/#post-441746

Follow-up:
https://forums.freenas.org/index.php?threads/new-user-woes.62019/

I re-assembled the entire machine and started it up today. The drives failed to populate at all. Because of the issues, the left side of my GUI won't generate, and just spins bars. I'm at a serious loss. This is my log output, littered with CAM status errors and SCSI status errors...
Code:
Apr 20 14:06:52 FreeNAS (da2:mps0:0:2:0): CAM status: SCSI Status Error
Apr 20 14:06:52 FreeNAS (da2:mps0:0:2:0): SCSI status: Check Condition
Apr 20 14:06:52 FreeNAS (da2:mps0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Apr 20 14:06:52 FreeNAS (da2:mps0:0:2:0): Retrying command (per sense data)
Apr 20 14:06:52 FreeNAS	(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 08 00 00 20 00 length 16384 SMID 904 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:52 FreeNAS (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 08 00 00 20 00
Apr 20 14:06:52 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:52 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 08 00 00 20 00
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): CAM status: SCSI Status Error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): SCSI status: Check Condition
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command (per sense data)
Apr 20 14:06:53 FreeNAS	(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 08 00 00 20 00 length 16384 SMID 916 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 08 00 00 20 00
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Error 5, Retries exhausted
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 30 00 00 08 00
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): CAM status: SCSI Status Error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): SCSI status: Check Condition
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command (per sense data)
Apr 20 14:06:53 FreeNAS	(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 38 00 00 20 00 length 16384 SMID 912 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:53 FreeNAS	(da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 26 90 00 00 00 10 00 00 length 8192 SMID 901 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:53 FreeNAS	(da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 28 90 00 00 00 10 00 00 length 8192 SMID 911 terminated ioc 804b log(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 38 00 00 20 00
Apr 20 14:06:53 FreeNAS info 31120303 scsi 0 state c xfer 0
Apr 20 14:06:53 FreeNAS	(da2:mps0:0:2:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00 length 8192 SMID 900 terminated ioc 804b loginfo 31120303 scsi(da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 26 90 00 00 00 10 00 00
Apr 20 14:06:53 FreeNAS 0 state c xfer 0
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 28 90 00 00 00 10 00 00
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): CAM status: SCSI Status Error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): SCSI status: Check Condition
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command (per sense data)
Apr 20 14:06:53 FreeNAS	(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 38 00 00 20 00 length 16384 SMID 906 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:53 FreeNAS	(da2:mps0:0:2:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00 length 8192 SMID 895 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:53 FreeNAS	(da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 28 90 00 00 00 10 00 00 length 8192 SMID 874 terminated ioc 804b log(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 38 00 00 20 00
Apr 20 14:06:53 FreeNAS info 31120303 scsi 0 state c xfer 0
Apr 20 14:06:53 FreeNAS	(da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 26 90 00 00 00 10 00 00 length 8192 SMID 878 terminated ioc 804b log(da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:53 FreeNAS info 31120303 scsi 0 state c xfer 0
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 28 90 00 00 00 10 00 00
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 26 90 00 00 00 10 00 00
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): CAM status: SCSI Status Error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): SCSI status: Check Condition
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Retrying command (per sense data)
Apr 20 14:06:54 FreeNAS	(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 38 00 00 20 00 length 16384 SMID 919 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:54 FreeNAS	(da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 28 90 00 00 00 10 00 00 length 8192 SMID 917 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:54 FreeNAS	(da2:mps0:0:2:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00 length 8192 SMID 920 terminated ioc 804b loginfo 31120303 scsi(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 38 00 00 20 00
Apr 20 14:06:54 FreeNAS 0 state c xfer 0
Apr 20 14:06:54 FreeNAS	(da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 26 90 00 00 00 10 00 00 length 8192 SMID 899 terminated ioc 804b log(da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 28 90 00 00 00 10 00 00
Apr 20 14:06:54 FreeNAS info 31120303 scsi 0 state c xfer 0
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Error 5, Retries exhausted
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 26 90 00 00 00 10 00 00
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 26 90 00 00 00 10 00 00
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): CAM status: SCSI Status Error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): SCSI status: Check Condition
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Retrying command (per sense data)
Apr 20 14:06:54 FreeNAS	(da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 26 90 00 00 00 10 00 00 length 8192 SMID 909 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:54 FreeNAS	(da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 28 90 00 00 00 10 00 00 length 8192 SMID 907 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:54 FreeNAS	(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 38 00 00 20 00 length 16384 SMID 923 terminated ioc 804b loginfo 31120303 sc(da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 26 90 00 00 00 10 00 00
Apr 20 14:06:54 FreeNAS si 0 state c xfer 0
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Error 5, Retries exhausted
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 28 90 00 00 00 10 00 00
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 38 00 00 20 00
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Retrying command
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 03 a3 81 28 90 00 00 00 10 00 00
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): CAM status: SCSI Status Error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): SCSI status: Check Condition
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Error 6, Retries exhausted
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Invalidating pack
Apr 20 14:06:54 FreeNAS	(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 38 00 00 20 00 length 16384 SMID 914 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 06 a6 b9 38 00 00 20 00
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error
Apr 20 14:06:54 FreeNAS (da2:mps0:0:2:0): Error 5, Retries exhausted
Apr 20 14:07:00 FreeNAS ZFS: vdev state changed, pool_guid=1987451638458025078 vdev_guid=11380074299853842333
Apr 20 14:07:01 FreeNAS mps0: mpssas_prepare_remove: Sending reset for target ID 4
Apr 20 14:07:01 FreeNAS da4 at mps0 bus 0 scbus0 target 4 lun 0
Apr 20 14:07:01 FreeNAS da4: <ATA ST8000VN0022-2EL SC61> s/n ZA19RMDB detached
Apr 20 14:07:01 FreeNAS GEOM_MIRROR: Device swap1: provider da4p1 disconnected.
Apr 20 14:07:01 FreeNAS (da4:mps0:0:4:0): Periph destroyed
Apr 20 14:07:02 FreeNAS	(pass0:mps0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 921 terminated ioc 804b loginfo 31110d00 scsi 0 state c xfer 0
Apr 20 14:07:02 FreeNAS mps0: Unfreezing devq for target ID 4
Apr 20 14:07:03 FreeNAS ZFS: vdev state changed, pool_guid=1987451638458025078 vdev_guid=2760554092486413937
Apr 20 14:07:04 FreeNAS mps0: mpssas_prepare_remove: Sending reset for target ID 1
Apr 20 14:07:04 FreeNAS da1 at mps0 bus 0 scbus0 target 1 lun 0
Apr 20 14:07:04 FreeNAS da1: <ATA ST8000VN0022-2EL SC61> s/n ZA19RMP0 detached
Apr 20 14:07:04 FreeNAS GEOM_MIRROR: Device swap2: provider da1p1 disconnected.
Apr 20 14:07:04 FreeNAS (da1:mps0:0:1:0): Periph destroyed
Apr 20 14:07:04 FreeNAS	(pass0:mps0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 926 terminated ioc 804b loginfo 31110d00 scsi 0 state c xfer 0
Apr 20 14:07:04 FreeNAS	(pass0:mps0:0:0:0): INQUIRY. CDB: 12 00 00 00 40 00 length 64 SMID 918 terminated ioc 804b loginfo 31110d00 scsi 0 state c xfer 0
Apr 20 14:07:06 FreeNAS	(pass2:mps0:0:2:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 928 terminated ioc 804b loginfo 31110d00 scsi 0 state c xfer 0
Apr 20 14:07:06 FreeNAS mps0: Unfreezing devq for target ID 1
Apr 20 14:07:08 FreeNAS	(pass0:mps0:0:0:0): INQUIRY. CDB: 12 00 00 00 40 00 length 64 SMID 908 terminated ioc 804b loginfo 31110d00 scsi 0 state c xfer 0
Apr 20 14:07:09 FreeNAS ZFS: vdev state changed, pool_guid=1987451638458025078 vdev_guid=12391365849909013340
Apr 20 14:07:09 FreeNAS mps0: mpssas_prepare_remove: Sending reset for target ID 0
Apr 20 14:07:09 FreeNAS da0 at mps0 bus 0 scbus0 target 0 lun 0
Apr 20 14:07:09 FreeNAS da0: <ATA ST8000VN0022-2EL SC61> s/n ZA19R9BM detached
Apr 20 14:07:09 FreeNAS (da0:mps0:0:0:0): Periph destroyed
Apr 20 14:07:09 FreeNAS mps0: Unfreezing devq for target ID 0
Apr 20 14:07:11 FreeNAS	(pass2:mps0:0:2:0): INQUIRY. CDB: 12 00 00 00 40 00 length 64 SMID 925 terminated ioc 804b loginfo 31170000 scsi 0 state c xfer 0
Apr 20 14:07:11 FreeNAS	(pass2:mps0:0:2:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 929 terminated ioc 804b loginfo 31170000 scsi 0 state c xfer 0
Apr 20 14:07:11 FreeNAS	(pass2:mps0:0:2:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 915 terminated ioc 804b loginfo 31170000 scsi 0 state c xfer 0
Apr 20 14:07:13 FreeNAS	(pass3:mps0:0:3:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 937 terminated ioc 804b loginfo 31170000 scsi 0 state 0 xfer 0
Apr 20 14:07:13 FreeNAS	(pass2:mps0:0:2:0): INQUIRY. CDB: 12 00 00 00 40 00 length 64 SMID 930 terminated ioc 804b loginfo 31170000 scsi 0 state 0 xfer 0
Apr 20 14:07:13 FreeNAS	(pass2:mps0:0:2:0): INQUIRY. CDB: 12 00 00 00 40 00 length 64 SMID 931 terminated ioc 804b loginfo 31170000 scsi 0 state 0 xfer 0
Apr 20 14:07:13 FreeNAS ZFS: vdev state changed, pool_guid=1987451638458025078 vdev_guid=5839067863432692929
Apr 20 14:07:13 FreeNAS mps0: mpssas_prepare_remove: Sending reset for target ID 3
Apr 20 14:07:13 FreeNAS mps0: mpssas_prepare_remove: Sending reset for target ID 2
Apr 20 14:07:13 FreeNAS da3 at mps0 bus 0 scbus0 target 3 lun 0
Apr 20 14:07:13 FreeNAS da3: <ATA ST8000VN0022-2EL SC61> s/n ZA19MY4K detached
Apr 20 14:07:13 FreeNAS da2 at mps0 bus 0 scbus0 target 2 lun 0
Apr 20 14:07:13 FreeNAS da2: <ATA ST8000VN0022-2EL SC61> s/n ZA19RMRK detached
Apr 20 14:07:13 FreeNAS GEOM_MIRROR: Device swap2: provider da2p1 disconnected.
Apr 20 14:07:13 FreeNAS GEOM_MIRROR: Device swap1: provider da3p1 disconnected.
Apr 20 14:07:13 FreeNAS GEOM_MIRROR: Device swap2: provider destroyed.
Apr 20 14:07:13 FreeNAS GEOM_MIRROR: Device swap2 destroyed.
Apr 20 14:07:13 FreeNAS GEOM_ELI: Device mirror/swap2.eli destroyed.
Apr 20 14:07:13 FreeNAS GEOM_ELI: Detached mirror/swap2.eli on last close.
Apr 20 14:07:13 FreeNAS GEOM_MIRROR: Device swap1: provider destroyed.
Apr 20 14:07:13 FreeNAS GEOM_MIRROR: Device swap1 destroyed.
Apr 20 14:07:13 FreeNAS GEOM_ELI: Device mirror/swap1.eli destroyed.
Apr 20 14:07:13 FreeNAS GEOM_ELI: Detached mirror/swap1.eli on last close.
Apr 20 14:07:13 FreeNAS mps0: Unfreezing devq for target ID 3
Apr 20 14:07:13 FreeNAS mps0: Unfreezing devq for target ID 2
Apr 20 14:07:13 FreeNAS (da3:mps0:0:3:0): Periph destroyed
Apr 20 14:07:13 FreeNAS (da2:mps0:0:2:0): Periph destroyed
Apr 20 14:07:13 FreeNAS ZFS: vdev state changed, pool_guid=1987451638458025078 vdev_guid=13734883185564571814
Apr 20 14:07:17 FreeNAS mps0: IOC Fault 0x40002651, Resetting
Apr 20 14:07:17 FreeNAS mps0: Reinitializing controller,
Apr 20 14:07:17 FreeNAS mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Apr 20 14:07:17 FreeNAS mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Apr 20 14:07:17 FreeNAS mps0: mps_reinit finished sc 0xfffffe000157d000 post 2 free 1
Apr 20 14:07:27 FreeNAS mps0: IOC Fault 0x40002651, Resetting
Apr 20 14:07:27 FreeNAS mps0: Reinitializing controller,
Apr 20 14:07:27 FreeNAS mps0: Portenable NULL reply
Apr 20 14:07:27 FreeNAS mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Apr 20 14:07:27 FreeNAS mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Apr 20 14:07:27 FreeNAS mps0: mps_reinit finished sc 0xfffffe000157d000 post 2 free 1
Apr 20 14:07:31 FreeNAS mps0: IOC Fault 0x40005871, Resetting
Apr 20 14:07:31 FreeNAS mps0: Reinitializing controller,
Apr 20 14:07:31 FreeNAS mps0: Portenable NULL reply
Apr 20 14:07:31 FreeNAS mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Apr 20 14:07:31 FreeNAS mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Apr 20 14:07:31 FreeNAS mps0: mps_reinit finished sc 0xfffffe000157d000 post 2 free 1
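
For anyone reading the log above: each `CDB:` field is the raw SCSI command, so the target LBA and the transfer size can be read straight out of it. Below is a minimal Python sketch. It assumes 512-byte logical blocks (the log does not state the sector size, although its `length` fields are consistent with that), and `decode_rw_cdb` is a made-up helper name, not part of any FreeNAS tool.

```python
def decode_rw_cdb(cdb_hex: str, block_size: int = 512) -> dict:
    """Decode a READ/WRITE(10) or READ/WRITE(16) CDB given as hex bytes.

    block_size is an assumption (512-byte logical blocks).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)",
               0x88: "READ(16)", 0x8A: "WRITE(16)"}
    name = opcodes.get(cdb[0]) if cdb else None
    if name is None:
        raise ValueError("unsupported or empty CDB")

    if name.endswith("(10)"):
        if len(cdb) < 10:
            raise ValueError("10-byte CDB expected")
        lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: LBA
        blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    else:
        if len(cdb) < 16:
            raise ValueError("16-byte CDB expected")
        lba = int.from_bytes(cdb[2:10], "big")     # bytes 2-9: LBA
        blocks = int.from_bytes(cdb[10:14], "big") # bytes 10-13: transfer length

    return {"op": name, "lba": lba, "blocks": blocks,
            "bytes": blocks * block_size}


# First failing command in the log above
print(decode_rw_cdb("2a 00 06 a6 b9 08 00 00 20 00"))
```

With the 512-byte assumption, that first WRITE(10) decodes to 32 blocks, i.e. 16384 bytes, which agrees with the `length 16384` the driver printed for the same command.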

 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
I just booted to the UEFI shell and manually updated the Controller Address, based on the sticker on the motherboard. Then I went into the BIOS to check things out. I took pictures of the SAS controller version, then went into the disk screen and viewed all of my disks. It initially showed all five, but as I browsed, it started showing only two. Definitely fishy. Here are my pictures, just because I took them:
[Two photos of the BIOS SAS controller screens; images not recoverable]

All five disks said the same basic stuff:
[Photo of a per-disk detail screen in the BIOS; image not recoverable]
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
When I rebooted the machine, FreeNAS detected all five disks, and my pool is currently detected. However, I've got a monitor hooked up, and it's showing "CCB request completed with an error":
[Photo of the console showing "CCB request completed with an error"; image not recoverable]
 

Agi

Dabbler
Joined
Feb 26, 2016
Messages
14
I'm not sure how it's implemented on your system (as it's onboard) but I used to get a bunch of errors like this. My HBA was overheating. So I recommend you ensure it's properly cooled.
Also, this could still be a cable or power issue. Try swapping the data cable from da3 to another drive and see whether the error follows the cable or stays with the drive.

How are the drives connected to the SAS controller? Are they on backplanes, or on SATA breakout cables to the individual drives?
Are you getting errors on any drive other than da3?
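
One way to answer that last question from a saved log is to count the CAM status lines per device. A minimal Python sketch follows. It assumes log lines in the `(daN:mps0:...): CAM status: ...` shape posted earlier in this thread, and `count_cam_errors` is a hypothetical helper, not a FreeNAS tool.

```python
import re
from collections import Counter

# Matches e.g. "(da2:mps0:0:2:0): CAM status: SCSI Status Error".
# search() is used so that lines where the kernel interleaved two
# messages are still counted when the status text is at the end.
CAM_STATUS = re.compile(r"\((\w+):mps\d+:\d+:\d+:\d+\): CAM status: (.+)$")


def count_cam_errors(lines):
    """Return a Counter keyed by (device, CAM status text)."""
    counts = Counter()
    for line in lines:
        match = CAM_STATUS.search(line.rstrip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "Apr 20 14:06:52 FreeNAS (da2:mps0:0:2:0): CAM status: SCSI Status Error",
    "Apr 20 14:06:52 FreeNAS (da2:mps0:0:2:0): CAM status: CCB request completed with an error",
    "Apr 20 14:06:53 FreeNAS (da2:mps0:0:2:0): Retrying command",
]
for (device, status), n in sorted(count_cam_errors(sample).items()):
    print(device, n, status)
```

If every device on the controller shows up in the tally, a single cable or drive becomes a less likely explanation than something shared, such as power or the HBA itself.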
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
They're hooked into the individual SATA ports on the motherboard. I could partially believe the overheating theory, but I get errors like this even from a cold boot.
I've got a 500 W Corsair PSU, and reports show the system uses about 190-200 W. It has limited connectors, so I'm using two power cables to run the six drives, with the factory connectors. I hadn't considered power until now, or at least it was always overshadowed by other thoughts.
Is there a way to test for underpowered drives? I've invested enough in this system, and would really like to avoid throwing new components at it without verification. Could a new PSU start browning out that quickly? Everything seemed to be working until the loss-of-power incident.
I was up for four hours last night reading about different firmware versions. Version 16 used to be considered the best, but 20.00.07 seems to make people happy.
I'm generally good at diagnosing issues, but I've been looking first at the HBA and not considering peripherals.
Thanks.
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
I definitely see how an underpowered drive could cause that drive to go offline. All of my drives will occasionally go offline, or I'll lose... three...
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
It also just occurred to me that I tried the drives on the SATA controller and had much better luck. I still got some CAM status errors, though, but the system stayed up for weeks.
 

Agi

Dabbler
Joined
Feb 26, 2016
Messages
14
ere109 said: "It also just occurred to me that I tried the drives on the SATA controller and had much better luck. I still got some CAM status errors, though, but the system stayed up for weeks."

That makes a cabling problem more likely. The fact that you're getting errors even on a different controller essentially rules out the controllers. By swapping controllers you're disturbing and remaking connections, which affects the signal. Cables are the most common answer in many of these threads; just have a search. That's why I'm pointing you in this direction.

If you're only getting errors currently on da3, swap the data cable from that drive to another drive and see if it follows the cable. If it stays with the drive then you can know for certain it's either the drive itself, or the power to that drive.
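
The swap test above boils down to a small decision rule, sketched here in Python. The function and argument names are hypothetical, and it assumes exactly one drive was logging errors before the swap. Drives are identified by serial number rather than by daN name, on the assumption that the device name may follow the port rather than the drive once cables are moved.

```python
def interpret_cable_swap(bad_before: str, cable_now_on: str, bad_after: str) -> str:
    """Interpret the result of moving a suspect data cable to another drive.

    bad_before:   serial of the drive that logged errors before the swap
    cable_now_on: serial of the drive that now uses the suspect cable
    bad_after:    serial of the drive that logs errors after the swap
    """
    if bad_after == bad_before:
        # Same drive still fails on a different cable.
        return "stayed with drive: suspect the drive itself or its power feed"
    if bad_after == cable_now_on:
        # The fault moved together with the cable (and whatever port
        # that cable is still plugged into at the controller end).
        return "followed cable: suspect the cable or the port it is plugged into"
    return "inconclusive: errors appeared on an unrelated drive, repeat the test"


# Serial numbers taken from the log earlier in this thread
print(interpret_cable_swap("ZA19MY4K", "ZA19RMRK", "ZA19RMRK"))
```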

Although your PSU isn't the holy grail of power supplies around here, Corsair is still a solid brand and I've never had problems with them in builds before. There is no easy way to diagnose it other than to swap in another supply (measuring ripple and min/max voltages on a power supply requires some expensive equipment).


I know you said you've only had this problem since a power cut, but I don't think you should chase your tail trying to make that theory work. Instead, diagnose methodically, as that will likely get you answers faster. Somewhere on here there is an excellent hard drive diagnosis guide outlining what steps to take to determine what's going on.

Edit: found it https://forums.freenas.org/index.ph...bleshooting-guide-all-versions-of-freenas.17/
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
Thanks again. This has been frustrating, but I'll get to the bottom of it. I've replaced the cables once, but I'll specifically target that drive and see if it makes a difference. I read another post with similar problems where it was suggested to eliminate one drive at a time and see if that solves it. I've also seen updates from people who switched to a Linux-based OS and the Linux driver, and got rid of the errors. But I built this system for FreeNAS, and I'd like to stay here if I can. I'll open the box up and isolate that drive (I can't see the serial numbers in my case, so it's a bit tricky). Thanks for your input.
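
Since the serial numbers are hard to read inside the case, the log itself can help: the detach messages posted earlier in this thread print each device name next to its serial number. Here is a small Python sketch that extracts that mapping. It assumes lines in the same `daN: <model> s/n SERIAL detached` format as that log, and `serials_from_log` is a made-up helper name.

```python
import re

# Matches e.g. "da4: <ATA ST8000VN0022-2EL SC61> s/n ZA19RMDB detached"
DETACH = re.compile(r"\b(da\d+): <([^>]+)> s/n (\S+) detached")


def serials_from_log(lines):
    """Map device name -> (model string, serial number) from detach messages."""
    found = {}
    for line in lines:
        match = DETACH.search(line)
        if match:
            device, model, serial = match.groups()
            found[device] = (model, serial)
    return found


log = [
    "Apr 20 14:07:01 FreeNAS da4: <ATA ST8000VN0022-2EL SC61> s/n ZA19RMDB detached",
    "Apr 20 14:07:13 FreeNAS da2: <ATA ST8000VN0022-2EL SC61> s/n ZA19RMRK detached",
]
for device, (model, serial) in sorted(serials_from_log(log).items()):
    print(device, serial, model)
```

The serial can then be matched against the label on each drive as it is pulled. On a running system, `smartctl -i /dev/daN` reports the same serial number.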
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
I just read through that hard drive thread, and it gives me hope. I'll try to identify the drive throwing errors and start by replacing its cable. After that, I'll disassemble and take additional steps.
 

TFAiSO

Dabbler
Joined
Jul 25, 2017
Messages
44
Hi ere109, I am experiencing the exact same issues. Did you resolve the issue? All good now? If so, what did you do to fix it?
I'm quite stressed now as my system is down for maintenance...
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
TFAiSO said: "Hi ere109, I am experiencing the exact same issues. Did you resolve the issue? All good now? If so, what did you do to fix it? I'm quite stressed now as my system is down for maintenance..."
It ended up being a bad SAS controller. Before confirming that, I had first flashed the newest firmware and tested my drives on a different controller. I sent the board back, and although they claimed they couldn't replicate the issue, Supermicro replaced it. The new board functioned flawlessly. To better preserve the new board, I bought some 40mm fans and mounted them directly on the heatsinks for the LSI controller. If your board is under warranty, it's a big hassle, but worth a try to send it in. If you're out of warranty, go to eBay and buy a used PCIe SAS card for $20, and hook your drives up to it. That's much cheaper than messing around trying to repair the onboard controller.
 

Koguni

Cadet
Joined
Sep 17, 2021
Messages
7
ere109 said: "It ended up being a bad SAS controller. Before confirming that, I had first flashed the newest firmware, and tested my drives on a different controller. I sent the board back, and although they claimed they couldn't replicate the issue, Supermicro replaced it. The new board functioned flawlessly. In order to better preserve the new board, I bought some 40mm fans and mounted them directly on the heatsinks for the LSI controller. If your board is under warranty, it's a big hassle, but worth a try to send it in. If you're out of warranty, go to eBay and buy a used PCIe SAS card for $20, and hook your drives up to it - much cheaper than messing around trying to repair onboard."
Maybe it's some unlucky combination of the MPT2SAS (mps) driver in FreeBSD, a certain HDD model, and a certain controller.

I got exactly the same error with two WD DC HC310 6TB (HGST HUS726T6TALE6L4) disks on an LSI 9207-8i (SAS2308) controller. At first it affected only the disk with the lower ID, with no errors on the second one (they are a ZFS mirror), and it started almost immediately after boot. I replaced cables and switched ports on the HBA, with no effect. I attached that disk to onboard SATA: no errors. Then the second disk started generating errors. I moved it to onboard SATA as well: no errors. A third disk, an older HGST, is still on the HBA with no errors. I've updated the drives' firmware and plan to replace the 9207 with a 9205, then reattach the disks to the HBA next month, when the 9205 is delivered.

Errors in drive log:

Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   130   130   054    Pre-fail  Offline      -       100
  3 Spin_Up_Time            0x0007   147   147   024    Pre-fail  Always       -       407 (Average 411)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       532
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       19874
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       531
192 Power-Off_Retract_Count 0x0032   097   097   000    Old_age   Always       -       4545
193 Load_Cycle_Count        0x0012   097   097   000    Old_age   Always       -       4545
194 Temperature_Celsius     0x0002   150   150   000    Old_age   Always       -       40 (Min/Max 21/57)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       5

Error 5 occurred at disk power-on lifetime: 19817 hours (825 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 20 00 06 7a 88 40 00      03:32:40.422  WRITE FPDMA QUEUED
  61 0f 18 66 7a 88 40 00      03:32:40.422  WRITE FPDMA QUEUED
  61 20 10 46 7a 88 40 00      03:32:40.422  WRITE FPDMA QUEUED
  61 20 08 26 7a 88 40 00      03:32:40.422  WRITE FPDMA QUEUED
  61 20 00 df 11 88 40 00      03:32:40.421  WRITE FPDMA QUEUED

Error 4 occurred at disk power-on lifetime: 19816 hours (825 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 20 10 c5 91 88 40 00      02:37:38.648  WRITE FPDMA QUEUED
  61 04 20 05 92 88 40 00      02:37:38.647  WRITE FPDMA QUEUED
  61 20 18 e5 91 88 40 00      02:37:38.647  WRITE FPDMA QUEUED
  61 20 08 a5 91 88 40 00      02:37:38.647  WRITE FPDMA QUEUED
  61 20 00 85 91 88 40 00      02:37:38.647  WRITE FPDMA QUEUED

Error 3 occurred at disk power-on lifetime: 19815 hours (825 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 20 18 86 7b 88 40 00      02:12:36.218  WRITE FPDMA QUEUED
  61 08 20 a6 7b 88 40 00      02:12:36.217  WRITE FPDMA QUEUED
  61 20 10 66 7b 88 40 00      02:12:36.217  WRITE FPDMA QUEUED
  61 20 08 46 7b 88 40 00      02:12:36.217  WRITE FPDMA QUEUED
  61 20 00 26 7b 88 40 00      02:12:36.217  WRITE FPDMA QUEUED

Error 2 occurred at disk power-on lifetime: 19814 hours (825 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 0e 08 bf 3f 39 40 00      01:02:40.652  WRITE FPDMA QUEUED
  61 20 00 9f 3f 39 40 00      01:02:40.651  WRITE FPDMA QUEUED
  61 08 00 1b 3a 39 40 00      01:02:40.651  WRITE FPDMA QUEUED
  61 03 00 66 0b c8 40 00      01:02:40.650  WRITE FPDMA QUEUED
  61 01 00 4c 20 e8 40 00      01:02:40.650  WRITE FPDMA QUEUED
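
In the output above, the two signals that point at the link rather than the platters are attribute 199 (`UDMA_CRC_Error_Count`, raw value 5) and the repeated `ICRC` entries in the error log, both of which record interface CRC errors. Below is a rough Python sketch for pulling those two numbers out of saved `smartctl` text. It assumes the same column layout as the listing above, and `crc_summary` is a hypothetical helper name.

```python
import re

# One row of the attribute table:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"
    r"\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def crc_summary(smart_text: str) -> dict:
    """Return the raw UDMA CRC count and the number of ICRC error-log entries."""
    crc_raw = None
    for line in smart_text.splitlines():
        match = ATTR_ROW.match(line)
        if match and int(match.group(1)) == 199:
            crc_raw = int(match.group(3))
    icrc_entries = len(re.findall(r"Error: ICRC\b", smart_text))
    return {"udma_crc_raw": crc_raw, "icrc_log_entries": icrc_entries}


sample = (
    "199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       5\n"
    "  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0\n"
)
print(crc_summary(sample))
```

Comparing the raw count before and after moving a drive between the HBA and onboard SATA shows whether new CRC errors are still accumulating on a given path. The SMART error log only keeps the most recent entries, so the attribute counter is the better number to track over time.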
 