LSI SAS3008 HBA issues with one drive port (scbusX target 2)

logan893

Dabbler | Joined: Dec 31, 2015 | Messages: 44
I'm having some issues with an LSI SAS3008 9340-8i HBA (bought used a while back, so relatively new to me). I'm quite sure it is an IBM M1215. I have reflashed the card with the TrueNAS IT firmware, version 16.00.12.00. I have also covered PCIe pins B5 and B6 to avoid SMBus errors, which would otherwise prevent me from using all RAM slots; removing this mod doesn't change the behavior described below.

Until recently I used it for only 4 drives (on "Port 0") without issue, but now that I'm expanding to 8 drives I run into problems with one specific port. Re-testing with only the problematic port, the problem is present even with just a single drive connected to it.

Narrowing it down, I've arrived at:
Out of ports 0-3 (SAS connector marked "Port 1"), all but port 2 work fine. Port 2 here would also be known as "scbusX target 2" in FreeBSD.
Ports 4-7 (SAS connector marked "Port 0") all work perfectly.

(I don't know why the "Port 0" and "Port 1" markings and their internal port numbering appear swapped compared with how the drives are enumerated in FreeBSD. It all seems backwards, but perhaps this is related to the M1215 FW vs standard LSI FW.)
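To keep track of which physical position a FreeBSD target number refers to, the mapping described above can be written down as a small lookup. This is a minimal Python sketch, assuming (as appears to hold here, but is not guaranteed by the mpr(4) driver, especially with expanders or after hot-swaps) that target IDs 0-7 equal the HBA's phy numbers for direct-attached drives. The connector markings come only from the observation above, not from LSI/Broadcom documentation, and the helper names `target_to_connector` and `locate` are hypothetical. `locate` reads one line of `camcontrol devlist` output, which ends in the form `at scbus0 target 2 lun 0 (pass2,da2)`.

```python
import re
from typing import Optional, Tuple

# Observed in this thread only, NOT taken from LSI/Broadcom documentation:
# targets 0-3 sit on the connector marked "Port 1", targets 4-7 on "Port 0".
OBSERVED_CONNECTOR = {0: "Port 1", 1: "Port 0"}

# Matches the tail of a `camcontrol devlist` line,
# e.g. "... at scbus0 target 2 lun 0 (pass2,da2)".
DEVLIST_RE = re.compile(
    r"at scbus(?P<bus>\d+) target (?P<target>\d+) lun (?P<lun>\d+) \((?P<names>[^)]*)\)"
)


def target_to_connector(target: int) -> Tuple[str, int]:
    """Return (connector marking, 0-based position on that connector).

    Assumes target IDs equal phy numbers 0-7 (direct-attached drives, no expander).
    """
    if not 0 <= target <= 7:
        raise ValueError(f"target {target} is outside the 8 phys of a -8i HBA")
    return OBSERVED_CONNECTOR[target // 4], target % 4


def locate(devlist_line: str) -> Optional[Tuple[str, str, int]]:
    """Return (da device, connector marking, position) for one devlist line, or None."""
    m = DEVLIST_RE.search(devlist_line)
    if m is None:
        return None
    # The parenthesised list holds peripheral names such as "pass2,da2".
    names = m.group("names").split(",")
    da = next((n for n in names if n.startswith("da")), "?")
    connector, position = target_to_connector(int(m.group("target")))
    return da, connector, position
```

For example, `target_to_connector(2)` returns `("Port 1", 2)`, i.e. the third position on the connector marked "Port 1", which is the failing slot described above.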

Port 2 (a.k.a. target 2; the 3rd drive on the SAS connector marked "Port 1") doesn't seem to work reliably at all, though with some drives it can detect the drive but not read from it. If a drive is detected, I get a bunch of errors when trying to read anything. With an SSD (Samsung SM863; I have several that work fine on other ports) on the problematic port, the drive is consistently not detected and not given a device name in TrueNAS. If I instead put an HDD in the slot (I've tried both 3.5" and 2.5" HDDs, at 5,400 and 7,200 RPM), it is sometimes detected (especially if connected during boot), but even then the system ends up throwing errors and the drive is impossible to read from or interact with (see below for the typical errors when running any reads from within TrueNAS).

The problem stays with this one specific port and connector of the HBA, even when I swap cables and drives around.


Do I have a bad port on the HBA, or could something else be wrong? Perhaps something went wrong with the flashing?

I found a thread here that is a bit worrying but reaches no conclusion, and my issue seems more limited than the one described there. The FreeBSD bug mentioned in it is still open, though. https://www.truenas.com/community/threads/misadventures-with-lsi-sas3008-cards.94677/

I currently run TrueNAS 12.0-U8 as a VM in Proxmox 7.1-10, with PCI passthrough for the HBA.
I also tried booting the installers for both TrueNAS 12.0-U8 and TrueNAS SCALE 22.02.0.1 on bare metal, and both show the same drive-detection behavior, so it is not only FreeBSD (SCALE is Linux-based).

Here is the HBA firmware information from sas3flash, in case it matters.

root@truenas[~]# sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.

Adapter Selected is a Avago SAS: SAS3008(C0)

Controller Number : 0
Controller : SAS3008(C0)
PCI Address : 00:01:00:00
SAS Address : 500605b-0-0ce2-d7a0
NVDATA Version (Default) : 0e.01.00.07
NVDATA Version (Persistent) : 0e.01.00.07
Firmware Product ID : 0x2221 (IT)
Firmware Version : 16.00.12.00
NVDATA Vendor : LSI
NVDATA Product ID : SAS9300-8i
BIOS Version : 08.37.00.00
UEFI BSD Version : 18.00.00.00
FCODE Version : N/A
Board Name : SAS9300-8i
Board Assembly : N/A
Board Tracer Number : N/A

Finished Processing Commands Successfully.
Exiting SAS3Flash.
root@truenas[~]#
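The listing above is plain `Key : Value` text, so it is easy to check mechanically after a reflash. Below is a minimal Python sketch (function names hypothetical) that parses the output of `sas3flash -list` and confirms the firmware reports IT mode and the expected version; the default of 16.00.12.00 is simply the version shown in this post. It only reads what the tool prints and cannot detect a bad port.

```python
from typing import Dict, List


def parse_sas3flash_list(text: str) -> Dict[str, str]:
    """Parse the 'Key : Value' lines of `sas3flash -list` output into a dict.

    Splits on the first " : " only, so values that themselves contain colons
    (such as the PCI address 00:01:00:00) stay intact. Banner lines without
    " : " are skipped.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_it_firmware(fields: Dict[str, str], expected_version: str = "16.00.12.00") -> List[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems: List[str] = []
    if "(IT)" not in fields.get("Firmware Product ID", ""):
        problems.append("firmware does not report IT mode")
    version = fields.get("Firmware Version")
    if version != expected_version:
        problems.append(f"firmware version is {version!r}, expected {expected_version!r}")
    return problems
```

On the host it could be fed with `subprocess.run(["sas3flash", "-list"], capture_output=True, text=True).stdout`; an empty list from `check_it_firmware` means both checks passed, as they do for the output above.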

These are the errors I see when the drive on the affected port is detected at all and I try to read from it:

May 13 23:52:47 truenas (da2:mpr0:0:2:0): READ(10). CDB: 28 00 00 00 08 00 00 08 00 00
May 13 23:52:47 truenas (da2:mpr0:0:2:0): CAM status: SCSI Status Error
May 13 23:52:47 truenas (da2:mpr0:0:2:0): SCSI status: Check Condition
May 13 23:52:47 truenas (da2:mpr0:0:2:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
May 13 23:52:47 truenas (da2:mpr0:0:2:0): Retrying command (per sense data)
May 13 23:52:47 truenas (da2:mpr0:0:2:0): READ(10). CDB: 28 00 00 00 08 00 00 08 00 00
May 13 23:52:47 truenas (da2:mpr0:0:2:0): CAM status: SCSI Status Error
May 13 23:52:47 truenas (da2:mpr0:0:2:0): SCSI status: Check Condition
May 13 23:52:47 truenas (da2:mpr0:0:2:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
May 13 23:52:47 truenas (da2:mpr0:0:2:0): Retrying command (per sense data)
May 13 23:52:47 truenas (da2:mpr0:0:2:0): READ(10). CDB: 28 00 00 00 08 00 00 08 00 00
May 13 23:52:47 truenas (da2:mpr0:0:2:0): CAM status: SCSI Status Error
May 13 23:52:47 truenas (da2:mpr0:0:2:0): SCSI status: Check Condition
May 13 23:52:47 truenas (da2:mpr0:0:2:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
May 13 23:52:47 truenas (da2:mpr0:0:2:0): Retrying command (per sense data)
May 13 23:52:47 truenas (da2:mpr0:0:2:0): READ(10). CDB: 28 00 00 00 08 00 00 08 00 00
May 13 23:52:47 truenas (da2:mpr0:0:2:0): CAM status: SCSI Status Error
May 13 23:52:47 truenas (da2:mpr0:0:2:0): SCSI status: Check Condition
May 13 23:52:47 truenas (da2:mpr0:0:2:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
May 13 23:52:47 truenas (da2:mpr0:0:2:0): Retrying command (per sense data)
May 13 23:52:47 truenas (da2:mpr0:0:2:0): READ(10). CDB: 28 00 00 00 08 00 00 08 00 00
May 13 23:52:47 truenas (da2:mpr0:0:2:0): CAM status: SCSI Status Error
May 13 23:52:47 truenas (da2:mpr0:0:2:0): SCSI status: Check Condition
May 13 23:52:47 truenas (da2:mpr0:0:2:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
May 13 23:52:47 truenas (da2:mpr0:0:2:0): Error 5, Retries exhausted
 

neofusion

Contributor | Joined: Apr 2, 2022 | Messages: 159
The HBA is certainly suspect.
I experienced something similar with a Fujitsu CP400i; after ruling out the cable and drives, I eventually tried a different HBA (same maker and model, slightly different revision). After swapping and flashing it, everything worked perfectly.

I suppose you could try reflashing, making sure you use the most up-to-date tools available.
 

demon

Contributor | Joined: Dec 6, 2014 | Messages: 117
Could the cabling be the issue? Or possibly the backplane? The iuCRC errors make me think you might have a flaky miniSAS cable causing the problem.
 

logan893

Dabbler | Joined: Dec 31, 2015 | Messages: 44
demon said: "Could the cabling be the issue? Or possibly the backplane? The iuCRC errors make me think you might have a flaky miniSAS cable causing the problem."
I'd say I've ruled out cable and backplane issues: the problem stays with the specific drive slot. Three cables and three backplanes work flawlessly with all four drive slots on "Port 0", and all three sets have issues only with the one and same drive slot on "Port 1".
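The elimination argument here amounts to a common-factor check: a component is suspect if every failing run used it and no passing run did. A minimal Python sketch, assuming each test run is recorded as a set of hypothetical component labels (e.g. `"cable A lane 2"`, `"backplane X slot 2"`, `"HBA Port 1 lane 2"`) plus a pass/fail result. It also assumes a single, consistently failing fault, so intermittent behavior like the sometimes-detected HDDs would need repeated runs before being recorded as a pass.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# One test run: the components it exercised, and whether it passed.
Run = Tuple[FrozenSet[str], bool]


def suspects(runs: Iterable[Run]) -> Set[str]:
    """Return components used by every failing run and by no passing run.

    Assumes a single fault that fails consistently; an intermittent fault
    can be wrongly exonerated by a run that happened to pass.
    """
    failing: List[Set[str]] = []
    passed: Set[str] = set()
    for components, ok in runs:
        if ok:
            passed |= components
        else:
            failing.append(set(components))
    if not failing:
        return set()
    return set.intersection(*failing) - passed
```

Encoding the three cable/backplane sets described above this way, with only the third slot on "Port 1" failing, leaves the HBA lane as the only component present in every failing run and in no passing run. The granularity of the labels matters: recording whole cables rather than individual cable lanes would let passing runs on other lanes exonerate a cable that has a single bad lane.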
 