FreeNAS 11.3: probable SAS3008 driver issue

Feris

Cadet
Joined
May 17, 2016
Messages
3
Hello,
I did a clean install of version 11.3 on a new box. After it had been running for some time, I tried to create a new pool.
The system threw lots of "mprsas_send_abort" and drive discovery messages on the console, then hung completely.
After a reboot I was able to repeat the operation a few times without issue, but after a short period of time (about 10 minutes) the problem recurred.

I suspected a hardware issue, but version 11.2U7, installed a few days ago and currently under burn-in, is working completely fine.

The HBA is an AOC-S3008L-L8e with the newest firmware available from Supermicro.
Code:
        Adapter Selected is a Avago SAS: SAS3008(C0)

        Controller Number              : 0
        Controller                     : SAS3008(C0)
        PCI Address                    : 00:01:00:00
        SAS Address                    : 5003048-0-2322-2501
        NVDATA Version (Default)       : 0e.01.30.28
        NVDATA Version (Persistent)    : 0e.01.30.28
        Firmware Product ID            : 0x2221 (IT)
        Firmware Version               : 16.00.01.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : LSI3008-IT
        BIOS Version                   : 08.37.00.00
        UEFI BSD Version               : 18.00.00.00
        FCODE Version                  : N/A
        Board Name                     : LSI3008-IT
        Board Assembly                 : N/A
        Board Tracer Number            : N/A


Rest of hardware is:
Code:
Intel Xeon E3-1230 v6
X11SSH-LN4F
32GB of ECC RAM
6x WD RED 6TB
 

vshaulsk

Cadet
Joined
Sep 6, 2018
Messages
9
I had a similar issue with my LSI 9300-8i card. I had been running 11.2 since its release with no issues, but as soon as I upgraded to 11.3 the system started crashing and rebooting. Sometimes it would take minutes and sometimes hours. My Dell iDRAC was reporting failures on the card's PCIe slot. For the time being I removed the card and installed an LSI 9271, which has resolved my problems.

The card had version 5 firmware, which I updated over the weekend. Additionally, I purchased another LSI 9300-8i with the latest firmware.
I still need to put the cards back into the system and see if I get the same faults.

One strange thing: I have another card in the system, an LSI 9300-8e, which has the same old rev 5.00 firmware, and there is no problem with that card.
 

RickM

Cadet
Joined
Jan 24, 2020
Messages
3
I have a similar issue with version 11.3 on a SAS3008 PCI-Express Fusion-MPT SAS-3 controller running firmware version 15.3 IT. It crashes every 2 to 3 days. I did not have any issues running 11.2. Below is the Supermicro server info:

Board Product Name: X11SPH-nCTF
Board Part Num: X11SPH-nCTF
Product PartNum: SSG-5029P-E1CTR12L
Machine model: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
 

stoffix

Dabbler
Joined
Apr 26, 2013
Messages
20
I have an LSI 9300-8i on the latest firmware (I think it was 16.00.10.00), running with no issues on FreeNAS 11.3.
Maybe it's a firmware/driver issue?
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442

RickM

Cadet
Joined
Jan 24, 2020
Messages
3
I do have version 16 installed. This is the latest version provided to me by SuperMicro.

Code:
        Adapter Selected is a Avago SAS: SAS3008(C0)

        Controller Number              : 0
        Controller                     : SAS3008(C0)
        NVDATA Version (Default)       : 0e.01.30.28
        NVDATA Version (Persistent)    : 0e.01.30.28
        Firmware Product ID            : 0x2221 (IT)
        Firmware Version               : 16.00.01.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : LSI3008-IT
        BIOS Version                   : 08.37.00.00
        UEFI BSD Version               : 18.00.00.00
        FCODE Version                  : N/A
        Board Name                     : LSI3008-IT
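Both Supermicro-supplied listings in this thread report Firmware Version 16.00.01.00, while the 9300-8i reported as working on 11.3 was quoted, from memory, as 16.00.10.00, so "version 16" alone does not tell them apart. As a minimal sketch, assuming the `Key : Value` text layout pasted above (the helpers `parse_listing` and `version_tuple` are hypothetical, not part of any Broadcom/LSI tool), the full version strings can be compared field by field:

```python
from __future__ import annotations


def parse_listing(text: str) -> dict[str, str]:
    """Map each 'Key : Value' line to {key: value}, splitting on the first colon."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def version_tuple(version: str) -> tuple[int, ...]:
    """'16.00.01.00' -> (16, 0, 1, 0), so versions compare numerically per field."""
    return tuple(int(part) for part in version.split("."))


# A few fields copied from the listing above (assumed layout).
listing = """\
Controller                     : SAS3008(C0)
Firmware Version               : 16.00.01.00
NVDATA Product ID              : LSI3008-IT
"""
fw = parse_listing(listing)["Firmware Version"]
quoted_as_working = "16.00.10.00"  # version recalled for the 9300-8i running fine on 11.3
print(fw, version_tuple(fw) < version_tuple(quoted_as_working))
```

Comparing integer tuples rather than raw strings avoids the trap where "5.00" (the old firmware mentioned for the 9300-8e) sorts after "16.00.01.00" as text.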
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Folks, please definitely open a trouble ticket and ideally link this forum thread. The more people who chime in with "me too", the more likely it'll be given a little more priority, in my experience.
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442

willuz

Cadet
Joined
Feb 12, 2020
Messages
8
I'm running two servers with 4 LSI 9305 adapters (SAS3224 chips on firmware 16) in each server. The mprsas_send_abort errors only occur on SSDs. None of the spinning disks have reported this error. The second server has only spinning disks and has not reported any errors at all.
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442

The server that's having this problem has SSDs on my 9305. My MB with a built in 3008 controller only has spinning disks and doesn't have this problem.
 

Ctiger256

Dabbler
Joined
Nov 18, 2016
Messages
40
My server with similar issues is all SSD. I have three servers with only spinning disks--same controller--and they have no issues.
 

willuz

Cadet
Joined
Feb 12, 2020
Messages
8
So I'm running a test overnight on the server with mixed disks. I detached the SSD pool so it's inactive and I'm replicating the spinning disk pool to the backup server to make sure there is maximum throughput. After 2 hours there have been no errors.

Interesting note: the spinning disk pool does have SSD L2ARC and SLOG disks, but they have not had any issues.
 

willuz

Cadet
Joined
Feb 12, 2020
Messages
8

The spinning disk pool ran flawlessly overnight and moved 13TB of data without a single error. This was only a read test, since there was no writing to the pool, so the SSD cache/log disks were not used. It had been crashing every 2 to 3 hours and just ran 24 hours without a crash with the SSD pool detached. A full test would require me to write to the pool, but since there are only 3 SSDs instead of the 30 in the SSD pool, there's only about a 10% chance of a failure in the same time frame.
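The 10% figure comes from scaling the failure rate with the number of SSDs in use (3 instead of 30). A minimal sketch of that reasoning, assuming crashes arrive as a Poisson process at a constant per-SSD rate and taking "every 2 to 3 hours" as a 2.5-hour mean; both are assumptions, not measurements from the thread:

```python
import math

# Assumptions (not measured): crashes are a Poisson process whose rate is
# proportional to the number of active SSDs; 30 SSDs -> one crash per 2.5 h.
MEAN_HOURS_BETWEEN_CRASHES = 2.5
SSDS_IN_POOL = 30
SSDS_AS_CACHE_LOG = 3

# Crashes per SSD-hour implied by the assumptions above.
per_ssd_rate = 1.0 / (MEAN_HOURS_BETWEEN_CRASHES * SSDS_IN_POOL)


def p_at_least_one_crash(n_ssds: int, hours: float) -> float:
    """P(at least one crash in `hours`) = 1 - exp(-n * rate * t) for a Poisson process."""
    return 1.0 - math.exp(-n_ssds * per_ssd_rate * hours)


window = p_at_least_one_crash(SSDS_AS_CACHE_LOG, MEAN_HOURS_BETWEEN_CRASHES)
day = p_at_least_one_crash(SSDS_AS_CACHE_LOG, 24)
print(f"{window:.3f} {day:.3f}")
```

Under these assumptions the 3 cache/log SSDs give roughly a 9.5% chance of a crash within one 2.5-hour window, close to the 10% estimate, but about a 62% chance over a full 24-hour write test, so one crash-free day would be suggestive rather than conclusive.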

I have deleted the SSD pool and recreated it. Hopefully I will have more luck with a pool that was created in 11.3.
 

Ctiger256

Dabbler
Joined
Nov 18, 2016
Messages
40
Just had another crash on my all SSD 11.3 system described in the other thread. The full log is attached, and here's what happens right before the crash:

Code:
Feb 15 07:17:55 bob     (pass5:mpr0:0:22:0): LOG SENSE. CDB: 4d 00 2f 00 00 00 00 00 40 00 length 64 SMID 748 Aborting command 0xfffffe000125e340
Feb 15 07:17:55 bob mpr0: Sending reset from mprsas_send_abort for target ID 22
Feb 15 07:17:55 bob     (pass7:mpr0:0:24:0): LOG SENSE. CDB: 4d 00 0d 00 00 00 00 00 40 00 length 64 SMID 727 Aborting command 0xfffffe000125c510
Feb 15 07:17:55 bob mpr0: Sending reset from mprsas_send_abort for target ID 24
Feb 15 07:17:55 bob     (pass2:mpr0:0:19:0): LOG SENSE. CDB: 4d 00 2f 00 00 00 00 00 40 00 length 64 SMID 784 Aborting command 0xfffffe0001261700
Feb 15 07:17:55 bob mpr0: Sending reset from mprsas_send_abort for target ID 19


I've realized I can't downgrade to 11.2 without recreating the pool, which I guess is what I'll have to do if there's no fix for this...
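Each aborted command in this excerpt is a SCSI LOG SENSE (opcode 0x4D) sent through a pass device (pass5, pass7, pass2), and each abort is immediately followed by a reset of the same target. A minimal sketch for tallying these from a log like the attached messages.txt, assuming the exact line format shown above; the regexes and the `decode_log_sense`/`summarize` helpers are hypothetical, and the page names are the standard SCSI (SPC) log-page codes:

```python
from __future__ import annotations

import re
from collections import Counter

# Standard SPC log-page codes; only the two pages seen in this excerpt are named.
LOG_PAGES = {0x0D: "Temperature", 0x2F: "Informational Exceptions"}

# Patterns written for the mpr log lines quoted above (assumed format).
ABORT_RE = re.compile(
    r"\(pass\d+:mpr\d+:\d+:(?P<target>\d+):\d+\): LOG SENSE\. CDB: (?P<cdb>(?:[0-9a-f]{2} )+)"
)
RESET_RE = re.compile(r"Sending reset from mprsas_send_abort for target ID (?P<target>\d+)")


def decode_log_sense(cdb_hex: str) -> dict:
    """Decode the page code and allocation length of a 10-byte LOG SENSE CDB."""
    cdb = bytes.fromhex(cdb_hex.strip())  # spaces between byte pairs are ignored
    if len(cdb) != 10 or cdb[0] != 0x4D:
        raise ValueError("not a 10-byte LOG SENSE CDB")
    page = cdb[2] & 0x3F  # byte 2, bits 5..0: page code
    return {
        "page": page,
        "page_name": LOG_PAGES.get(page, "other"),
        "alloc_len": int.from_bytes(cdb[7:9], "big"),  # bytes 7..8: allocation length
    }


def summarize(log_text: str) -> tuple[Counter, Counter]:
    """Count aborted LOG SENSE requests per log page and resets per target ID."""
    pages: Counter = Counter()
    resets: Counter = Counter()
    for line in log_text.splitlines():
        abort = ABORT_RE.search(line)
        if abort:
            pages[decode_log_sense(abort["cdb"])["page_name"]] += 1
        reset = RESET_RE.search(line)
        if reset:
            resets[int(reset["target"])] += 1
    return pages, resets


excerpt = """\
Feb 15 07:17:55 bob     (pass5:mpr0:0:22:0): LOG SENSE. CDB: 4d 00 2f 00 00 00 00 00 40 00 length 64 SMID 748 Aborting command 0xfffffe000125e340
Feb 15 07:17:55 bob mpr0: Sending reset from mprsas_send_abort for target ID 22
Feb 15 07:17:55 bob     (pass7:mpr0:0:24:0): LOG SENSE. CDB: 4d 00 0d 00 00 00 00 00 40 00 length 64 SMID 727 Aborting command 0xfffffe000125c510
Feb 15 07:17:55 bob mpr0: Sending reset from mprsas_send_abort for target ID 24
Feb 15 07:17:55 bob     (pass2:mpr0:0:19:0): LOG SENSE. CDB: 4d 00 2f 00 00 00 00 00 40 00 length 64 SMID 784 Aborting command 0xfffffe0001261700
Feb 15 07:17:55 bob mpr0: Sending reset from mprsas_send_abort for target ID 19
"""
pages, resets = summarize(excerpt)
print(pages, resets)
```

On this excerpt it finds two Informational Exceptions (page 0x2F) requests and one Temperature (page 0x0D) request, each with the 64-byte allocation length that matches "length 64" in the log, and one reset each for targets 19, 22 and 24. In other words, the commands that timed out here were drive health and temperature queries rather than read/write I/O, though the excerpt does not show which process issued them.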
 

Attachments

  • messages.txt (90.5 KB)

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
@Ctiger256 Is it crashing your entire server and forcing a restart? For me, it often just reloads the HBA and renders the disks unusable for about 10 seconds, but I'm also having about 2 spontaneous reboots per day. I thought they were separate problems as I'm on a Ryzen platform and had stability issues with it earlier on 11.2 (that were later resolved).
 

willuz

Cadet
Joined
Feb 12, 2020
Messages
8
The mprsas errors and the reboots are definitely related in my case. As seen above I tested a server that has both an SSD pool and a spinning disk pool. If I detach the SSD pool the spinning disk pool runs flawlessly with no reboots or mprsas errors. I tried deleting and recreating the SSD pool and that didn't help.
 

vshaulsk

Cadet
Joined
Sep 6, 2018
Messages
9
I can confirm that only the SSD pool attached to the LSI 9300-8i HBA is having an issue, and in my case it causes the entire R720 to restart. I use the R720 with the SSD pool for iSCSI shared storage within my Proxmox cluster, so a reboot of the SAN is not acceptable. In the meantime I went back to a pair of LSI 9207-8i cards, which do not cause any problems.
 

Ctiger256

Dabbler
Joined
Nov 18, 2016
Messages
40

It doesn't always reboot. There are entries in the log that indicate the controller hangs and resets without a restart. I can't isolate why sometimes it leads to a reboot.
 