Unable to write to disks using SAS2008 with an Epyc 3000 system

paxswill (Cadet; joined Jul 16, 2019; 5 messages)
When originally building the system, I encountered a problem with the mps driver entering a reset loop when trying to use a Dell H200 that I had reflashed to IT firmware. Thinking it was a problem with how I flashed the card (or with the card itself), I bought a Lenovo SAS2008-based card off eBay that had already been tested and flashed by the seller, but I encountered the same issue. I then tested both cards with Ubuntu on the same hardware, and they worked fine. Figuring it was a product of immature Epyc 3000 support in FreeBSD 11, I decided to attack the problem later and attached the drives directly to the four SATA ports on the motherboard.
OS: TrueNAS Core 12.0-RC1
Motherboard: SuperMicro M11SDV-4C-LN4F
CPU: AMD Epyc 3151 (integrated into the motherboard)
RAM: 32GB ECC
Boot drive: 16GB USB stick
Storage: 4x 4TB WD Red (WD40EFRX)
With TrueNAS Core 12.0-RC1 released, and improved AMD support being one of its features, I decided to try again. Unfortunately the issue is still present, so I filed a bug with FreeBSD (detailed debug logs from FreeNAS 11.3-U4.1 are attached to that bug report). To make testing easier, I've masked the card from the host system so that it can be passed through to VMs (using bhyve PCIe passthrough), as forcing a system reset is annoying and slow. Attaching the card to a VM has revealed something interesting, though. The card works when the machine is booted directly into Ubuntu (working defined as being able to read from and write to disks attached to the card), but I am *not* able to access the drives from a Debian buster VM (I'm going to try again with Ubuntu later for confirmation). This error is printed at boot:
Code:
[  +4.567077] mpt2sas_cm0: _base_spin_on_doorbell_int: failed due to timeout count(10000), int_status(c0000000)!
[  +0.000007] mpt2sas_cm0: doorbell handshake int failed (line=5216)
[  +0.000003] mpt2sas_cm0: _base_get_ioc_facts: handshake failed (r=-14)
[  +0.000056] clocksource: Switched to clocksource tsc
[  +0.000476] mpt2sas_cm0: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:10685/_scsih_probe()!
I am also unable to access the card with FreeBSD-CURRENT in a VM. This suggests to me that the problem lies somewhere further up than just the mps driver, especially since the error is also present in a Linux VM on a TrueNAS host. Perhaps it is in how FreeBSD configures the PCIe subsystem? This is getting outside my area of knowledge, though, so does anyone have advice on what else I should try to get this card working with the motherboard?

I have also tried disabling MSI and MSI-X interrupts in FreeBSD/TrueNAS when booted directly into it, as well as disabling MSI-X interrupts for the Linux guest (suggested here). Neither had any effect.
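
For reference, MSI and MSI-X are typically disabled on FreeBSD through loader tunables. The fragment below is a minimal sketch, assuming the system-wide `hw.pci` tunables and the per-driver `hw.mps` tunables documented in mps(4). Which of these apply to a given setup is an assumption, and on TrueNAS they would normally be added as loader-type tunables in the UI rather than by editing the file directly.

```
# /boot/loader.conf (sketch; illustrative, not a known-good fix for this issue)

# System-wide: stop the PCI layer from handing out MSI / MSI-X vectors
hw.pci.enable_msi="0"
hw.pci.enable_msix="0"

# mps(4) only: disable MSI / MSI-X for all mps driver instances
hw.mps.disable_msi="1"
hw.mps.disable_msix="1"
```

After a reboot, `sysctl hw.pci.enable_msi hw.pci.enable_msix` should report 0 if the system-wide tunables were picked up.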