SOLVED: Pool went OFFLINE after installing a 10GbE PCIe NIC

James

Dabbler
Joined
Apr 11, 2021
Messages
33
Hello, I have been running TrueNAS via Proxmox for a while without any issues, but I encountered a significant problem after installing the Supermicro AOC-STG-i2T add-on network adapter. Upon restarting, the TrueNAS pool went offline (Data not available / Export or Disconnect warnings). I tried a couple of commands that I found on this forum to import the pool and check its status, but nothing worked, as the pool itself was unavailable. Further, none of the hard drives were visible.

I took a chance and removed the adapter, and after a reboot, TrueNAS automatically restored my pool; otherwise this would have been a nightmare. With that background, could anyone help me understand what went wrong? The adapter is compatible and was recommended by Supermicro for my motherboard (X11SCL-F). The BIOS and the SAS HBA firmware (IT mode) are up to date. I would really appreciate some guidance and suggestions on how to install and configure the adapter for seamless integration.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You haven't shared all the details of your system (see forum rules link above), so any help given will be based on guesses.

I would suggest that you may be running into an issue of PCIe generation difference... maybe connecting a PCIe v2.1 device (that NIC) somehow interferes with your HBA (which, as far as I know, is PCIe v3).
 

James

Thanks for your help, sretalla. Please see the details below.

  • Motherboard make and model - Supermicro X11SCL-F
  • CPU make and model - Xeon E-2146G
  • RAM quantity - 90GB
  • Hard drives, quantity, model numbers, and RAID configuration, including boot drives - 4x
  • Hard disk controllers - AOC-S3008L-L8e (Low profile Gen 3 PCI-E x8)
  • Network cards - AOC-STG-i2T / Rev. 2.01 / PCI-E x8 2.1 (2.5GT/s or 5GT/s) interface
As you guessed, the network card is PCIe 2.1 and the HBA is a Gen 3 device. Is this combination not possible? I thought PCIe cards were backward compatible and could use any slot. I don't think Supermicro offers a same-generation network adapter.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
To the best of my knowledge, PCIe is indeed backwards compatible and can use any slot as long as the physical size fits. I currently have an x4 card plugged into an x16 slot just because that's the only slot I have available and it works fine.
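A minimal Python sketch of that backward compatibility, assuming the usual PCIe link-training behavior: card and slot settle on the highest generation and the widest link that both support. The generations and widths come from the thread (the NIC's PCIe 2.1 x8 interface, the HBA's Gen 3 x8 interface, and the board spec quoted later listing the x8 slots as PCI-E 3.0 x4 and the x16 slot as PCI-E 3.0 x8); the `Link` type and `negotiate` function are illustrative names only.

```python
from typing import NamedTuple

class Link(NamedTuple):
    gen: int    # PCIe generation: 1, 2, 3, ...
    width: int  # lane count: 1, 4, 8, 16

def negotiate(card: Link, slot: Link) -> Link:
    """Expected link when `card` sits in a slot electrically wired as `slot`.

    Both ends train to the highest generation and widest link they share,
    which is why an older card, or a wider card in a physically large
    enough slot, still works, just at the lower common speed/width.
    """
    return Link(gen=min(card.gen, slot.gen), width=min(card.width, slot.width))

nic = Link(gen=2, width=8)       # AOC-STG-i2T: PCIe 2.1 x8
hba = Link(gen=3, width=8)       # AOC-S3008L-L8e: PCIe 3.0 x8
x8_slot = Link(gen=3, width=4)   # "PCI-E 3.0 x4 (in x8 slot)"
x16_slot = Link(gen=3, width=8)  # "PCI-E 3.0 x8 (in x16 slot)"

for name, card in (("NIC", nic), ("HBA", hba)):
    for slot_name, slot in (("x8 slot", x8_slot), ("x16 slot", x16_slot)):
        print(f"{name} in {slot_name}: {negotiate(card, slot)}")
```

Under these assumptions the NIC should work in either slot; in an x8 slot it simply runs as a Gen 2 x4 link, which lines up with the `Speed 5.0GT/s Width x4` entry that later shows up in the logs.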
 

James

Thanks, Whattteva. My motherboard has two x8 slots. I had used one of them for the HBA, and I recently installed the network adapter in the second x8 slot. I still have one x16 slot available. I don't know why my pool went offline after the installation and then came back automatically after I removed the adapter. It would be great to get the 10G NIC working on TrueNAS.
 

Whattteva

Anything suspicious in /var/log/messages?
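One quick way to skim that file is to keep only the lines from the NIC and HBA drivers that mention a problem. This Python sketch assumes TrueNAS CORE (FreeBSD), where the Intel 10GbE ports show up as `ix*` and the SAS3008 HBA as `mpr*`; the keyword list is a rough guess, not an exhaustive filter.

```python
import re
from typing import Iterable, List

# Assumed driver prefixes: ix = Intel 10GbE (X540) on FreeBSD,
# mpr = LSI/Broadcom SAS3 HBAs such as the SAS3008.
DEVICE = re.compile(r"\b(?:ix|mpr)\d+:")
# Rough keyword list for "something went wrong"; extend as needed.
TROUBLE = re.compile(r"error|fail|timeout|reset|detach", re.IGNORECASE)

def suspicious(lines: Iterable[str]) -> List[str]:
    """Return NIC/HBA driver lines that mention a likely problem."""
    return [line.rstrip("\n") for line in lines
            if DEVICE.search(line) and TROUBLE.search(line)]

def scan(path: str = "/var/log/messages") -> List[str]:
    """Apply suspicious() to a log file; undecodable bytes are replaced."""
    with open(path, errors="replace") as log:
        return suspicious(log)
```

On the TrueNAS VM, `print(*scan(), sep="\n")` would dump the matches; because `scan` takes a path, it can also be pointed at a copy of the log pulled off the boot pool.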
 

Whattteva

You shouldn't need your hard drives for the logs, as I'm pretty sure they're stored on the boot pool, which is separate from your data pool.
 

James

Thanks for your suggestion. Is there anything in particular in the logs I should be looking for? Most of the entries are SMTP/Outlook-related or messages of the type gethostby*.getanswer: asked for "update-master.ixsystems.com IN AAAA",got type "A". I did find some error messages that also mention "Intel X540"; let me know if this is helpful.

ix0: allocated for 3 rx queues
kernel: ix0: Ethernet address: ab:2f:6c:a8:43:91
ix0: PCI Express Bus: Speed 5.0GT/s Width x4
ix0: Option ROM V1-b1681-p0 eTrack 0x80000628 PHY FW V286
ix0: Error 2 setting up SR-IOV
ix1: <Intel(R) X540-AT2> mem 0xfd200000-0xfd3fffff,0xfd804000-0xfd807fff irq 11 at device 16.1 on pci0
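The `PCI Express Bus: Speed 5.0GT/s Width x4` line shows the NIC trained as a Gen 2 x4 link in the x8 slot. A back-of-the-envelope Python sketch of what such a link can carry, assuming the standard line encodings (8b/10b at 2.5 and 5.0 GT/s, 128b/130b from 8 GT/s through Gen 5) and ignoring packet and protocol overhead, so real throughput will be somewhat lower:

```python
import re

# Signalling rate in GT/s -> usable Gbit/s per lane after line encoding:
# 2.5 and 5.0 GT/s (Gen 1/2) use 8b/10b; 8, 16, 32 GT/s (Gen 3-5) use
# 128b/130b. Packet and protocol overhead is ignored.
def lane_gbps(gts: float) -> float:
    efficiency = 8 / 10 if gts <= 5.0 else 128 / 130
    return gts * efficiency

BUS = re.compile(r"Speed\s+([\d.]+)\s*GT/s\s+Width\s+x(\d+)")

def link_gbps(log_line: str) -> float:
    """Per-direction Gbit/s for a 'PCI Express Bus: Speed ... Width ...' line."""
    m = BUS.search(log_line)
    if m is None:
        raise ValueError(f"no PCIe speed/width found in: {log_line!r}")
    return lane_gbps(float(m.group(1))) * int(m.group(2))

available = link_gbps("ix0: PCI Express Bus: Speed 5.0GT/s Width x4")
needed = 2 * 10  # both 10GbE ports at line rate, Gbit/s
print(f"{available:.1f} Gbit/s available, {needed} Gbit/s for both ports")
```

That puts the ceiling at roughly 16 Gbit/s per direction: below the 20 Gbit/s needed to run both ports flat out, but plenty for a single 10GbE link. On its own it is a throughput limit, not an obvious reason for the pool to disappear.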
 

James

Hi, any thoughts?
 

sretalla

I'm still thinking along the lines of some kind of PCIe conflict...

If it's not the different versions not playing nice with each other, maybe it's an interrupt or lane constraint thing.

I note the error regarding SR-IOV... how is your BIOS set for that?
 

James

I didn't change, or find, any particular settings in the Supermicro BIOS. What should I be looking for? Separately, I plugged the adapter into a Windows machine, and it worked after I installed the Intel PRO driver. I am still wondering why the TrueNAS pool would go offline in the absence of a driver, if that indeed caused the issue. To your point, something is interfering, perhaps related to the Proxmox passthrough or IOMMU settings, which I am not familiar with.
 

sretalla

I see that the maximum number of PCIe lanes for that CPU is 16...

I also see that those lanes are divided among the slots as follows:

PCI-E
  • 1 PCI-E 3.0 x8 (in x16 slot)
  • 2 PCI-E 3.0 x4 (in x8 slot)
M.2
  • M.2 Interface: 1 PCI-E 3.0 x4

Maybe with both your cards being x8, you're having some kind of contention for the lanes in a way that's causing the HBA to drop when the NIC has its second port come up? (I think it's 4 lanes per port)

Maybe the HBA can run on 4 lanes, so it could go in an x8 slot and the NIC in the x16 slot...

Just an idea.
 

James

Thanks, sretalla. I'll give it a try this afternoon and post an update. I appreciate your time and help.
 

James

Hi, I wanted to post an update: swapping the slots of the two adapters (10G NIC in the x16 slot) resolved the conflict. I am happy to report that the 10G card installed successfully, and an iperf test came back at 9.2 Gbits/sec. My only gripe is that the 10G adapter also had to be passed through rather than used as a virtual adapter, because it is in the same IOMMU group as the HBA, which makes the second 10G port useless. Thanks a lot for your help.
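On the Proxmox (Linux) host, the grouping that forced the whole NIC to be passed through can be inspected under `/sys/kernel/iommu_groups`, where each group's `devices/` directory holds one entry per PCI address in that group. A small Python sketch, assuming IOMMU (VT-d) is enabled on the host; on a machine without it the function simply returns an empty mapping:

```python
import os
from typing import Dict, List

def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> Dict[int, List[str]]:
    """Map IOMMU group number -> PCI addresses of the devices in that group.

    Each numbered group directory contains a devices/ subdirectory whose
    entries are named after PCI addresses (e.g. 0000:01:00.0). Devices that
    share a group can only be passed through to a VM together.
    """
    groups: Dict[int, List[str]] = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled, or not a Linux host
    for name in os.listdir(root):
        dev_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(dev_dir):
            groups[int(name)] = sorted(os.listdir(dev_dir))
    return dict(sorted(groups.items()))

for group, devices in iommu_groups().items():
    print(group, " ".join(devices))
```

If the HBA's address and both of the NIC's port addresses are listed under the same group number, Proxmox can only hand them to a VM as a unit, which matches what happened here.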
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Pretty weird scenario, but you're deep in the weeds of virtualization. The LGA1xxx platforms probably haven't been tested/designed with some of this stuff in mind...
I suspect you could fix this with a firmware update, so shoot Supermicro an email and see if they take up the issue, since it's all their gear anyway.
 

James

Yes, it is all Supermicro. :smile: Just to be clear, are you referring to the possibility of isolating devices from a certain IOMMU group through a firmware update? I thought it was a limitation by design, and that without an ACS-enabled processor it is not possible.
 

Ericloewe

I meant more the slot confusion. I don't know nearly enough about virtualization to speculate on IOMMU groups... just barely enough to get an X540 to show me VFs that I can then pass through to guests.
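For the VF approach, a Linux host such as Proxmox exposes SR-IOV through the `sriov_totalvfs` and `sriov_numvfs` attributes of a NIC's PCI device in sysfs. A hedged Python sketch, assuming SR-IOV (and VT-d) are enabled in the BIOS and the X540's ports are bound to the host's Linux driver; `set_num_vfs` is an illustrative helper, and the interface name in the usage note is hypothetical:

```python
import os

def set_num_vfs(iface: str, count: int, net_root: str = "/sys/class/net") -> int:
    """Enable `count` SR-IOV virtual functions on the NIC behind `iface`.

    Reads the device's sriov_totalvfs limit, then writes sriov_numvfs.
    The kernel rejects changing one non-zero VF count to another, so the
    count is reset to 0 first in that case. Requires root on a real host.
    """
    device = os.path.join(net_root, iface, "device")
    with open(os.path.join(device, "sriov_totalvfs")) as f:
        total = int(f.read().strip())
    if not 0 <= count <= total:
        raise ValueError(f"{iface} supports 0..{total} VFs, not {count}")
    numvfs_path = os.path.join(device, "sriov_numvfs")
    with open(numvfs_path) as f:
        current = int(f.read().strip())
    if current != count:
        if current != 0 and count != 0:
            with open(numvfs_path, "w") as f:
                f.write("0")
        with open(numvfs_path, "w") as f:
            f.write(str(count))
    return count
```

For example, `set_num_vfs("enp1s0f0", 4)` (run as root; `enp1s0f0` is a placeholder) would create four VFs, each appearing as its own PCI function. Whether those VFs can be passed through separately from the HBA still depends on how the platform groups them, i.e. on ACS support.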
 

James

I think the issue was partly due to a PCIe ID mismatch in the Proxmox passthrough configuration that conflicted with the initial setup. I will try moving the adapter back into the x8 slot and post an update if it works.

Update: I moved the adapter back into the x8 slot, and it worked without any issues. I think it was possibly the PCIe passthrough mismatch that took my pool down. Thanks, all, for your help.
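One plausible reading of that "PCIe ID mismatch": Proxmox pins passthrough devices by PCI bus address in the `hostpciN` lines of the VM config (typically `/etc/pve/qemu-server/<vmid>.conf`), and adding or moving a card can change which device sits at a given address. The Python sketch below is an illustration under that assumption, not a Proxmox tool; it lists `hostpciN` IDs that match no device on the host, and the addresses in it are made up.

```python
import re
from typing import Dict, Iterable, List

HOSTPCI = re.compile(r"^(hostpci\d+):\s*(.+)$")

def hostpci_ids(conf_text: str) -> Dict[str, List[str]]:
    """Map each hostpciN key to the host PCI IDs it passes through."""
    result: Dict[str, List[str]] = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # snapshot sections follow the live VM config
        m = HOSTPCI.match(line)
        if not m:
            continue
        device = m.group(2).split(",")[0]  # options such as pcie=1 follow
        if device.startswith("host="):
            device = device[len("host="):]
        result[m.group(1)] = device.split(";")  # several IDs may be listed
    return result

def normalize(pci_id: str) -> str:
    """Add the default 0000: domain when the ID omits it (bus:dev[.fn])."""
    return pci_id if pci_id.count(":") == 2 else "0000:" + pci_id

def missing(conf_text: str, present: Iterable[str]) -> List[str]:
    """hostpci entries whose ID matches no device in `present`."""
    devices = list(present)
    stale = []
    for key, ids in hostpci_ids(conf_text).items():
        for pci_id in ids:
            full = normalize(pci_id)
            # An ID without ".fn" means all functions of that device.
            if not any(d == full or d.startswith(full + ".") for d in devices):
                stale.append(f"{key}: {pci_id}")
    return stale

conf = """\
hostpci0: 0000:01:00.0,pcie=1
hostpci1: 02:00
name: truenas
"""
present = ["0000:00:00.0", "0000:02:00.0", "0000:02:00.1", "0000:03:00.0"]
print(missing(conf, present))  # ['hostpci0: 0000:01:00.0']
```

On the host, `present` could come from `os.listdir("/sys/bus/pci/devices")`. An entry that points at the wrong but still existing device (say, the NIC instead of the HBA) would not be flagged, so comparing the config against `lspci` output by eye is still worthwhile.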
 