nvme missing interrupt

Geek Baba

Explorer
Joined
Sep 2, 2011
Messages
73
Hello! I built an all-NVMe FreeNAS box with the following specs in Sep 2019:
  1. SuperServer 2028U-TN24R4T+
  2. SuperMicro X10DRU-i+
  3. 2 x Intel Xeon E5-2620 v3
  4. 384GB (24 x 16GB) 2RX4 PC4-2133P DDR4 MEMORY
  5. 24 x Intel DC P3600 SSD 800GB NVMe PCIe 3.0
After installing FreeNAS 11.2 I found that once I added the 18th disk, the system started to throw "nvme missing interrupt" errors. I looked around and found some known FreeBSD issues that were fixed (freebsd bug), so I was hoping the problem would go away after updating to 11.3. However, even after updating to FreeNAS 11.3-U1, I still see the issue. I am attaching some pictures that I took while it boots up (it takes a very long time to boot). I also found that a few folks had a similar issue when running virtualized, which does not apply in my case.

Really lost now, as I have looked everywhere and patiently waited for 11.3, but it looks like it's something else. Any help would be much appreciated.

PS: currently it's running 16 disks in 2 vdevs and I have not seen any issues. I had to take the other 8 drives out, otherwise it keeps throwing the errors and takes 30 minutes or more to boot.
 

Attachments

  • IMG_1907.jpeg
  • IMG_1908.jpeg
  • IMG_1935.jpeg
  • IMG_1936.jpeg
  • IMG_1937.jpeg
  • IMG_1938.jpeg

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
24 x Intel DC P3600 SSD 800GB NVMe PCIe 3.0
Those are 4-lane NVME drives (according to this: https://ark.intel.com/content/www/us/en/ark/products/series/81000/intel-ssd-dc-p3600-series.html), so for 24 of them, you would need to have 96 available PCIe lanes in your system...

  1. SuperMicro X10DRU-i+
  2. 2 x Intel Xeon E5-2603 v4

I see each of your CPUs has 40 available PCIe lanes (https://ark.intel.com/content/www/u...-processor-e5-2603-v4-15m-cache-1-70-ghz.html).
So with 2 of those (40 x 2 = 80 lanes) you are 16 lanes short of the 96 needed.

With only 16 NVME drives, you are using 64 lanes, so within your capacity, which is probably why it works.
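
As a rough sanity check, here's that lane math as a small script. It assumes every drive wants a full x4 link straight to a CPU and that each CPU contributes its full 40 lanes (ignoring whatever the chipset, NICs and other onboard devices consume), so treat it as an upper bound rather than an exact budget.

```python
# Rough PCIe lane budget for this build.
# Assumptions: every drive negotiates x4 straight to a CPU root port,
# each E5-2600 v3/v4 CPU provides 40 lanes, and nothing else (NICs,
# chipset uplink, etc.) is consuming lanes -- so this is an upper bound.

LANES_PER_DRIVE = 4   # Intel DC P3600 is a PCIe 3.0 x4 device
LANES_PER_CPU = 40    # per Intel ARK for the E5-2600 v3/v4 family
CPUS = 2

def lane_budget(drives: int) -> None:
    needed = drives * LANES_PER_DRIVE
    available = CPUS * LANES_PER_CPU
    verdict = "OK" if needed <= available else f"short by {needed - available}"
    print(f"{drives} drives need {needed} lanes, {available} available -> {verdict}")

for count in (16, 24):
    lane_budget(count)
# 16 drives need 64 lanes, 80 available -> OK
# 24 drives need 96 lanes, 80 available -> short by 16
```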
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
So you can buy systems that mechanically/electrically take more devices than your CPU can handle? WT...?
 

Geek Baba

Explorer
Joined
Sep 2, 2011
Messages
73
So you can buy systems that mechanically/electrically take more devices than your CPU can handle? WT...?
That was my first reaction, but I also noticed I added an "Intel X520-DA2 10GbE NIC Dual E10G42BTDA" PCIe card that's not needed; this system already has 4 x 10Gig ports. I am going to remove it and check again.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
That will probably allow you to use 20 disks. We are stuck with 4, currently. On a completely different platform, though.
 

Geek Baba

Explorer
Joined
Sep 2, 2011
Messages
73
That will probably allow you to use 20 disks. We are stuck with 4, currently. On a completely different platform, though.
I will let you know. I am hoping it works, as the system spec clearly states that it supports up to 24 NVMe drives, the motherboard only supports E5-2600 v3/v4, and no processor in that family has more than 40 lanes.

Also, what platform are you on that doesn't support more than 4 drives?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
The details are in the JIRA issue I linked above.
 

Geek Baba

Explorer
Joined
Sep 2, 2011
Messages
73
Those are 4-lane NVME drives (according to this: https://ark.intel.com/content/www/us/en/ark/products/series/81000/intel-ssd-dc-p3600-series.html), so for 24 of them, you would need to have 96 available PCIe lanes in your system...



I see each of your CPUs has 40 available PCIe lanes (https://ark.intel.com/content/www/u...-processor-e5-2603-v4-15m-cache-1-70-ghz.html).
So with 2 of those (40 x 2 = 80 lanes) you are 16 lanes short of the 96 needed.

With only 16 NVME drives, you are using 64 lanes, so within your capacity, which is probably why it works.
Since the E5-2600 v3/v4 only supports up to 40 lanes, this system could never actually use 24 NVMe drives, right?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
Since the E5-2600 v3/v4 only supports up to 40 lanes, this system could never actually use 24 NVMe drives, right?
Unless you have more empty CPU sockets (which I see you don't)... no.

I'm not sure if there are CPUs on that socket platform that support more lanes, but you may find that the lane limit is also related to the chipset on the motherboard.

In general, that many NVME drives will only be possible with a newer generation board and CPU.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
So you can buy systems that mechanically/electrically take more devices than your CPU can handle? WT...?
That's almost always been possible... PCIe x8 and x16 slots often cut back their capacity once you populate other slots (for the same reason: limited lane capacity).
System designers and system builders need to be aligned on how the parts will be put together.
 

Geek Baba

Explorer
Joined
Sep 2, 2011
Messages
73
Unless you have more empty CPU sockets (which I see you don't)... no.

I'm not sure if there are CPUs on that socket platform that support more lanes, but you may find that the lane limit is also related to the chipset on the motherboard.

In general, that many NVME drives will only be possible with a newer generation board and CPU.
I reached out to Supermicro support and here is their response:
The 2028U has two PLX9765 (PCIe Gen3 x16) switches to support 12 NVMe ports for each CPU.
Your OS kernel might not support it; please try either CentOS 7.6 or newer, or Ubuntu 18.04, and see if you still get the error messages reported by the kernel. Also, try optimizing the BIOS settings.
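
If I'm reading that reply right, the switches change the math from earlier in the thread. Here's a small sketch of how I understand the topology; the x16 uplink per switch is my assumption, since their reply only names the chip and the port count.

```python
# Sketch of the PCIe switch topology described by Supermicro support:
# two PLX switches, each fanning out to 12 x4 NVMe ports on behalf of
# one CPU. Assumption: each switch hangs off a single PCIe Gen3 x16
# uplink (the reply does not spell out the uplink width).

UPLINK_LANES = 16        # Gen3 x16 from CPU to each switch (assumed)
PORTS_PER_SWITCH = 12    # NVMe bays served by each switch
LANES_PER_PORT = 4       # each P3600 should negotiate x4 to the switch

downstream = PORTS_PER_SWITCH * LANES_PER_PORT
print(f"Downstream lanes per switch: {downstream}")                          # 48
print(f"Oversubscription at the uplink: {downstream / UPLINK_LANES:.0f}:1")  # 3:1
print(f"CPU lanes consumed per switch: {UPLINK_LANES}")                      # 16, not 48

# So the drives never need 96 CPU lanes: each drive links x4 to a switch,
# and the two switches together only take 2 x 16 = 32 CPU lanes. Bandwidth
# is shared 3:1 at the uplink, but enumeration and interrupts should still
# work for all 24 drives.
```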
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
Keep in mind that there are NVMe products that are x2 rather than x4, so in that case each CPU would only need to provide 24 lanes (12 drives x 2 lanes) and could conceivably host them, but with the x4 drives you have there will be a lane shortage. Maybe you could convince the drives (or at least some of them) to switch to x2 mode somehow? (Maybe there's something in the BIOS to help with that.)
 

Geek Baba

Explorer
Joined
Sep 2, 2011
Messages
73
Keep in mind that there are NVMe products that are x2 rather than x4, so in that case each CPU would only need to provide 24 lanes (12 drives x 2 lanes) and could conceivably host them, but with the x4 drives you have there will be a lane shortage. Maybe you could convince the drives (or at least some of them) to switch to x2 mode somehow? (Maybe there's something in the BIOS to help with that.)
I actually looked through the list of NVMe drives, and the x2 drives are smaller Optane (M.2) drives; I could not find any enterprise 2.5" drives that were x2. I will continue to work with Supermicro support to figure out if it's possible to do what you suggested (i.e. somehow get some of the drives to switch to x2).
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
The PLX chip should be handling PCIe lane switching duties, so all devices should be allowed x4 to the PLX; lane count off the CPU shouldn't matter here.

Is it possible to try loading a penguin-based OS just to rule out issues with the FreeBSD NVMe driver?
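
If you do spin up a Linux live image for that test, a quick sanity check like the sketch below will show whether all 24 controllers enumerate and what link width each one negotiated. It only reads standard sysfs attributes (/sys/class/nvme plus the PCI current_link_width/current_link_speed files), so adjust the paths if your distro lays things out differently.

```python
#!/usr/bin/env python3
# Quick sanity check on Linux: list every enumerated NVMe controller
# and the PCIe link width/speed it negotiated. Uses only standard
# sysfs attributes, no extra tooling required.

from pathlib import Path

def read(path: Path) -> str:
    try:
        return path.read_text().strip()
    except OSError:
        return "?"

controllers = sorted(Path("/sys/class/nvme").glob("nvme*"))
print(f"Enumerated NVMe controllers: {len(controllers)}")

for ctrl in controllers:
    pci_dev = ctrl / "device"   # symlink to the underlying PCI device
    model = read(ctrl / "model")
    width = read(pci_dev / "current_link_width")
    speed = read(pci_dev / "current_link_speed")
    print(f"{ctrl.name}: {model} - x{width} @ {speed}")
```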
 

Geek Baba

Explorer
Joined
Sep 2, 2011
Messages
73
The PLX chip should be handling PCIe lane switching duties, so all devices should be allowed x4 to the PLX; lane count off the CPU shouldn't matter here.

Is it possible to try loading a penguin-based OS just to rule out issues with the FreeBSD NVMe driver?
I am leaning towards the same theory after the Supermicro support discussions. I will try to test with the latest Ubuntu to confirm this as soon as I find some time.
 

Geek Baba

Explorer
Joined
Sep 2, 2011
Messages
73
The PLX chip should be handling PCIe lane switching duties, so all devices should be allowed x4 to the PLX; lane count off the CPU shouldn't matter here.

Is it possible to try loading a penguin-based OS just to rule out issues with the FreeBSD NVMe driver?
I was finally able to test this using a Debian 10-based distribution, and it works like a charm. Moving to Debian for now until the FreeBSD NVMe driver is stable and able to scale.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
It's running great. Now, after the update to TN 12.0-U2, even better than before. Insanely fast, perfectly reliable. Some minor issues with the bnxt(4) interface and VLANs, but we have a reliable workaround and the problem is most probably upstream, i.e. in FreeBSD.

No, we did not connect VMware. This system runs the VMs on local NVMe storage. It replaced VMware ...

Edit: we have 6 Intel SSDPE2KX010T8 in each system now with room for 4 more.
 

NJTech

Cadet
Joined
Feb 21, 2021
Messages
4
It's running great. Now, after the update to TN 12.0-U2, even better than before. Insanely fast, perfectly reliable. Some minor issues with the bnxt(4) interface and VLANs, but we have a reliable workaround and the problem is most probably upstream, i.e. in FreeBSD.

No, we did not connect VMware. This system runs the VMs on local NVMe storage. It replaced VMware ...


Wow, that's great to hear, man! I'm glad it worked out for you. We are going to test some PowerEdge R7525 servers with 16/8 NVMe drives in two of them. We are going to connect them to VMware as a datastore, but I will test out running the VMs locally as well. What type of workloads are you running, if you don't mind me asking, VDI or SQL servers? Also, what range of IOPS and performance are you seeing on them? How long have you had it in production? We are looking to host very high-performing VDI on these boxes.
 