PCI slot order?

kspare

Guru
Joined
Feb 19, 2015
Messages
508
With these new meta drives, our PCIe slots are full:

1 L2ARC
2 meta
2 SLOG
1 40Gb dual-port network
1 12Gb SAS

We have both dual- and single-CPU servers.

So is there an optimal arrangement of the cards to maximize performance?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Sure. Consider the design of the system, and then stuff slots accordingly. This will probably require you to get out your mainboard manual to look at a block diagram of how each system is built.

For example, while you might think "oh my SLOG really needs fastest and mostest PCIe" (heh) the reality is that ZFS probably cannot lock-step things to your SLOG at anywhere near the maximum speed the SLOG device can support. It might actually be your lowest throughput device.

Consider the data flows within the system. Pick the biggest one, which is *probably* the network card in your example, and give it the best slot. You would like something with direct PCIe to the CPU for your most intense workloads. Then work downwards from there.
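
On a Linux-based install (TrueNAS SCALE, for example) you can sanity-check what each card actually got by walking sysfs instead of pulling up the block diagram every time. Rough sketch only, using standard Linux PCI attributes; it tells you which socket's root complex a device hangs off and the link it trained at, not whether it's on CPU or PCH lanes (for that you still want the manual or lspci -t):

# Sketch: list storage/network PCI devices with the NUMA node (socket) they
# report and the PCIe link they actually trained at. Assumes Linux sysfs.
import glob, os

def read(path, default="?"):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return default

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    cls = read(os.path.join(dev, "class"))
    # 0x0108xx = NVMe, 0x0107xx = SAS, 0x02xxxx = network
    if not (cls.startswith("0x0108") or cls.startswith("0x0107") or cls.startswith("0x02")):
        continue
    node = read(os.path.join(dev, "numa_node"))        # -1 = no affinity reported
    speed = read(os.path.join(dev, "current_link_speed"))
    width = read(os.path.join(dev, "current_link_width"))
    print(f"{os.path.basename(dev)}  node={node}  x{width} @ {speed}  class={cls}")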

The throughput available on PCH PCIe based slots will, for example, be shared and somewhat lower than on CPU PCIe based slots. The latency won't be that much worse, though, so attaching your L2ARC devices there could make sense if your L2ARC isn't that stressy and most of your reads are served from primary ARC. However, because SLOG is incredibly sensitive to latency, if you're doing sync writes, you'd probably want to see if you can get SLOG onto CPU PCIe.

There is no one right answer. You need to consider what normally goes on in your systems and optimize for that.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
And then just to make things fun, let's add in the inherent issues of a dual-socket system, where you have to decide which PCIe slots/devices are best assigned to CPU0 vs CPU1 (and then watch NUMA screw it up for you regardless).

From a ten-thousand-foot view, I'd say your order of "most to least important" would be:

Network -> SLOG (if sync) -> HBA -> META -> L2ARC

But as @jgreco points out, you would need to know where your bottlenecks exist, and whether there's an easier or more impactful way to solve them. The gains from playing the PCIe shuffle might be a few percentage points, in theory, under specific circumstances - spending another few percentage points of the relative system build cost on a previously bottlenecking area might give better results. (e.g. "This is for sync writes, I'll grab a faster SLOG" or "let's go with denser RAM and bump the total amount by 50%")
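
If you want numbers instead of theory before spending money, a crude way to gauge it is to time sync writes to a scratch file on the pool before and after a change (slot shuffle, different SLOG). Just a sketch, assuming Python 3 on a Linux host; the path is a placeholder, point it somewhere safe on your own pool:

# Crude sync-write latency probe: times O_SYNC appends of small, ZIL-ish blocks.
# The path below is a placeholder; use a scratch file on the dataset you care
# about, and compare median/p99 before and after moving cards or swapping SLOGs.
import os, time, statistics

PATH = "/mnt/tank/scratch/sync-probe.bin"   # assumption: adjust for your pool
BLOCK = b"\0" * 4096
ITERATIONS = 2000

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
samples = []
try:
    for _ in range(ITERATIONS):
        t0 = time.perf_counter()
        os.write(fd, BLOCK)
        samples.append(time.perf_counter() - t0)
finally:
    os.close(fd)
    os.unlink(PATH)

samples.sort()
print(f"median {statistics.median(samples) * 1e6:.0f} us, "
      f"p99 {samples[int(len(samples) * 0.99)] * 1e6:.0f} us")

If the before/after difference is down in the noise, that's your answer on whether the shuffle was worth the effort.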
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
One particular system is an X9DRH-7TF.
I have:
2 RMS-200
2 P3700 800GB for meta
1 P3700 2TB for L2ARC
1 12Gb LSI HBA for the disks
and 256GB of RAM.

We use this for terminal servers via NFS... how would you do it?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Block diagram is on page 1-10 of the manual. I'll go for CPU1/2 notation since that's what's used there.

Both the onboard X540 10GbE and the SAS2208 are tied to CPU1, but since you're using 40GbE and a 12Gb HBA, that's sadly two onboard devices on your first socket that are "wasted", so to speak.

My thoughts would be:

CPU1 - 40GbE, RMS-200, HBA
CPU2 - RMS-200, all three P3700s

You could put both SLOG devices on CPU1 but then if a thread runs on CPU2 you get nailed with the QPI latency on both of them. Splitting the two across the sockets means at least one is guaranteed local, and I imagine "consistent performance" is more paramount. If I recall you're also pushing a lot of read traffic here so hopefully the writes aren't as huge of a bottleneck.

That's all theorycrafting, though. Ideally you'd be able to benchmark before and after the changes.
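
And once the cards have been physically moved, it's worth confirming they actually landed on the socket you intended before benchmarking. On a Linux-based install a quick sysfs check does it; sketch below, device names will obviously differ on your box:

# Sketch: print the NUMA node (socket) each NVMe controller and physical NIC
# reports, to confirm the slot shuffle put devices where you intended.
# Assumes Linux sysfs; a node of -1 means no affinity was reported.
import glob, os

def numa_node(class_dev):
    try:
        with open(os.path.join(class_dev, "device", "numa_node")) as f:
            return f.read().strip()
    except OSError:
        return "?"

for ctrl in sorted(glob.glob("/sys/class/nvme/nvme*")):
    print(f"{os.path.basename(ctrl):<10} node {numa_node(ctrl)}")

for nic in sorted(glob.glob("/sys/class/net/*")):
    if os.path.isdir(os.path.join(nic, "device")):    # skip purely virtual interfaces
        print(f"{os.path.basename(nic):<10} node {numa_node(nic)}")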
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
I ended up doing:
CPU1 - 40GbE, HBA, 2TB P3700
CPU2 - both RMS-200s, both 800GB P3700s

The onboard 10GbE is only used to connect to the GUI, and the onboard SAS2208 is just used for the boot drives, so not much going on there.
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
My thoughts would be:

CPU1 - 40GbE, RMS-200, HBA
CPU2 - RMS-200, all three P3700s

Wouldn't it make sense to split the meta drives to each cpu as well?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Wouldn't it make sense to split the meta drives to each cpu as well?
Probably. It follows the same logic as the "get consistent latency results from the SLOG devices" path - and if they're mirrored, you also dodge the extremely rare edge case where a catastrophic CPU failure cuts off both metadata devices at once, which would be a Bad Thing.
 