PCI slot order?

kspare

Guru
Joined
Feb 19, 2015
Messages
508
With these new meta drives, our PCIe slots are full:

1 L2ARC
2 meta
2 SLOG
1 40Gb dual-port network
1 12Gb SAS

We have both dual- and single-CPU servers.

So is there an optimal arrangement of the cards to maximize performance?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Sure. Consider the design of the system, and then stuff slots accordingly. This will probably require you to get out your mainboard manual to look at a block diagram of how each system is built.

For example, while you might think "oh my SLOG really needs fastest and mostest PCIe" (heh) the reality is that ZFS probably cannot lock-step things to your SLOG at anywhere near the maximum speed the SLOG device can support. It might actually be your lowest throughput device.

Consider the data flows within the system. Pick the biggest one, which is *probably* the network card in your example, and give it the best slot. You would like something with direct PCIe to the CPU for your most intense workloads. Then work downwards from there.
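
On a Linux-based install (TrueNAS SCALE, for example) you can sanity-check what each card actually got by walking sysfs instead of pulling up the block diagram every time. Rough sketch only, using standard Linux PCI attributes; it tells you which socket's root complex a device hangs off and the link it trained at, not whether it's on CPU or PCH lanes (for that you still want the manual or lspci -t):

# Sketch: list storage/network PCI devices with the NUMA node (socket) they
# report and the PCIe link they actually trained at. Assumes Linux sysfs.
import glob, os

def read(path, default="?"):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return default

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    cls = read(os.path.join(dev, "class"))
    # 0x0108xx = NVMe, 0x0107xx = SAS, 0x02xxxx = network
    if not (cls.startswith("0x0108") or cls.startswith("0x0107") or cls.startswith("0x02")):
        continue
    node = read(os.path.join(dev, "numa_node"))        # -1 = no affinity reported
    speed = read(os.path.join(dev, "current_link_speed"))
    width = read(os.path.join(dev, "current_link_width"))
    print(f"{os.path.basename(dev)}  node={node}  x{width} @ {speed}  class={cls}")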

The throughput available on PCH PCIe based slots will, for example, be shared and somewhat lower than on CPU PCIe based slots. The latency won't be that much worse, though, so attaching your L2ARC devices there could make sense if your L2ARC isn't that stressy and most of your reads are served from primary ARC. However, because SLOG is incredibly sensitive to latency, if you're doing sync writes, you'd probably want to see if you can get SLOG onto CPU PCIe.

There is no one right answer. You need to consider what normally goes on in your systems and optimize for that.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
And then just to make things fun, let's add in the inherent issues of a dual-socket system, where you have to decide which PCIe slots/devices are best assigned to CPU0 vs CPU1 (and then watch NUMA screw it up for you regardless).

From a ten-thousand-foot view, I'd say your order of "most to least important" would be:

Network -> SLOG (if sync) -> HBA -> META -> L2ARC

But as @jgreco points out, you would need to know where your bottlenecks exist, and whether there's an easier or more impactful way to solve them. The gains from playing the PCIe shuffle might be a few percentage points, in theory, under specific circumstances - spending another few percentage points of the relative system build cost on a previously bottlenecking area might give better results. (e.g. "This is for sync writes, I'll grab a faster SLOG" or "let's go with denser RAM and bump the total amount by 50%")
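
If you want numbers instead of theory before spending money, a crude way to gauge it is to time sync writes to a scratch file on the pool before and after a change (slot shuffle, different SLOG). Just a sketch, assuming Python 3 on a Linux host; the path is a placeholder, point it somewhere safe on your own pool:

# Crude sync-write latency probe: times O_SYNC appends of small, ZIL-ish blocks.
# The path below is a placeholder; use a scratch file on the dataset you care
# about, and compare median/p99 before and after moving cards or swapping SLOGs.
import os, time, statistics

PATH = "/mnt/tank/scratch/sync-probe.bin"   # assumption: adjust for your pool
BLOCK = b"\0" * 4096
ITERATIONS = 2000

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
samples = []
try:
    for _ in range(ITERATIONS):
        t0 = time.perf_counter()
        os.write(fd, BLOCK)
        samples.append(time.perf_counter() - t0)
finally:
    os.close(fd)
    os.unlink(PATH)

samples.sort()
print(f"median {statistics.median(samples) * 1e6:.0f} us, "
      f"p99 {samples[int(len(samples) * 0.99)] * 1e6:.0f} us")

If the before/after difference is down in the noise, that's your answer on whether the shuffle was worth the effort.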
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
One particular system is an X9DRH-7TF.
I have:
2 RMS-200
2 P3700 800GB for meta
1 P3700 2TB for L2ARC
1 12Gb LSI HBA for the disks
and 256GB of RAM.

We use this for terminal servers via NFS... how would you do it?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Block diagram is on page 1-10 of the manual. I'll go for CPU1/2 notation since that's what's used there.

Both the onboard X540 10GbE and the SAS2208 are tied to CPU1, but since you're using 40GbE and a 12Gb HBA, that's sadly two onboard devices on your first socket that are "wasted", so to speak.

My thoughts would be:

CPU1 - 40GbE, RMS-200, HBA
CPU2 - RMS-200, all three P3700s

You could put both SLOG devices on CPU1 but then if a thread runs on CPU2 you get nailed with the QPI latency on both of them. Splitting the two across the sockets means at least one is guaranteed local, and I imagine "consistent performance" is more paramount. If I recall you're also pushing a lot of read traffic here so hopefully the writes aren't as huge of a bottleneck.

That's all theorycrafting, though. Ideally you'd be able to benchmark before and after the changes.
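
And once the cards have been physically moved, it's worth confirming they actually landed on the socket you intended before benchmarking. On a Linux-based install a quick sysfs check does it; sketch below, device names will obviously differ on your box:

# Sketch: print the NUMA node (socket) each NVMe controller and physical NIC
# reports, to confirm the slot shuffle put devices where you intended.
# Assumes Linux sysfs; a node of -1 means no affinity was reported.
import glob, os

def numa_node(class_dev):
    try:
        with open(os.path.join(class_dev, "device", "numa_node")) as f:
            return f.read().strip()
    except OSError:
        return "?"

for ctrl in sorted(glob.glob("/sys/class/nvme/nvme*")):
    print(f"{os.path.basename(ctrl):<10} node {numa_node(ctrl)}")

for nic in sorted(glob.glob("/sys/class/net/*")):
    if os.path.isdir(os.path.join(nic, "device")):    # skip purely virtual interfaces
        print(f"{os.path.basename(nic):<10} node {numa_node(nic)}")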
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
I ended up doing:
CPU1 - 40GbE, HBA, 2TB P3700
CPU2 - both RMS-200s, both 800GB P3700s

The onboard 10GbE is only used to connect to the GUI, and the onboard SAS2208 is just used for the boot drives, so not much going on there.
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
My thoughts would be:

CPU1 - 40GbE, RMS-200, HBA
CPU2 - RMS-200, all three P3700s

Wouldn't it make sense to split the meta drives to each cpu as well?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Wouldn't it make sense to split the meta drives to each cpu as well?
Probably. It follows the same logic as the "get consistent latency results from the SLOG devices" path - and if they're mirrored, you also dodge the extremely rare edge case where a catastrophic CPU failure cuts off both metadata devices at once, which would be a Bad Thing.
 