U.2 NVMe FreeNAS build

firesyde424

Contributor
Joined
Mar 5, 2019
Messages
154
We are in the process of working through a new FreeNAS server build for use as a miscellaneous, high-IO network storage SAN. We weren't able to find much information on this kind of build, so I thought I'd post it here.

  • Dell PowerEdge R7525, 2RU server
    • 2 x AMD Epyc 7542 CPUs - 64 cores total @ 2.9GHz (Yes, I realize this is massive overkill for a normal FreeNAS, but past experience with disk failures while under high IO load has taught us to go over the top on CPU)
    • 6 x 64GB DDR4 RDIMM - 384GB Total
    • 4x Micron 32GB DDR4 NVDIMM-N Modules
      • Used in two mirrors as SLOG devices
    • 24 x Micron 9300 Pro 15.36TB U.2 NVMe
    • 2 x Intel XXV710 Dual Port 10/25GbE Adapters
    • 2 x 240GB M.2 drives on a BOSS PCI Express card in RAID1
      • Used as a boot drive

Most of this storage will be used via NFS for non-production and dev ESXi container workloads as well as PostgreSQL database storage. The remainder is slated as a place to store our daily backup deltas before they are moved off to mechanical storage.
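In case it's useful to anyone planning something similar, here's a rough sketch of how a pool like this could be split into datasets for those two jobs. The pool and dataset names are placeholders, and the property values are starting points rather than anything we've finalized:

    # Hypothetical pool/dataset names; tune recordsize/sync after testing.
    zfs create tank/esxi-nfs                   # NFS datastore for the dev/non-prod ESXi workloads
    zfs set recordsize=16K tank/esxi-nfs       # smaller records suit the PostgreSQL-heavy VMs
    zfs set sync=always tank/esxi-nfs          # ESXi over NFS issues sync writes; make it explicit

    zfs create tank/backup-deltas              # landing zone for the daily backup deltas
    zfs set recordsize=1M tank/backup-deltas   # mostly large sequential writes
    zfs set compression=lz4 tank/backup-deltas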
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
4x Micron 32GB DDR4 NVDIMM-N Modules
That stuff works with AMD? I thought it didn't yet.

2 x AMD Epyc 7542 CPUs - 64 cores total @ 2.9GHz (Yes, I realize this is massive overkill for a normal FreeNAS, but past experience with disk failures while under high IO load has taught us to go over the top on CPU)
I mean, it won't hurt, but it does seem excessive.

  • 2 x 240GB M.2 drives on a BOSS PCI Express card in RAID1
    • Used as a boot drive
I got these quoted at something like 400 bucks with the SSDs the other day. I suspect that a cheap PCIe x8 to 2x M.2 adapter card plus two mid-range NVMe SSDs in M.2 format would work fine. I have heard some noise about Gen 13 Dell servers not booting from NVMe, but I suspect it's more nonsense and clueless users than reality.
Although, now that I think about it, don't you have the rear flex bay option? Two decent SATA SSDs in those would be a better option. Probably cheaper than the BOSS card, too.

One thing I'm wondering about: are the U.2 disks directly-attached to the CPUs? There should be enough lanes for all the SSDs, but that eats up nearly all the available lanes.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
The hardware itself looks good - EPYC gives you tons of PCIe lanes - but there's one big caveat: NVMe hotplug on FreeBSD is still in a strange state. The new nda driver seems to make things "sort-of-work" as compared to the nvd driver used by default in FreeBSD 12.x - details are in this link:


But according to the user there, while a logical detach/attach worked with an nvmecontrol reset afterwards, a physical removal and reinsertion resulted in no detection even with the reset command issued. The last update was from July, though, so things may have changed.
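For reference, the sequence being described boils down to something like the following (controller numbers are just examples):

    # List the NVMe controllers and namespaces the kernel currently sees
    nvmecontrol devlist

    # After a logical detach/attach, a controller reset reportedly brought the device back
    nvmecontrol reset nvme5

    # After a physical pull and re-insert, the drive reportedly never reappeared in
    # devlist, even after issuing the same reset - hence the hotplug caveat above.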
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Would you even need a SLOG when using an NVMe SSD pool?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Would you even need a SLOG when using an NVMe SSD pool?
Provided that you have a workload that benefits from low-latency sync writes, yes - and NVDIMMs will still be an order of magnitude faster than regular NAND.
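As a rough sketch of what that looks like at the pool level - the device names are placeholders, since how NVDIMM-N modules enumerate depends on the platform and driver:

    # Add two NVDIMM-backed devices as a mirrored log (SLOG) vdev
    zpool add tank log mirror pmem0 pmem1

    # Sync behaviour is per dataset; sync=always forces everything on it through the log
    zfs set sync=always tank/databases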
 

firesyde424

Contributor
Joined
Mar 5, 2019
Messages
154
That stuff works with AMD? I thought it didn't yet.


I mean, it won't hurt, but it does seem excessive.


I got these quoted at something like 400 bucks with the SSDs the other day. I suspect that a cheap PCIe x8 to 2x M.2 adapter card plus two mid-range NVMe SSDs in M.2 format would work fine. I have heard some noise about Gen 13 Dell servers not booting from NVMe, but I suspect it's more nonsense and clueless users than reality.
Although, now that I think about it, don't you have the rear flex bay option? Two decent SATA SSDs in those would be a better option. Probably cheaper than the BOSS card, too.

One thing I'm wondering about: are the U.2 disks directly-attached to the CPUs? There should be enough lanes for all the SSDs, but that eats up nearly all the available lanes.


We currently run five older R720xd servers with 60-bay mechanical-disk SAS JBODs as either non-production VM storage or backup targets. We have run into issues in the past where even a 20-core server @ 2.8GHz would max out when faced with the parity calculations of a failed disk while under high load. My thought here is that, with the NVMe drives being at least two orders of magnitude faster from an IO perspective, we should also plan for the day when a drive fails in this system. It may be worth dropping to the 32-core variant to save some money.

We already have about a dozen R740xd servers that we use as VMware vSAN nodes, and we have not had any issues booting them from the BOSS card. We did briefly operate a 52-core R740xd with 1TB of RAM as a FreeNAS box in a "why the hell not" scenario before we installed ESXi on it. That was an NVMe chassis, and it put up some stupid bandwidth/IO numbers when set up as RAID10 with 12 vdevs. We did not test hot-add, but the server doesn't have any RAID cards or HBAs other than the BOSS controller itself, so I would assume the U.2 NVMe drives are connected directly to the processors via PCI Express lanes. We did not have any issues booting FreeNAS from the BOSS card.
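If anyone wants to sanity-check the same thing on their own box, a quick way to eyeball it (not specific to these Dell models) is:

    # Each NVMe controller shows up as its own PCI function; -lv adds vendor/device strings
    pciconf -lv | grep -B 3 -i nvme

    # And this shows which controllers and namespaces the OS actually attached
    nvmecontrol devlist

    # If the only storage controllers listed are the NVMe drives themselves and the boot
    # controller (no tri-mode RAID card or HBA), that's a good hint the U.2 bays are
    # wired straight to CPU lanes.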
 

firesyde424

Contributor
Joined
Mar 5, 2019
Messages
154
Would you even need a SLOG when using an NVMe SSD pool?
A considerable number of our workloads are very latency sensitive. In our testing with sync enabled, we saw almost a 10x improvement in synchronous write IO from our MSSQL and PostgreSQL VMs when using an NVDIMM SLOG versus no SLOG device. We saw an even greater margin when using RAM as the SLOG, but the risk of data corruption in a sudden power outage is generally not worth the additional performance.
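For anyone wanting to run a similar comparison, something along these lines will show the effect - fio is available from ports/pkg, and the dataset name, path, and sizes here are just examples:

    # Force synchronous writes on the test dataset, like the database traffic does
    zfs set sync=always tank/pgtest

    # Small-block, shallow-queue sync writes - the pattern a SLOG actually helps with
    fio --name=syncwrite --directory=/mnt/tank/pgtest \
        --rw=randwrite --bs=8k --iodepth=1 --numjobs=1 \
        --fsync=1 --size=2g --runtime=60 --time_based

    # Run once with the NVDIMM log vdev attached and once without it to get the delta.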
 

firesyde424

Contributor
Joined
Mar 5, 2019
Messages
154
The NVDIMMs are not available for purchase from Dell with any AMD chassis, but we were able to confirm with one of our Dell partners, who pointed us to the official Dell spec sheet, that NVDIMMs are supported. Additionally, if you look through the service manual for the R7525, you will find that the motherboard has an anchoring clip specifically for NVDIMM batteries, and there is even a procedure for correctly replacing an NVDIMM.
 

NJTech

Cadet
Joined
Feb 21, 2021
Messages
4
We are in the process of working through a new FreeNAS server build for use as a miscellaneous, high-IO network storage SAN. We weren't able to find much information on this kind of build, so I thought I'd post it here.

  • Dell PowerEdge R7525, 2RU server
    • 2 x AMD Epyc 7542 CPUs - 64 cores total @ 2.9GHz (Yes, I realize this is massive overkill for a normal FreeNAS, but past experience with disk failures while under high IO load has taught us to go over the top on CPU)
    • 6 x 64GB DDR4 RDIMM - 384GB Total
    • 4x Micron 32GB DDR4 NVDIMM-N Modules
      • Used in two mirrors as SLOG devices
    • 24 x Micron 9300 Pro 15.36TB U.2 NVMe
    • 2 x Intel XXV710 Dual Port 10/25GbE Adapters
    • 2 x 240GB M.2 drives on a BOSS PCI Express card in RAID1
      • Used as a boot drive

Most of this storage will be used via NFS for non-production and dev ESXi container workloads as well as PostgreSQL database storage. The remainder is slated as a place to store our daily backup deltas before they are moved off to mechanical storage.


Hey, how did this project work out? We are in exactly the same boat and looking for some advice here. Thanks!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
I'm kinda curious to know myself... Lots of interesting stuff going on:
  • Epyc (I have an Epyc server at work, but it's running Linux; I'd like to know how FreeBSD gets along with Epyc these days)
  • NVDIMMs (I have some performance-sensitive applications on the horizon and NVDIMMs are a bit too cutting edge for me to just get a few and try them out)
  • Intel 25 GbE (Network upgrade coming up at work, and 25 GbE is cheaper than 10 GbE was a few years ago)
  • 24x NVMe SSDs
One thing I'm wondering about: are the U.2 disks directly-attached to the CPUs? There should be enough lanes for all the SSDs, but that eats up nearly all the available lanes.
To answer this question: they should be. Dell has a number of configurations for the R7525 - in particular, high-speed lanes on the two-socket Epyc platform can be reconfigured for either CPU-CPU communication or for PCIe, and Dell allows for both on a single motherboard by using cables for the CPU-CPU option. I think the 24x NVMe option has some of the U.2 bays connected via those links, allowing direct CPU lanes to all SSDs. I think that's 16 lanes per CPU, so the system is something like:
24 x4 PCIe links for the SSDs: 96 lanes
2 x48 links between the CPUs: 96 lanes
1 x8 PCIe link to the SAS daughtercard: 8 lanes
1 x16 PCIe link to the OCP 3.0 NIC slot: 16 lanes
-------------------------------------------------------------------------------
216 of 256 lanes used -> 40 PCIe 4.0 lanes available for other cards. That's two x16 slots plus one x8 slot. With some tricks, the eight lanes for the SAS daughtercard can be switched to another x8 slot, since this system doesn't support SAS/SATA (I think). AMD Epyc is just a connectivity monster.
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
  • NVDIMMs (I have some performance-sensitive applications on the horizon and NVDIMMs are a bit too cutting edge for me to just get a few and try them out)

Actually, they have been around for a while now and might already be on a declining path (superseded by newer memory tech in the medium term).
I had a discussion with Supermicro recently where they said they sell very few of these (the -N variety) and most people are looking for Optane memory now.
Of course Dell/HPE still offer them, and of course the two aren't really comparable (write vs. read focus), but in the public eye 32GB NVDIMM-Ns apparently lose out to 256/512GB DCPMMs.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
I do mean the newer ones. Now that you mention it, I do remember experimental stuff of varying degrees of crazy around the concept.

I just looked it up and realized that NVDIMM-N refers specifically to the old-school "DRAM plus NAND plus backup battery" stuff. Learn something new every day.
 

firesyde424

Contributor
Joined
Mar 5, 2019
Messages
154
So this worked out really well for the most part. Unfortunately, our testing turned up enough issues with running TrueNAS CORE on the AMD-powered R7525 that we ended up going with an Intel-powered R740xd.

We also discovered that hot-add support for NVMe drives in FreeBSD isn't quite there yet, and any changes or drive failures require a reboot. Because of that, we cut the overall size down to 16 drives, with a hot spare.

One thing I will say is to make sure you use TrueNAS CORE 12.0 or later. Anything earlier than that and you will have issues getting the server to boot if you have more than 12 NVMe drives installed.

Another change came down to the unexpectedly good performance of the Micron NVMe drives. Since we were not able to acquire any NVDIMMs for SLOG, we swapped those out for U.2 Optane drives. In our testing we discovered that, because the Micron drives were so fast, the Optane drives were bottlenecking sequential read and write speeds; we nearly tripled sequential read speeds when we removed the Optane drives entirely. The Optane drives were still faster from an IOPS perspective, but our workload was nowhere near that limit, which meant the Micron drives were more than adequate.
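For anyone weighing the same trade-off, the comparison is straightforward to reproduce: measure sequential throughput with the Optane log vdev in place, remove it, and measure again. A sketch with made-up names (take the actual vdev name from zpool status; a log vdev can be added back later):

    # Sequential large-block read and write with the Optane SLOG still attached
    fio --name=seqread  --directory=/mnt/tank/bench --rw=read  --bs=1m --iodepth=16 --size=20g
    fio --name=seqwrite --directory=/mnt/tank/bench --rw=write --bs=1m --iodepth=16 --size=20g

    # Remove the log vdev (safe to do online; ZFS flushes it first)...
    zpool remove tank mirror-1    # substitute the log vdev name shown by 'zpool status'

    # ...then run the same two jobs again and compare the numbers.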

If there is one downside, it is that, because we had to use Intel CPUs, this server is effectively maxed out on PCI Express cards due to the lower PCI Express lane count on the Intel CPUs versus the AMD CPUs.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
We also discovered that hot-add support for NVMe drives in FreeBSD isn't quite there yet, and any changes or drive failures require a reboot. Because of that, we cut the overall size down to 16 drives, with a hot spare.
I'd heard that one, though I seem to recall that FreeBSD 13 includes a revamp of the NVMe drivers. Haven't investigated the details yet, though.
In our testing we discovered that, because the Micron drives were so fast, the Optane drives were bottlenecking sequential read and write speeds; we nearly tripled sequential read speeds when we removed the Optane drives entirely.
That's interesting, wouldn't have expected that.
 

nasbdh9

Dabbler
Joined
Oct 23, 2020
Messages
17
Please don't mirror NVDIMM devices - it's almost pointless to do so. For a "large capacity" NVDIMM used as SLOG, I prefer to set up multiple namespaces instead. With dual CPUs and four NVDIMM modules, also pay attention to the performance impact of crossing NUMA nodes.
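To illustrate the second point with a rough sketch - pmem0/pmem1 stand in for however the NVDIMM namespaces show up on a given platform - listing the devices without the mirror keyword makes them independent log vdevs, so ZFS spreads log writes across them instead of mirroring:

    # Independent (striped) log vdevs instead of a mirror
    zpool add tank log pmem0 pmem1

    # The mirrored layout discussed earlier in the thread, for comparison:
    # zpool add tank log mirror pmem0 pmem1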
 