How to best use 4 NVMe SSDs on my app/storage server?

SuperWhisk

Dabbler
Joined
Jan 14, 2022
Messages
19
I have tried to be thorough in my research, but without directly testing all the permutations I’m not sure which would be the best for my use case.

Hardware
Dell R630 server with 2 CPUs, 64GB ECC RAM (32 per CPU), 8x 1TB SATA HDDs (raidz2), 4x 500GB NVMe SSDs, and 2x 120GB SATA SSD boot drives (mirror)

SATA HDDs are connected via an LSI SAS 3008 based HBA card.
Boot SSDs are using the chipset SATA controller.
NVMe drives are on a PCIe 3.0 16x carrier card, with the slot bifurcated 4x4 so each is directly connected to CPU 1. They are Samsung 970 Evo Plus drives with very little writes on them. The server is on a UPS so I am not concerned about the lack of power loss protection on the SSDs.

Goals
My plan is to run SCALE and create a Linux VM pinned to the second CPU and its associated memory to run various applications such as NextCloud and Unifi Controller under Docker or Kubernetes (I want more control than the current plugin/apps interface will allow).
The HDDs in this server will be my primary NAS storage pool hosting all my data from my other computers over NFS, as well as data in NextCloud, also via NFS.
This pool will be regularly replicated to a TrueNAS CORE machine for backup purposes.
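
For the pinning part, I plan to double-check which host threads and memory actually belong to the second socket before assigning anything, using a couple of standard Linux commands from the SCALE shell (assuming numactl is present or can be pulled in):

    # List logical CPUs with their NUMA node / socket / core assignment
    lscpu -e=CPU,NODE,SOCKET,CORE

    # Full NUMA topology, including how much memory sits on each node
    numactl --hardware

Whatever falls under node 1 is what the VM would be restricted to.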

With all that background, what would be my best use of these NVMe drives?

From what I have read, there are four primary options (in no particular order):

  1. SLOG. It seems like 500GB would be way too big for SLOG unless I was regularly doing 100+GB writes at a speed much greater than the SATA drives could handle. I only have a gigabit network with no imminent plans to upgrade, so I don’t think I could saturate the SATA drives on sustained writes to the pool, since the IO is spread across multiple drives.
  2. 500GB could work for L2ARC at the cost of about 10GB of RAM, but I don’t know how much benefit it would provide for similar reasons (or if these consumer drives would just die a quick death in that role).
  3. I could create a second pool with some or all of the SSDs and use that for the VM boot drive to improve application disk performance, especially random IO. I could connect this pool to the VM over local iSCSI, but once again, I am not sure if I would actually see any real benefit from that (if I could even make it work).
  4. Finally, I could use some or all of the SSDs as a ZFS metadata device for the HDD pool, to store metadata and smaller files. This seems like it would be most likely to have an impact, by greatly increasing random IO performance on the pool for all use cases, but 500GB also seems rather large for this purpose on an 8x1TB raidz2 pool.
  5. I could also do some combination of the above, as I have 4 SSDs to work with here. 1x500GB L2ARC, 3x500GB special vdev mirror with hot spare? (A rough command sketch for this layout is below.)
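
To make option 5 concrete, the commands would look roughly like this. The pool name (tank) and device names are placeholders, and in practice I would probably do this through the SCALE UI rather than raw zpool commands, but it shows the shape of the layout:

    # L2ARC (cache) device - this one can be removed again at any time
    zpool add tank cache nvme0n1

    # Special (metadata) vdev as a mirror - it carries pool-critical data,
    # and on a pool with a raidz vdev it cannot be removed later
    zpool add tank special mirror nvme1n1 nvme2n1

    # Pool-wide hot spare, which can also step in for the special mirror
    zpool add tank spare nvme3n1

    # Optionally let the special vdev hold small file blocks too, not just metadata
    zfs set special_small_blocks=64K tank

As a sanity check on the SLOG sizing question in option 1: a SLOG only ever holds the last few seconds of in-flight synchronous writes, so at gigabit speeds that is on the order of 125 MB/s x ~10 s, roughly 1–2 GB, which is why 500GB is such massive overkill for that role.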
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Typically it depends on your use case..... what workload is most likely to be performance sensitive?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
My advice: do not install the NVMe drives in that machine. First build up your system, install all the software, and see how it all works. So far you have not mentioned an application that requires really fast data access.

Do not be fooled into thinking an L2ARC will automatically give you an improvement. In certain situations it will, but most of the time here it does not; most of the time it just adds a slight, imperceptible delay.
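
If you want to sanity check whether an L2ARC would even have anything to do, look at the ARC statistics after the system has been running your real workload for a while. If the hit ratio is already in the high 90s there is nothing left for an L2ARC to catch. The standard OpenZFS tools should be on your install:

    # One-shot summary of ARC size, hit ratio and tuning
    arc_summary | less

    # Live view of ARC reads, misses and size, refreshed every second
    arcstat 1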

An SLOG could help too, if you are pushing a lot of fast synchronous writes. Home users in most situations will not benefit from this.

Making an NVMe pool, well, you could, but unless you need some real speed it's a waste of the NVMe's. But that is just my opinion.

So build your system, see how it runs. Check the SWAP file and make sure it isn't using any SWAP Space, or not more than a few K bytes. If it's using a lot of SWAP Space then it is running out of RAM. Add more RAM or reduce your applications.
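
Checking that from the shell is quick; these are standard Linux commands so they should be there out of the box:

    # RAM and swap usage at a glance
    free -h

    # Which swap devices are active and how much of each is in use
    swapon --show

    # Same numbers straight from the kernel
    grep -i swap /proc/meminfo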

If I had those NVMe's and the PCIe card, I'd place it into my desktop computer and transfer my OS to it and use that to make my desktop a little faster. Basically use it for something that I could actually see a real world benefit. I use a RAM Disk for when I'm compiling data, nothing faster than that.

A last thought... I know people need to use Scale and help mature the VM experience but if you need to run VM's in a very mature environment, ESXi will do that and it too is free. You make a VM for TrueNAS and give it 16GB RAM and 2 CPU threads and let it run. You then create your other VM's in ESXi and all works great. You could use one 120GB SSD as the boot drive for ESXi, then the four NVMe's as datastore drives where you will store your VM's, and the HDD's would be passed through via the LSI card to the TrueNAS VM. Pretty simple actually and very stable. I don't think I could ever go back to TrueNAS on metal again, except if I only had limited RAM on the machine. My machine has 64GB which is more than enough to do enough VM's to make me happy.

Good luck on your endeavor.
 

SuperWhisk

Dabbler
Joined
Jan 14, 2022
Messages
19
My advice: do not install the NVMe drives in that machine. First build up your system, install all the software, and see how it all works. So far you have not mentioned an application that requires really fast data access.
This is a fair point; nothing here needs “really fast data access”. Neither does Windows or desktop Linux, for that matter. It’s much more about the random IO performance being terrible on spinning drives and their inability to read two locations in parallel. I certainly don’t have enough experience with arrays of drives to know intuitively whether they would be “good enough” at random IO.
I have been running all my computers on NVMe storage for so long that I may have just assumed that the spinning drives would not be performant enough, and I got the SSDs for relatively cheap.

So build your system, see how it runs. Check the SWAP file and make sure it isn't using any SWAP Space, or not more than a few K bytes. If it's using a lot of SWAP Space then it is running out of RAM. Add more RAM or reduce your applications.
This is good to know about keeping an eye on SWAP. I would not have known to look at that!

If I had those NVMe's and the PCIe card, I'd place it into my desktop computer and transfer my OS to it and use that to make my desktop a little faster. Basically use it for something that I could actually see a real world benefit.
I already have a 1TB 970 Evo Plus as the boot drive on my desktop and unfortunately consumer desktops don’t have enough PCIe lanes to use 20x for storage and still have a dedicated GPU. (That’s 16x for the 4x4 carrier card, and 4x for the existing dedicated M.2 storage slot on the motherboard). This card does not have any sort of multiplexer. If it only gets an 8x link, only two drives will be active. My desktop motherboard does support the needed bifurcation of the top 16x slot, but the GPU would be stuck with the lower 4x slot that shares bandwidth with the chipset.
I would need a PCIe multiplexer to convert 3.0 16x down to 4.0 8x, which is utterly cost prohibitive.

A last thought... I know people need to use Scale and help mature the VM experience but if you need to run VM's in a very mature environment, ESXi will do that and it too is free. You make a VM for TrueNAS and give it 16GB RAM and 2 CPU threads and let it run. You then create your other VM's in ESXi and all works great. You could use one 120GB SSD as the boot drive for ESXi, then the four NVMe's as datastore drives where you will store your VM's, and the HDD's would be passed through via the LSI card to the TrueNAS VM. Pretty simple actually and very stable. I don't think I could ever go back to TrueNAS on metal again, except if I only had limited RAM on the machine. My machine has 64GB which is more than enough to do enough VM's to make me happy.

Good luck on your endeavor.
I actually had ESXi running on this machine previously, with TrueNAS CORE in a VM just as you suggest, but decided to move away from that in part because I could not easily have redundant datastore drives, and also because of the changes Broadcom is making to the licensing and just their general attitude.
I have considered going with XCP-NG, but I really wanted to have everything (boot devices included) backed by redundant ZFS pools, which I could not do with XCP-NG either. I could do Linux LVM, but it doesn’t have the same data integrity protections as ZFS and feels very much “less than”. None of the applications I want to run really need the capabilities of a type 1 hypervisor, hence the plan to run it all in docker containers on a single VM within TrueNAS SCALE.
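
The container side of that plan stays very simple too; inside the VM it would be something along these lines (image names, ports and paths here are just illustrative, not a recommendation):

    # Nextcloud, with its data directory persisted on a bind mount
    docker run -d --name nextcloud \
      -p 8080:80 \
      -v /mnt/nextcloud:/var/www/html \
      --restart unless-stopped \
      nextcloud

    # UniFi controller, using one of the community images as an example
    docker run -d --name unifi \
      -p 8443:8443 -p 3478:3478/udp \
      -v /opt/unifi:/unifi \
      --restart unless-stopped \
      jacobalberty/unifi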


I guess I’ll need to do some experimenting here…
I don’t have any immediate alternative uses for these SSDs that I already paid for, and I don’t really want them to just sit around. Nor do I want to sell them on the used market, as they are all Phoenix-controller 970’s, which are no longer possible to get since Samsung did a silent component swap which reduced the performance of the new drives in all but very specific workloads.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
They are Samsung 970 Evo Plus drives

You do not want to use these drives as SLOG devices.

I would probably use at least a pair of them as a mirror, and then put my VMs on those, or all four as a pair of mirrors.

BUT I would also use an Optane drive as a SLOG... although I don't know if you can even get Optane NVMe drives anymore :(

It's broadly similar to one of my setups, where I have an Optane M2 drive as a SLOG, and an intention to use a carrier card to get a few more NVMe drives installed as VM storage one day...
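
For the pair-of-mirrors layout, the pool geometry would be something like this (names are placeholders, and on TrueNAS you'd normally build it in the UI):

    # Two mirror vdevs striped together: fast, and any single drive can fail
    zpool create fastpool \
      mirror nvme0n1 nvme1n1 \
      mirror nvme2n1 nvme3n1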

I do not have any experience with metadata vdevs, so shan't comment on that :)
 