Help! Build for VM production storage

a.alipio

Cadet
Joined
Jun 5, 2021
Messages
5
Hi.

I have a Dell PE T430 with 128GB RAM and a PERC H730P in HBA mode, with 8x 3.84TB Intel SSD DC S4500 drives.

2x 10GbE network interfaces.

I intend to get 2x 120GB SSDs for a mirrored boot pool, connected directly to the SATA ports on the motherboard.

My real question is about write cache: should I get a PCIe card for NVMe? What size is recommended for this setup? My motherboard does not allow PCIe bifurcation, so if 2 or more NVMe drives are needed, I will add more M.2 PCIe cards; they are cheap.

Any tips on this setup? Is one large RAIDZ3 pool the best choice, or is there a better layout for size vs. reliability vs. speed?

This server is for production XenServer virtualization storage only. I have another TrueNAS box with regular SATA drives for backup.

Any suggestions are very welcome.

I found some info on the forum, but most of it is for home labs and the like.

Thanks.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
You'll have better luck if you replace the H730P with a true HBA; see the forum's HBA resource.

For FreeNAS/TrueNAS you want a Host Bus Adapter, not a RAID card configured in HBA mode. There are probably other Dell cards you can select that are genuine HBAs.

For VM storage, you'll get much better performance using mirrors instead of RAIDZ3; see the forum's "path to success for block storage" resource.

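To make that concrete, here's a minimal sketch of such a layout, assuming hypothetical device names (da0..da7) for the eight S4500s; on TrueNAS you would normally build this through the web UI rather than at the command line:

# Four 2-way mirror vdevs striped together: ~50% usable space,
# but far better IOPS for VM workloads than one wide RAIDZ vdev.
# Device names are placeholders for the eight S4500s.
zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  mirror da6 da7
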
Also, regarding a ZIL SLOG device, see the forum's SLOG/ZIL resource.

An Intel Optane device is great for this purpose, though there are good M.2 NVMe choices as well.

Good luck!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Wow, thanks @Spearfoot ... that's like a hat trick, all three of my stickies I was going to reply with. :smile:

My real question is about write cache: should I get a PCIe card for NVMe?

Those two things have nothing to do with each other.

In ZFS, your write cache is main system memory. This is the single and exclusive option for write cache.

You may be laboring under a common misunderstanding of the SLOG/ZIL as some sort of write cache. It is nothing of the sort; as a log device, it always imposes a performance tax over the fastest possible setup, which is to force async writes and do nothing else.

For VM storage, that is risky, because if the filer crashes or other bad stuff happens, you can get VM corruption when a virtual machine believes that something has been written to disk, but it is actually lost in transit to the main pool because the filer crashed. This is exceedingly rare, of course, but if your VMs are important, it is highly suggested to use a high-quality SLOG device such as an Intel Optane, which is fast enough that the performance penalty is usually quite modest.
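
For reference, the two knobs involved are the dataset's sync property and the pool's log vdev; a minimal sketch, assuming a hypothetical pool/dataset named tank/vms and a placeholder SLOG device nvd0:

# Never acknowledge a write before it is on stable storage.
zfs set sync=always tank/vms

# Attach a fast, power-loss-protected device (e.g. an Optane)
# as the separate log (SLOG) device.
zpool add tank log nvd0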

What size is recommended for this setup? My motherboard does not allow PCIe bifurcation, so if 2 or more NVMe drives are needed, I will add more M.2 PCIe cards; they are cheap.

Just a technical correction: even if your board does not support PCIe bifurcation, you can still get the same end result by using a PLX switch card like the Supermicro AOC-SHG3-4M2P.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
An Intel Optane device is great for this purpose, though there are good M.2 NVMe choices as well.

Good luck!

And Intel Optane comes in M.2 varieties now too ;)

The Intel P4801X 100GB is pretty good for up to 1.1GB/s of sync writes.
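
If you want to benchmark a candidate SLOG yourself, FreeBSD-based TrueNAS ships the diskinfo tool, whose sync-write test is the usual yardstick here (nvd0 is a placeholder device name):

# -w enables (destructive) write testing; -S runs the synchronous
# write latency test relevant to SLOG duty. Only run this against
# a device holding no data you care about.
diskinfo -wS /dev/nvd0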
 

a.alipio

Cadet
Joined
Jun 5, 2021
Messages
5
Thanks a lot....


I've read about that but completely forgot... a lot on my mind these last days... =/

I'm about to order an LSI 9300-8i as a pure HBA... that part is OK.


Also, about ZIL, SLOG and all that: I've been reading a lot for the last 6 months and still have some difficulty understanding it. I'm thinking of disabling sync writes (as I did on my backup TrueNAS, and it was way better), and maybe I will get more RAM, or leave it as-is for now. I'm even considering putting more RAM and another CPU in it... I have 4x 32GB ECC and can add another 4 DIMMs for 1 CPU, or another 6 for 2 CPUs. The current CPU is a Xeon E5-2630 v3 (8C/16T).


What I'm really struggling with is the vdev design of my pool for VM storage...


With this scenario and hardware setup, what would you suggest, considering safety vs. performance? Can someone lend me a hand on this? I'm not looking for someone to do it for me, just some more experienced advice on the pool and dataset setup and what the best choice would be, so that I won't have to redo it after it's in production.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Ooh, a VM build thread, I want in.

The comments about HBAs are correct, and you want to get either the PERC HBA330 (which is fine in its vanilla form) or PERC H330 (and reflash it to an HBA330) if you are looking to stay on the "Official Dell Supported List" - the LSI 9300-8i is fine as well.

For multiple VMs in a production environment there really is only one answer, and that is "use mirrors." SSDs in RAIDZ setups are "okay" from a performance perspective, but you will still suffer the poor space efficiency from padding blocks, so I leave that tradeoff as "for homelabbers only."

Speaking of production VMs:

thinking of disabling sync writes (as I did on my backup TrueNAS, and it was way better)

Don't do this, for the reasons @jgreco highlighted above about VM corruption in case of a filer crash or hardware failure. You do have your backup array but that should be a last resort restoration option, not a regular fallback. Use quality SLOG devices like the Optane DC series and sync=always here.

For your PE T430, have you updated to the latest BIOS? The 13G servers should support it. Check System BIOS Settings -> Integrated Devices, at the very bottom of the list, for "Slot Bifurcation"... although I wouldn't put it past Dell to have enabled it only for their rackmount lines despite the tower systems being just as capable.

PLX chips can add latency (which kills SLOG performance) if you overcommit the PCIe lanes. The Supermicro card there has a PCIe 3.0 x8 link to the board, so don't expect to use more than eight lanes total across the four M.2 slots: a pair of Optane DC cards at PCIe 3.0 x4 each, and that's all. If it still gives poorer-than-expected results, you may need to use a pair of direct adapters.
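
If you want to confirm what link each NVMe device actually negotiated behind the switch, FreeBSD's pciconf can report it; a sketch (look at the PCI-Express capability line for each nvme device):

# Dump the device list with capabilities; the PCI-Express cap
# line shows the negotiated link speed and lane count, e.g.
# "link x4(x4) speed 8.0(8.0)".
pciconf -lc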

I'm even considering putting more RAM

Never a bad decision when it comes to VMs. Max out the single socket first and then consider if you want to add the second CPU and DIMMs.
 

a.alipio

Cadet
Joined
Jun 5, 2021
Messages
5
For multiple VMs in a production environment there really is only one answer, and that is "use mirrors." SSDs in RAIDZ setups are "okay" from a performance perspective, but you will still suffer the poor space efficiency from padding blocks, so I leave that tradeoff as "for homelabbers only."

Understood... so far this is what I'm planning:

1x HBA LSI 9300-8i
1x CPU E5-2630 v3 and 192GB RAM (max for single CPU)
2x internal 120GB SATA SSDs for the OS, mirrored
8x hot-swap SATA SSD DC S4500 3.84TB 6Gbps (also mirrored, not RAIDZ) with sync=always... check...

Now, Optane is a bit hard to find here, especially with short delivery times in Brazil, and our tax laws make it very expensive as a single purchase rather than a volume order.


For your PE T430, have you updated to the latest BIOS? The 13G servers should support it. Check System BIOS Settings -> Integrated Devices, at the very bottom of the list, for "Slot Bifurcation"... although I wouldn't put it past Dell to have enabled it only for their rackmount lines despite the tower systems being just as capable.

Yes, all firmware and the BIOS (2.13.0 at this time) are up to date, etc. But even if the 13G line has support, the 430 and 530 servers (rack or tower) didn't get this feature enabled in the BIOS. There are a few discussions about this on the Dell forums, but Dell hasn't provided a final answer yet, so I can't count on it (I also checked with Dell support, and they think this is unlikely to happen).

What are the choices? Can a regular PCIe Gen3 x4 or x8 card do the trick with a single M.2 NVMe drive like a Samsung 970 EVO Plus 2280? If yes, what NVMe size is recommended in this case?

Should I repurpose this server and get another for this function? I have 2 identical T430s, same specs... also one R440 with basically the same specs but no room for disks (4x 3.5"). I'm trying to find a good external SAS disk shelf that could fit our budget, but no luck yet.

Overall, thank you so much; this has really cleared up a lot for me so far.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Now, Optane is a bit hard to find here, especially with short delivery times in Brazil, and our tax laws make it very expensive as a single purchase rather than a volume order.

Optane is really the best option here, as most (almost all) "consumer" SSDs are not up to the task for SLOG purposes. See the later benchmarks in the forum's SLOG benchmarking thread.

You can possibly find an older model like the DC P3700 or another full-sized PCIe card, but for M.2 form factor options you are fairly limited in terms of "good SLOG device." In the short term, since you are using SSDs for the pool vdevs, sync=always would be usable even when writing directly to the Intel SSDs. If it were spinning disk, I would not recommend that, of course.

Is something like the Samsung 983 DCT any easier to obtain? It was benchmarked here (thank you @douglasg for the test results!) and does fairly well at low record sizes, but it doesn't have the write endurance of the Optane devices.

 

a.alipio

Cadet
Joined
Jun 5, 2021
Messages
5
Hi, me again...

One of the servers is OK, though without any M.2 over PCIe; no issues, but no outstanding performance either.

Now, a new plan for the XenServer VMs' main storage.

Before the hardware part: I'm leaning toward NFS instead of iSCSI. In performance tests on the same pool in TrueNAS, the results were very similar, and NFS shares are easier to troubleshoot and deal with in case of any problem. Does anyone have some points to share on this?

In the new hardware:
Dell R540
2x CPU Xeon Silver 4208 2.1G, 8C/16T
256 GB RAM (4x 64GB RDIMM 3200MT/s Dual Rank)
BOSS controller card + 2 M.2 480GB (NO RAID)
HBA330 12Gbps controller (NO RAID)

2x SSD SATA 480GB Read Intensive 6Gbps for the TrueNAS OS installation
8 or 12x hot-swap SATA SSD DC S4500 3.84TB 6Gbps for the storage pool


Main question: is this HBA330 OK for TrueNAS, or should I really try to get the LSI 9300-8i HBA?

On the ZFS pool side, if I end up choosing NFS for VM storage, are mirrors still the best way to go? Better than RAIDZ (2 or 3)? Or does it not matter whether it's NFS or iSCSI for VM storage, and a mirrored pool is always better?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Hi, me again...

One of the servers is OK, though without any M.2 over PCIe; no issues, but no outstanding performance either.

Now, a new plan for the XenServer VMs' main storage.

Before the hardware part: I'm leaning toward NFS instead of iSCSI. In performance tests on the same pool in TrueNAS, the results were very similar, and NFS shares are easier to troubleshoot and deal with in case of any problem. Does anyone have some points to share on this?

It can be a valid point.

In the new hardware:
Dell R540
2x CPU Xeon Silver 4208 2.1G, 8C/16T
256 GB RAM (4x 64GB RDIMM 3200MT/s Dual Rank)
BOSS controller card + 2 M.2 480GB (NO RAID)
HBA330 12Gbps controller (NO RAID)

2x SSD SATA 480GB Read Intensive 6Gbps for the TrueNAS OS installation
8 or 12x hot-swap SATA SSD DC S4500 3.84TB 6Gbps for the storage pool

Main question: is this HBA330 OK for TrueNAS, or should I really try to get the LSI 9300-8i HBA?

I believe it has been established that the HBA330 can be crossflashed to IT firmware.

On the ZFS pool side, if I end up choosing NFS for VM storage, are mirrors still the best way to go? Better than RAIDZ (2 or 3)? Or does it not matter whether it's NFS or iSCSI for VM storage, and a mirrored pool is always better?

This is covered in the "path to success for block storage" article linked above, and also the "why we use mirrors" resource linked from within it. Please do follow and read up on the links above if you would like complete answers to these questions. These topics are sufficiently complicated that you are highly unlikely to get a complete and accurate answer here in-thread, and because they are commonly asked questions, people have already done the hard work of writing detailed answers.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I believe it has been established that the HBA330 can be crossflashed to IT firmware.
The HBA330 uses a variant of the standard LSI IT firmware. I don't know of any deep analyses, but the major versions are in sync with the stock LSI firmware and it uses the IT-mode driver (mpr on FreeBSD). I strongly suspect that Dell just adds some monitoring/control via the SMBus and calls it a day.
The H330 is basically the same thing, but I think it ships with seriously crippled IR firmware and can be crossflashed with some effort.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Before the hardware part: I'm leaning toward NFS instead of iSCSI. In performance tests on the same pool in TrueNAS, the results were very similar, and NFS shares are easier to troubleshoot and deal with in case of any problem. Does anyone have some points to share on this?

I'm not quite sure of the state of pNFS/NFS 4.1 on TrueNAS at this point, but iSCSI can be set up to use multiple paths easily. That said, several users have reported issues with iSCSI in TN12 to the point of some of them switching to NFS. I've been okay with my admittedly smaller setup, but the plural of "anecdote" is not "data" so if you're seeing negligible differences in your testing (what were the performance tests?) of NFS vs iSCSI and using NFS is significantly easier from a troubleshooting and configuration perspective - go with what works.
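
If you do settle on NFS, attaching it on the XenServer side is straightforward with the xe CLI; a sketch, assuming a hypothetical filer address and export path:

# Create a shared NFS storage repository pointed at the TrueNAS
# export; the server address and path below are placeholders.
xe sr-create name-label="truenas-vm-storage" type=nfs \
  content-type=user shared=true \
  device-config:server=192.168.10.10 \
  device-config:serverpath=/mnt/tank/vm-nfs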

Main question: is this HBA330 OK for TrueNAS, or should I really try to get the LSI 9300-8i HBA?

The HBA330 is fine; the H330 is not (unless you crossflash it to an HBA330, as @Ericloewe mentions), so you're all set there.

On the ZFS pool side, if I end up choosing NFS for VM storage, are mirrors still the best way to go? Better than RAIDZ (2 or 3)? Or does it not matter whether it's NFS or iSCSI for VM storage, and a mirrored pool is always better?

Yes. NFS used for the purpose of serving VMs is "effectively the same as block storage" from a performance standpoint. One thing I would specifically suggest, though, is tuning the recordsize on the NFS dataset: the default 128K might cause a lot of read-modify-write operations on small writes, and reducing it to something like 32K could be an improvement on that front. Running some tests with varying recordsizes on your actual workload will help bear this out, and you can always use a live storage migration (whatever Xen's equivalent of svMotion is) to move the VMs to the fastest result.
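
As a sketch of that tuning (the dataset name is hypothetical; note that recordsize only applies to newly written blocks, so existing VM disks need to be copied or migrated to pick it up):

# Smaller records reduce read-modify-write amplification for
# small random VM I/O; try a few values against your workload.
zfs set recordsize=32K tank/vm-nfs

# Confirm the active value.
zfs get recordsize tank/vm-nfs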
 