[---- 2024/01/16: Still relevant. Virtualization really doesn't change much. Updates made as appropriate. ----]
[---- 2018/02/27: This is still as relevant as ever. As PCIe-Passthru has matured, fewer problems are reported. I've updated some specific things known to be problematic ----]
[---- 2014/12/24: Note, there is another post discussing how to deploy a small TrueNAS VM instance for basic file sharing (small office, documents, scratch space). THIS post is aimed at people wanting to use TrueNAS to manage lots of storage space. ----]
You need to read "Please do not run FreeNAS in production as a Virtual Machine!" ... and then not read the remainder of this. You will be saner and safer for having stopped.
<the rest of this is intended as a starting point to be filled in further>
But there are some of you who insist on blindly charging forward. I'm among you, and there are others. So here's how you can successfully virtualize TrueNAS, less-dangerously, with a primary emphasis on being able to recover your data when something inevitably fscks up. And remember, something will inevitably fsck up, and then you have to figure out how to recover. Best to have thought about it ahead of time.
- Pick a virtualization platform that is suitable to the task. You want a bare metal, or "Type 1," hypervisor. Things like VirtualBox, VMware Fusion, VMware Workstation, etc. are not acceptable.
VMware ESXi is suitable to the task.
Hyper-V is not suitable for the task. It spent a good bit of its lifetime unable to support FreeBSD or Linux, and its PCIe-Passthrough support has not been shown to be sufficiently stable.
I am not aware of specific issues that would prevent Xen or its descendant XCP-ng from being suitable. KVM and its most common implementation Proxmox are plausible. All of these options, including ESXi, are NOT blanket approvals; you still need perfect PCI-Passthrough support.
- Pick a server platform with specific support for hardware virtualization with PCI-Passthrough. Most of Intel's Xeon family supports VT-d, and generally users have had good success with most recent Intel and Supermicro server grade boards. Other boards may claim to support PCI-Passthrough, but quite frankly it is an esoteric feature and the likelihood that a consumer or prosumer board manufacturer will have spent significant time on the feature is questionable. Pick a manufacturer whose support people don't think "server" means the guy who brings your food at the restaurant.
This ends up involving the CPU, mainboard, and BIOS. Some rough guidelines now that we're in 2024 can be summed up as such:
A) Gear older than Sandy Bridge (meaning Nehalem, Westmere, and older) typically has dodgy PCIe support and shouldn't be used.
B) Supermicro's "X9" (Sandy and Ivy Bridge) server boards typically work okay for PCIe-Passthrough with ESXi but not always for other hypervisors.
C) Supermicro's "X10" (Broadwell and Haswell) server boards are better, and work with more hypervisors for PCIe-Passthrough.
D) Newer CPUs typically work fine, even on Proxmox, but beware that the Proxmox folks still list their PCIe-Passthrough support as "experimental". When a developer tells you something, believe it.
You will actually want to carefully research compatibility prior to making a decision and prior to making a purchase. Once you've purchased a marginal board, you can spend a lot of time and effort trying to figure out the gremlins. This is not fun or productive. Pay particular attention to the reports of success or failure that other ESXi users have had with VT-d on your board of choice. Google is your friend.
Older gear, meaning Supermicro X8* boards or Intel 5500/5600 CPUs and prior, is expected to have significant issues, some of which are fairly intermittent and may not bite you for weeks or months. All of the boards that have been part of the forum recommended hardware series seem to work very well for virtualization.
- Do NOT use VMware Raw Device Mapping. This is the crazy train to numerous problems and issues. You will reasonably expect that this ought to be a straightforward, sensible solution, but it isn't. The forums have seen too many users crying over their shattered and irretrievable bits. And yes, I know it "works great for you," which seems to be the way it goes for everyone until a mapping goes wrong somehow and the house of cards falls. Along the way, you've probably lost the ability to monitor SMART and other drive health indicators as well, so you may not see the iceberg dead ahead.
- DO NOT use hard drive passthrough (you Proxmox guys in particular) to get around a lack of decent PCIe-Passthrough support on your platform. ZFS expects to be able to strictly control write ordering and cache flushing towards the drives, so this will seem to work fine until suddenly it doesn't. ZFS does not have a "fsck" or "chkdsk" to correct errors introduced into a pool. People have lost data doing this. It's dangerous.
- DO use PCI-Passthrough for a decent SATA controller or HBA. We've used PCI-Passthrough with the onboard SAS/SATA controllers on mainboards, and as another option, LSI controllers usually pass through fine. Get a nice M1015 in IT mode if need be. Note that you may need to twiddle the hw.pci.enable_msi and hw.pci.enable_msix tunables to make interrupt storms stop. Some PCH AHCI's ("onboard SATA") and SCU's ("onboard SAS/SATA") work. Tylersburg does not work reliably. I've seen Patsburg and Cougar Point work fine on at least some Supermicro boards, but had reports of trouble with the ASUS board. The Ivy Bridge CPU era is the approximate tipping point where things went from "lots of stuff does not work" and began to favor "likely to work."
- Try to pick a board with em-based network interfaces. While not strictly necessary, the capability to have the same interfaces for both virtual and bare metal installs makes recovery easier. Much easier.
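For the hw.pci.enable_msi/msix twiddling mentioned in the PCI-Passthrough item above, here is a minimal Python sketch. It assumes the fix you are after is disabling MSI and MSI-X (both tunables set to 0) on a FreeBSD-based TrueNAS, and that you want the result in loader.conf format. The function name `set_loader_tunables` and the `MSI_OFF` table are made up for illustration; on a real system you would more likely enter the two tunables through the TrueNAS UI than edit files by hand. Treat it as a starting point, not gospel.

```python
from typing import Dict

# Assumption: "twiddling" here means turning MSI and MSI-X off entirely.
MSI_OFF = {"hw.pci.enable_msi": "0", "hw.pci.enable_msix": "0"}


def set_loader_tunables(conf_text: str, tunables: Dict[str, str]) -> str:
    """Return loader.conf-style text with each tunable set exactly once.

    Existing lines for a tunable are rewritten in place, later duplicates
    are dropped, and tunables not yet present are appended at the end.
    Comment lines and unrelated settings are left untouched.
    """
    seen = set()
    out = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in tunables:
            if key not in seen:
                out.append(f'{key}="{tunables[key]}"')
                seen.add(key)
            # A repeated setting for the same key is dropped.
        else:
            out.append(line)
    for key, value in tunables.items():
        if key not in seen:
            out.append(f'{key}="{value}"')
    return "\n".join(out) + "\n"
```

Calling `set_loader_tunables("", MSI_OFF)` gives you the two lines to use; running it over the text of an existing file rewrites any old values in place rather than appending duplicates, so it is safe to run more than once.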
Now at this point, if ESXi were to blow up, you can still bring the TrueNAS back online with a USB key of TrueNAS, and a copy of your configuration. This is really the point I'm trying to make: this should be THE most important quality you look for in a virtualized TrueNAS, the ability to just stick in a USB key and get on with it all if there's a virtualization issue. Your data is still there, in a form that could easily be moved to another machine if need be, without any major complicating factors.
But, some warnings:
- Test, test, and then test some more. Do not assume that "it saw my disks on a PCI-Passthru'd controller" is sufficient proof that your PCI-Passthrough is sufficient and stable. We often test even stuff we expect to work fine for weeks or months prior to releasing it for production.
- As tempting as it is to under-resource TrueNAS, do try to aggressively allocate resources to TrueNAS, both memory and CPU. Do not go too crazy on CPU though; allocate as demand is demonstrated.
- Make sure your virtualization environment has reserved resources, specifically including all memory, for TrueNAS. There is absolutely no value to allowing your virtualization environment to swap the TrueNAS VM.
- Do not try to have the virtualization host mount the TrueNAS-in-a-VM for "extra VM storage" (also known as "hyperconverged storage"). Back in the old days, I said: "This won't work, or at least it won't work well, because when the virtualization host is booting, it most likely wants to mount all its datastores before it begins launching VM's." This may not be true any longer; if it works the way you want it to, great, go ahead, it is not dangerous to your ZFS or your data. It's mostly dangerous to your sanity, trying to get things to boot in the needed order.
--update-- ESXi 5.5 appears to support rudimentary tiered dependencies, meaning you should be able to get ESXi to boot a TrueNAS VM first.
- Test all the same things, like drive replacement and resilvering, that you would for a bare metal TrueNAS implementation.
- Have a formalized system for storing the current configuration automatically, preferably to the pool. Several forum members have offered scripts of varying complexity for this sort of thing. This makes restoration of service substantially easier.
- Since you lack a USB drive key, strongly consider having a second VM and 4GB disk configured and ready to go for upgrades and the like. It is completely awesome to be able to shut down one VM and bring up another a few moments later and restore service at the speed of an SSD datastore.
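The "formalized system for storing the current configuration" item above can be as simple as a scheduled script. Here is a minimal sketch in Python, under a few assumptions: the configuration lives in a SQLite database (historically /data/freenas-v1.db, but check your version), the pool is mounted somewhere like /mnt/tank (a hypothetical name), and `backup_config` is an illustrative helper, not one of the forum members' scripts. It uses SQLite's online backup so a copy taken while the system is running stays consistent, and it prunes old copies.

```python
import sqlite3
import time
from contextlib import closing
from pathlib import Path


def backup_config(db_path, dest_dir, keep=30):
    """Copy a SQLite config database into dest_dir with a timestamped name.

    Keeps only the newest `keep` backups and returns the path of the new one.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    src = Path(db_path)
    if not src.is_file():
        # sqlite3.connect() would silently create an empty file otherwise.
        raise FileNotFoundError(src)

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest / f"{src.stem}-{stamp}{src.suffix}"

    # Online backup: safe even if the source database is in use.
    with closing(sqlite3.connect(str(src))) as src_conn, \
            closing(sqlite3.connect(str(target))) as dst_conn:
        src_conn.backup(dst_conn)

    # Timestamped names sort chronologically, so the oldest come first.
    backups = sorted(dest.glob(f"{src.stem}-*{src.suffix}"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```

A nightly cron job could call something like `backup_config("/data/freenas-v1.db", "/mnt/tank/config-backups")`. Note that this only grabs the database; if your version keeps a separate password secret seed alongside it, that needs saving too.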