It does FreeNAS: SuperMicro X10SDV Xeon-D 1541 ESXi all-in-one


jdong

Explorer
Joined
Mar 14, 2016
Messages
59
After reading about the Xeon-D and some folks' adventures in ESXi all-in-one boxes for over a year, I finally decided to pull the trigger and build one. This is mostly to document my decisions for posterity, for the benefit of others contemplating this setup as well as my future forgetful self.

The short story is, I've "succeeded" in building a combo NAS + consolidated home server, using a Xeon-D 1541 running ESXi and virtualizing FreeNAS to serve an iSCSI datastore back to ESXi, which in turn runs a few VMs (Windows Server, Linux, FreeBSD). So far, "success" means 2 weeks of burn-in testing with a moderate Windows database workload and a FreeBSD guest running an I/O workload while performing ZFS scrubs of its virtual device, surviving numerous hard and soft resets without detectable corruption in the VMs or FreeNAS.


DISCLAIMER: Overall, I have to say, this isn't your average FreeNAS config. This kind of all-in-one ESXi+FreeNAS home server sounds like what every home server runner wants, but beware! It took much, much longer to set up compared to my previous bare-metal FreeNAS config. So my warning here is: setting up one of these all-in-one servers is relatively costly and requires a good technical grasp of both ESXi and FreeNAS, along with iSCSI/NFS. And the fun doesn't stop when you finish the build: it puts you slightly off the beaten path (especially for ESXi), and you should be aware that future updates to FreeNAS or ESXi may require additional tweaking. Given that I've only had 3 weeks of experience with this setup, it's way too soon to tell how it stands up over time. All I can say is that I believe this setup follows most of the best-practices guidelines for virtualized FreeNAS, with the exception of serving the NAS array back to ESXi, which has been considered troublesome and possibly broken in earlier versions of ESXi.


Hardware List:
Case: Fractal Node 804
Motherboard: SuperMicro X10SDV-TLN4F-O Xeon-D 1541
PSU: Seasonic X650 Gold
HBA: IBM ServeRAID M1015 cross-flashed to IT, in VT-d passthrough
Storage Drives (via M1015): 4x 4TB WD Red, 2x 4TB HGST Deskstar NAS, combined as 3 mirrored pairs (Red+Red, Red+HGST, Red+HGST). I know the pairing isn't ideal, but those are the drives I had on hand.
RAM: Crucial 2x32GiB CT2K32G4RFD4213 (DDR4-2133 RDIMM)

Boot Drive (via onboard SATA): Samsung 830 SSD (cheap and good enough for the ESXi boot drive + FreeNAS boot datastore, backup and contingency plans in place)

Extra SSD (via M1015): Intel DC S3500 (originally planned as a SLOG for NFS, but deemed unnecessary since switching to iSCSI; evaluating it as an L2ARC, though nothing is set in stone yet)


Installation Notes:

I won't attempt to do a poor job of regurgitating existing instructions. Here are the documents that I found most helpful:

https://b3n.org/freenas-9-3-on-vmware-esxi-6-0-guide/
I preferred having CHAP authentication on the iSCSI share, and found these instructions helpful for configuring this on FreeNAS: https://ryanclouser.com/2015/07/16/FreeNAS-9-3-iSCSI/

However, I would like to provide some notes here of things that snagged me:
  • Cross-flash your M1015 the old-fashioned way, with a bootable DOS flash drive. Multiple documents online made it sound like it was straightforward to just pass the M1015 through as-is and have FreeNAS cross-flash it, but attempting this led to numerous mpt boot hangs / command timeouts before even getting to single-user mode. The instructions I found to work best are here: https://www.servethehome.com/ibm-serveraid-m1015-part-4/ including the advice to erase the flash and reboot before cross-flashing. I found that SAS2FLASH did not want to recognize my M1015 until I erased it. YMMV.
  • NFS was super easy to set up as a datastore, and you can even get automount-on-boot and VM autostart working pretty much out of the box, just by adding a long boot delay after FreeNAS and using 100% built-in configuration. But I found iSCSI to be substantially faster, even with a SLOG device. In fact, iSCSI without a SLOG was faster than NFS with a SLOG for the workloads that mattered to me (simulated by installing, uninstalling, then reinstalling Visual Studio). Needless to say, NFS without a SLOG is basically a nonstarter if you want any hint of write performance: any amount of NFS write traffic will grind your entire pool's performance to a halt. You can refer to one of the many threads on NFS write performance / SLOG contemplation.
  • I don't want to start an iSCSI sync= debate, but in my opinion as a *nix kernel engineer with storage experience, you are far better off from an integrity point of view with iSCSI and sync=standard than with NFS and sync=disabled (duh!). In particular, ESXi appears to correctly translate a guest SCSI SYNCHRONIZE CACHE command into an iSCSI SYNCHRONIZE CACHE command, which FreeNAS treats as a sync iSCSI write. Modern filesystems understand how async writes behave on drives with a volatile cache that offers a synchronize-cache equivalent, and for important metadata writes they will issue a barrier flush. The takeaway is that sudden power loss with iSCSI is just like sudden power loss on a physical machine with a large disk cache: you may lose some data that was in flight, but it will not leave you with completely trashed guest filesystems, corrupt databases, and so on. However, if you have sudden power loss with NFS and you had set sync=disabled, you could experience massive corruption of filesystems and databases in your guests. In the NFS + sync=disabled case, FreeNAS happily lies to ESXi and claims data is committed to disk when it is not. Again, modern filesystems understand the concept of async writes to volatile disk caches and use SCSI/SATA commands to flush the cache synchronously when required for metadata integrity and important data, but no system can cope with a storage backend that lies about synchronous writes being synchronous. (A quick zfs check of the zvol's sync setting is sketched after this list.)
  • Now with that out of the way, I've hopefully set up the motivation for trying to get iSCSI working. It's a rocky road ahead, so sit tight, and hopefully resist the temptation to crawl back to NFS + SLOG.
  • For iSCSI, DO NOT use Dynamic Discovery except for initial setup. Dynamic Discovery has numerous layers of timeouts and ESXi boot will hang for literally minutes at "loaded iscsi_vmk" trying to discover the not-yet-booted FreeNAS target.
  • If you use Dynamic Discovery to get the correct target name, you must work around an unintuitive ESXi UI bug: delete both the static target and the dynamic target, then re-add the static target. If you only delete the dynamic target, the UI makes it look like the static target survived, but upon reboot you'll find that the static target disappears too. The sequence that worked for me was to delete both the dynamic and static targets and then re-add the static target (keeping the full target name on your clipboard, of course, in case you're not a whiz at iSCSI nomenclature).
  • Using static discovery, there will be an unavoidable ~45s boot hang at "loaded iscsi_vmk". Deal with it. It's not worth turning a million tunable knobs for this.
  • If you use CHAP, iSCSI will not auto mount by default. ESXi appears to treat a login timeout as a login failure which is essentially fatal (and won't be retried, at least not for an extremely long time). If you do not use CHAP, you may find that with this configuration alone, iSCSI auto mounts at boot shortly after FreeNAS starts.
  • If you use CHAP, long story short, you need to run a couple of commands by hand: esxcli storage core adapter rescan --all will attempt to connect to iSCSI and make the block device available on the iSCSI vmhba. However, as of ESXi 6.0 this no longer triggers auto-mounting of the datastore's VMFS5 filesystem. To do that, you must run vmkfstools -V; esxcli storage filesystem automount. The latter command is synchronous: the datastore is immediately available after it returns.
  • I've had great success automating this via /etc/rc.local.d/local.sh, which I will include a copy of below.
  • Note that while FreeNAS iSCSI supports VAAI (e.g. for unmapping unused iSCSI blocks), ESXi does not use this capability by default. Deleting VMs and having guests trim themselves will not cause the iSCSI zvol to show less usage. You actually need to run esxcli storage vmfs unmap -l freenas_iscsi (substituting your datastore name) in ESXi. I have yet to investigate whether ESXi has tunables to control this, but keep it in mind if you notice your zvol taking up too much space after spring cleaning in your datastore.
  • Use VMXNET3 for the FreeNAS VM. FreeNAS 9.10 supports it marvelously, and it's good for at least 3Gbit/s throughput at the ZFS level. I haven't bothered to see if it can go faster or not -- this speed is more than enough for my purposes.
  • I spent a little bit of time enabling jumbo frames across the whole stack and verified that the iSCSI frames were indeed jumbo. It didn't change performance at all. Don't waste your time, IMO, unless you have an existing 10GbE infrastructure that would also benefit from jumbo frames; VMXNET3 seems to gain almost nothing from them. (If you do want to check, the usual vmkping test is sketched right after this list.)
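
One quick aside on the sync= discussion above: you can confirm what your iSCSI zvol is actually doing straight from the FreeNAS shell. This is only a minimal sketch, with tank/vmware standing in as a placeholder for your own pool/zvol name:

Code:

# Show the zvol's sync policy: "standard" honors SYNCHRONIZE CACHE requests
# from the initiator, while "disabled" acknowledges writes it hasn't committed
zfs get sync tank/vmware

# Explicitly set it back to standard if it was ever changed
zfs set sync=standard tank/vmware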
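
And for the jumbo frame point: the usual end-to-end check from the ESXi shell is a don't-fragment ping with a near-MTU payload. Another small sketch, assuming (as placeholders) that your storage vmkernel port is vmk1 and FreeNAS answers at 192.168.10.2:

Code:

# 8972 bytes = 9000-byte MTU minus 28 bytes of IP/ICMP headers; -d forbids fragmentation
vmkping -d -s 8972 -I vmk1 192.168.10.2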
Also, some miscellaneous tips on the SuperMicro board / Fractal Node 804 case:
  • The case fan has a low-medium-high speed switch on the Node 804. I find with 6 drives, Medium is acceptable. Low is much too low and allows the drives to creep into the mid-40 deg C range.
  • Speaking of the Node 804 and drives: if you are using fewer than 8 drives, try to leave the back drive bays empty for air circulation. I find that drives pressed against the back run around 3 deg C warmer.
  • One more thing about the Node 804 and drives: be aware that the rightmost drive cage (above the PSU) has very little clearance for SATA connectors. I'd highly recommend angled (L-shaped) SAS/SATA connectors rather than straight ones.
  • SuperMicro board: Update your BIOS to 1.1c. Apparently there's an Intel microcode bug in the Xeon-Ds that results in ESXi panics, especially when using VT-d. Flashing is once again best done from a bootable DOS USB drive.
  • SuperMicro BIOS settings: DO NOT mess around carelessly with power management settings in the BIOS. Namely, turning on HWPM in either Native Mode or OOB Mode resulted in a massive 50% loss in CPU performance. If you are going to mess with these settings, I'd recommend establishing a baseline with your favorite synthetic CPU benchmark and also having a Kill-a-Watt type power meter to measure your performance vs power tradeoff. The Xeon-D platform is already extremely efficient and quite frankly a mega server like this is never idle long enough for CPU deep C-states to really matter, so you end up causing more harm than good when tuning these parameters.
  • The only BIOS power settings I recommend tuning are Energy Efficient Turbo and Turbo Boost in general. When EET is off, you do see an extra ~5-10% CPU performance on single-core workloads, but it costs a pretty substantial +10W or so under full load. That is probably peanuts to anyone running a Xeon E3, though...
  • SuperMicro board: The CPU fan is kind of noisy on this board. It's small and creates more high-frequency noise than you'd really want. If this bothers you, set the BMC's Fan Speed parameter to "Optimal" (which reduces minimum fan speed from 50% to 30% + adaptive), and increase the speed of the case fans to Medium or High. The Fractal 804 case fans are delightfully low-noise and low-frequency hum fans, and with sufficient airflow the CPU fan barely needs to do any work to cool this energy-efficient sucker.
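
A small footnote on that last fan point: the BMC fan mode can also be flipped over IPMI instead of through the web UI. This is only a sketch based on the raw commands commonly reported for Supermicro X10 boards (0x30 0x45, with modes 0=Standard, 1=Full, 2=Optimal, 4=HeavyIO); the BMC address 192.168.1.50 and the ADMIN user are placeholders, so double-check against your own board before trusting it:

Code:

# Read the current fan mode from the BMC
ipmitool -I lanplus -H 192.168.1.50 -U ADMIN -P yourpassword raw 0x30 0x45 0x00

# Set the fan mode to Optimal (0x02)
ipmitool -I lanplus -H 192.168.1.50 -U ADMIN -P yourpassword raw 0x30 0x45 0x01 0x02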


At any rate, I hope this helps someone... sorry for the long, rambling read. This information took me some time to figure out through trial and error, so I genuinely hope it helps anyone considering one of these ESXi+FreeNAS combo setups or the SuperMicro Xeon-D 1541 platform.


Appendix:

As mentioned, here is what I use in my local.sh to start FreeNAS, poll for the iSCSI target to show up, then mount the datastore and start all the VMs on it:

Remember that ESXi's /bin/sh is BusyBox's shell, not real bash; it sits somewhere between a classic Bourne /bin/sh (as on *BSD) and full GNU bash.
Code:

# Power on the FreeNAS VM first (VM ID 1 on this host; check yours with vim-cmd vmsvc/getallvms)
vim-cmd vmsvc/power.on 1

# Keep rescanning until the FreeNAS iSCSI target shows up as a SCSI device
while [[ X"$(esxcfg-scsidevs -c | grep FreeNAS)" == "X" ]]; do
    esxcli storage core adapter rescan --all
    sleep 5
done

# Keep poking the automounter until the freenas_iscsi datastore is mounted
while [[ X"$(esxcli storage filesystem list | grep freenas_iscsi | grep true)" == "X" ]]; do
    vmkfstools -V
    esxcli storage filesystem automount
    sleep 1
done

sleep 5

# Power on every VM registered on the freenas_iscsi datastore,
# skipping any whose listing contains DISABLED
for vmid in $(vim-cmd vmsvc/getallvms | grep freenas_iscsi | grep -v DISABLED | cut -f1 -d" "); do
    vim-cmd vmsvc/power.on "$vmid"
    sleep 2
done
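
(For anyone copying this: on ESXi, /etc/rc.local.d/local.sh already exists and ends with an exit 0 line; the commands above are meant to go in before that exit 0, and the file is preserved across reboots as part of the host configuration. Treat that as my understanding of ESXi 6.0 rather than gospel.)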

 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410

jdong

Explorer
Joined
Mar 14, 2016
Messages
59
Thanks! Yeah, so far I am quite thrilled with this setup. It replaced my previous 4-core Avoton FreeNAS box as well as two Core i7-4770 PCs sitting in a closet, so overall it's probably a net savings of around 100W, 24/7/365, which makes me quite happy. The closet itself is quite a bit cooler than it was before.

Note that, of course, the Xeon-D 1541's single-core performance is significantly lower than the Core i7-4770's, but honestly those machines were underutilized to begin with and were never a good fit as servers.

Allocating 6 vCPUs to FreeNAS, the Plex plugin transcoding a movie is a whopping 10x faster than the 4-core Avoton. Some part of me regrets not going for the 8-core Avoton to begin with, which is what slowly led me down the path of building this box. But so far, no looking back at this monster in my closet.
 

jdong

Explorer
Joined
Mar 14, 2016
Messages
59
One more lesson learned: DO NOT turn on SR-IOV in the BIOS unless you know what you're doing. I tried it for fun, and it renumbered all my PCIe devices so my HBA no longer got passed through. Even after fixing that, FreeNAS saw some weird PCIe derivative of my HBA and refused to recognize any drives on it.
 

Domino

Cadet
Joined
Nov 11, 2016
Messages
1
Wow, this is gold, thank you! Exactly what I have in mind; I have most of the hardware sans the controller.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Oh, if you wanted to run my fan controller script in your virtualized FreeNAS, you can get it to talk to the BMC over the 'network' even if it's not running on the metal ;)
 