Hey guys, looking for some input on a new configuration I want to test out for housing block storage used by VMs. The TrueNAS box won't host the VMs, just the data. The VMs will be mixed usage for VPS hosting.
Hardware
Chassis
SSG-6028R-E1CR24L https://www.supermicro.com/en/products/system/2U/6028/SSG-6028R-E1CR24L.cfm
- 24x SAS3/SATA3 12Gb/s backplane
- Upgraded to add 2 more hot-swap SSDs that we're going to use for SLOG
- Upgraded to include 4 NVMe slots (uses 4 of the 24 hotswap slots)
- Purchased an add-on PCIe card that holds 2 NVMe M.2 drives (possibly to use for metadata)
- Purchased a ConnectX-3 dual-port 40GbE QSFP+ network card
- 2 x Intel Xeon E5-2697V3
- 256GB RAM but I’m thinking 512GB RAM (I’ve purchased enough 32GB DDR4 ECC RAM to max it out)
- OS SSDs: 2 x WD Blue 500GB 3D NAND
- Spinning Disks: HGST Ultrastar 8TB 12Gb/s SAS HUH728080AL5200 - https://documents.westerndigital.co...astar-sas-series/data-sheet-ultrastar-he8.pdf
- SLOG SSDs: 2 x Intel DC S3610 800GB 2.5” (5.3 PBW endurance) - https://www.intel.com/content/dam/w.../product-specifications/ssd-dc-s3610-spec.pdf
- L2ARC NVMe: 4 x Intel P4610 NVMe
Things we need to factor in and verify for optimal performance:
- Sector size – drive firmware (SSDs and spinners) can report either 4K or 512-byte sectors, but we need to verify the true physical sector size, since that determines the ashift for TrueNAS
- From what I’m reading in the PDFs – the 800GB SSDs are 512-byte and the HGST spinning disks can be 4Kn or 512e
- smartctl output is needed
- diskinfo output is needed
- There’s a TrueNAS command that shows what TrueNAS detects, but apparently it doesn’t always report accurate information; I’m still looking for it. Can somebody direct me to the right command(s) to run? I understand that writing 512-byte sectors to a drive with 4K physical sectors can lead to really bad write performance (the same 4K physical sector could get rewritten up to 8 times per record). The commands I’m planning to run to cross-check this are sketched below.
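- For reference, this is roughly what I plan to run to verify the sector sizes (da0 here is just a placeholder for one of the data disks; the real device names will differ on the box):
  smartctl -i /dev/da0     # look for the logical/physical sector size lines in the drive identity output
  diskinfo -v /dev/da0     # FreeBSD diskinfo: sectorsize is the logical size, stripesize usually reflects the physical sector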
- I’m thinking:
- Make sure the onboard RAID controller cache is turned off so we don’t risk losing in-flight writes in case of a power failure
- RAID1 for OS
- RAID1 for SLOG
- 2 x RAID1 for L2ARC on the NVMe in case a drive fails.
- I know people wouldn’t normally suggest that, because the cached data still sits on the pool, but we want to make sure performance doesn’t degrade if a cache drive fails.
- This will stripe across both mirrors. Each drive is rated at ~660K read IOPS, and with RAID1 reads can be served from both drives in each mirror, which would put us at 2,640K read IOPS total; even if we only get the IOPS of one drive per mirror, that’s still 1,320K IOPS, which is amazing!
- 3 RAIDZ2 vdevs of 6 drives each, with 2 hot spares.
- This will stripe across the three vdevs for better performance
- 8TB x 3 vdevs x 4 data drives per vdev (2 parity drives per vdev) = 96TB usable; a rough zpool sketch of the whole layout follows this list
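- To make that concrete, here is a rough sketch of the pool layout I have in mind. Pool name and device names are placeholders (da0–da17 stand in for the 18 data spinners, da18/da19 for the two spares, da20/da21 for the S3610s, nvd0–nvd3 for the P4610s), and ashift=12 assumes the drives really turn out to be 4K physical; on TrueNAS this would normally be done through the GUI anyway:
  zpool create -o ashift=12 tank \
    raidz2 da0 da1 da2 da3 da4 da5 \
    raidz2 da6 da7 da8 da9 da10 da11 \
    raidz2 da12 da13 da14 da15 da16 da17 \
    log mirror da20 da21 \
    cache nvd0 nvd1 nvd2 nvd3 \
    spare da18 da19
  # Note: zpool itself won't accept a mirror of cache devices, so the "2 x RAID1 for L2ARC"
  # idea would have to happen below ZFS (e.g. on the controller) if we go that route.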
- Metadata is something we need to investigate
- I’m reading that people can set up a special vdev to host the metadata to improve performance. Would this require a lot of write endurance? I imagine it would only change when the data changes, and I don’t anticipate much churn since most of the hosted data stays as-is, with the exception of DB updates and whatnot
- I understand this is critical, because if we lose the metadata we lose the pool, so we can’t let that happen
- Others have noted that you can run a command to keep the metadata in RAM. Some people don’t think this is a good idea because it can take up a lot of RAM, but we can add more RAM. I haven’t tried this before – would we need to manually run the command on each boot to pull the metadata from the vdevs into RAM? If so, there’s a downside here, because I imagine it would need to rebuild in RAM, which means degraded performance for a long time until it finishes.
- Alternatively, instead of keeping it in RAM, we could keep it in L2ARC. I haven’t investigated this, but in theory it would at least be persistent and wouldn’t degrade performance after a reboot since it would sit in L2ARC. Is this correct? The commands/properties I’ve seen mentioned for this are sketched below.
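- As a rough sketch of what I’ve seen referenced (assuming the pool is named tank, with nvd4/nvd5 standing in for the two M.2 drives on the add-on PCIe card):
  # mirrored special vdev for metadata, so a single drive failure can't take out the pool
  zpool add tank special mirror nvd4 nvd5
  # cache only metadata in L2ARC (secondarycache controls what gets written to L2ARC)
  zfs set secondarycache=metadata tank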
Thank you!