Research data storage system

rapidgorgon
Cadet · Joined May 20, 2020 · Messages: 1
I manage the IT for a small research lab that produces data sets consisting of ~1000 MB-sized images. We have a few analysis machines, and over time we have collected several different NAS systems to store our data. It has reached the point where managing all the machines, and making sure there is enough space on the analysis machines, is becoming an issue. So I would like to move our data storage to a more scalable system, which brings me to TrueNAS.

At the moment I already have a fairly recent server, which I use for hosting a few VMs. Since this VM host has resources to spare, I was thinking of setting up a virtualised TrueNAS system, with an HBA card handed to the guest via PCI passthrough. The storage disks would be placed in a separate JBOD chassis. At the moment I have around 60TB of data, although some of that could probably be freed up when consolidating the storage.
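To make the passthrough part concrete (hypervisor details aside; I'm sketching it in libvirt/KVM terms here, and the PCI address is just an example), the idea is to pass the whole HBA to the TrueNAS guest, roughly like this:

    <!-- hypothetical example: pass the HBA at host address 0000:03:00.0
         (real address found with lspci) through to the TrueNAS VM -->
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
    </hostdev>

With the controller passed through, the TrueNAS guest sees the JBOD disks directly instead of virtual disks, which is what ZFS wants.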

VM host specs:
  • dual E5-2630 v3 CPUs (20 cores/40 threads, only 12 allocated to VMs)
  • 128GB ECC DDR4 (can be extended, currently only using ~10GB)
  • 1TB of VM flash storage on RAID1 (2×2.5" SAS bays used, 8 free)
Planned storage layout:
  • RAIDZ2 vdev of 8×12TB disks: 96TB raw, roughly 72TB usable after parity (new disks)
  • RAIDZ2 vdev of 8×4TB disks: 32TB raw, roughly 24TB usable after parity (reusing current disks)
Data access:
  • Serve data over SMB (rough share sketch after this list)
  • 10Gb/s ports to data analysis servers
  • Use local (SSD-backed) scratch space during data analysis.
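For the SMB side I would set things up through the TrueNAS UI, but conceptually the share boils down to something like the following Samba definition (share name and path are placeholders):

    [research-data]
        path = /mnt/tank/research
        read only = no
        browseable = yes
        # large sequential image reads can benefit from sendfile
        use sendfile = yes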
My idea was to put the vdevs in a single zpool, so storage can be scaled up more easily without users having to worry about where their data is. The same reasoning is behind the external JBOD chassis: I would like to start with space for a total of 24 disks, so vdevs can be added or upgraded more easily in the future.
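In zpool terms, and assuming the pool is called tank with the disks showing up as da0–da23 (placeholder names), the plan amounts to one pool with two RAIDZ2 vdevs, and a later capacity bump is just another vdev added to the same pool:

    # single pool, two RAIDZ2 vdevs (8×12TB and 8×4TB)
    zpool create tank \
        raidz2 da0 da1 da2 da3 da4 da5 da6 da7 \
        raidz2 da8 da9 da10 da11 da12 da13 da14 da15

    # later expansion: add another RAIDZ2 vdev to the same pool,
    # so users keep seeing one growing pool
    zpool add tank raidz2 da16 da17 da18 da19 da20 da21 da22 da23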

Given my use case, I don't think a SLOG device would be useful (few synchronous writes). An L2ARC probably also won't help much, unless network performance turns out to be good enough for real-time (read) access during data analysis. In any case, the VM host has room to extend the RAM and add some SATA/SAS SSDs, so this could be added at a later point if needed.
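If that changes later, my understanding is that cache and log devices can be bolted onto the existing pool without rebuilding it, roughly like this (pool name and SSD device names are placeholders):

    # add an L2ARC (read cache) device to the existing pool
    zpool add tank cache nvd0
    # add a mirrored SLOG, should synchronous writes ever become relevant
    zpool add tank log mirror nvd1 nvd2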

Given that this is the first 'larger' storage system I'm designing, I was wondering if I have missed any drawbacks or bottlenecks. Recommendations on how many resources (vCPUs, RAM) I should allocate to this VM are also welcome. :smile:
 