Migrating from Windows to TrueNAS SCALE (Chia Farm)

rfc2324

Hello TrueNAS community. First time here, just discovering TrueNAS.

I currently operate a Chia farm of about 1 PiB on Windows 10 Pro. Here are the problems I'm facing:
- HDD lookup times are increasing as I add more disks. They are tolerable right now (under 2 seconds; the recommended maximum is 5 seconds)
- I want to scale to 2+ PiB in the near future, so lookup times will only get worse
- The farm is intended to operate for roughly 5 more years, so I want a file system that can heal bit rot (ZFS)

For those who are not familiar with Chia, the use case is this:
- Fill up your disks with "plot" files (~101 GiB each): write once, read many times
- There is no need for compression, redundancy, ACLs, a write cache, or a data read cache. Plot files are relatively cheap and easy to re-populate in the event of data loss
- There is a need for a metadata read cache ONLY. This is what should allow for faster lookup times
- I operate only 18 TB drives at the moment. Each one stores 165 plots with roughly 40 GiB to spare (wasted space). I presume that if I migrate to ZFS and create pools (?) of 3 HDDs each, the combined 40 + 40 + 40 GiB of spare space will hold one extra plot (see the back-of-envelope math after this list)
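
To sanity-check that spare-space assumption, here is the back-of-envelope math (assuming standard K32 plots of ~101.3 GiB; exact figures vary with plot size and filesystem overhead):

```sh
# One 18 TB drive expressed in binary units:
echo "18 * 10^12 / 2^40" | bc -l            # ~16.37 TiB raw
# Space left on one drive after 165 plots:
echo "16.37 * 1024 - 165 * 101.3" | bc -l   # ~48 GiB spare per drive
# Three striped drives pool their spare space:
echo "3 * 48" | bc                          # ~144 GiB, room for one more ~101 GiB plot
```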

Current farm hardware:

Intel Z390 + i7-8700
128 GiB RAM
System drive: 1 TB NVMe
Blockchain drive: 2 TB NVMe
An LSI-knockoff HBA (12 Gb/s)
Multiple JBODs directly attached to the HBA; every disk is visible individually


Bladebit plotter hardware to be commissioned soon:

2x Xeon E5-2683 v4 (16c/32t each)
512 GiB RAM
1.6 TB enterprise NVMe for the temp drive
12 HDD bays


All HDDs are currently formatted exFAT. I fill up the HDDs on another PC, stick them into the JBODs connected to the farm, and mount them in Windows. I have some spare bays in the JBODs and some spare blank HDDs. This is a dedicated system, so the OS runs on bare metal. I run one Docker container (via Docker Desktop) for a telemetry service, just to be on the safe side and expose it to only one log file instead of the whole OS file system.

What I'd like to do:
- Install TrueNAS SCALE instead of Windows (I can use a new SSD for boot/system and preserve the existing NVMe in case of failure)
- Mount all the exFAT drives. Since TrueNAS SCALE is Debian-based, I believe they can be mounted using the exfat-fuse and exfat-utils packages (first sketch after this list)
- Create new ZFS storage pools from the blank drives, 3 HDDs per pool
- Configure each ZFS pool to use a metadata read cache ONLY (second sketch after this list)
- Move all the data (1 PiB) to the new ZFS pools, creating further pools from the drives that get freed up. 165 * 3 = 495 plots per pool, plus one extra plot later thanks to the 40 + 40 + 40 GiB of spare space enabled by ZFS striping
- It would be nice to minimize downtime, so the blockchain DB drive needs to be converted from NTFS to ZFS at some point. The DB is not too large (only ~50 GB), so moving it is not a problem; the disk can be wiped and re-formatted as a ZFS pool. Probably no compression, otherwise standard ZFS. Then start the Chia farming software and point it at the pools as they are created
- The actual data copy/migration from exFAT to ZFS will take several days at minimum (third sketch below)
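
For the exFAT mounts, something like this is what I have in mind. This is only a sketch: the device name and mountpoint are placeholders, newer kernels (5.7+) ship a native exfat driver so the FUSE package may not even be needed, and I understand installing packages on the SCALE appliance itself is discouraged, so this might be better done from a helper system:

```sh
# Mount an existing exFAT plot disk read-only for migration.
# exfat-utils has been renamed exfatprogs on current Debian.
apt install exfat-fuse exfatprogs
mkdir -p /mnt/exfat-src
mount -t exfat -o ro /dev/sdX1 /mnt/exfat-src   # /dev/sdX1 is a placeholder
```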
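
And for the pools, roughly this. Again a sketch: pool and disk names are placeholders, on TrueNAS one would normally do this through the UI, and the property choices just follow the no-redundancy, metadata-cache-only reasoning above:

```sh
# 3-disk striped pool: no redundancy, since plots are cheap to re-create.
zpool create -o ashift=12 plotpool1 sdb sdc sdd
# Cache metadata only in ARC -- plot lookups are random one-shot reads,
# so caching file data would just churn the cache.
zfs set primarycache=metadata plotpool1
zfs set compression=off plotpool1   # plots are incompressible
zfs set atime=off plotpool1         # skip access-time writes on lookups
zfs set recordsize=1M plotpool1     # large records suit write-once ~101 GiB files
```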
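
The copy itself would then just be a resumable rsync per disk (paths are placeholders matching the two sketches above):

```sh
# Copy plots from the mounted exFAT disk into the new pool.
# -a preserves attributes, --progress shows throughput; re-running resumes.
rsync -av --progress /mnt/exfat-src/ /mnt/plotpool1/plots/
```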

Plans to scale:
- The Bladebit plotter is going to be a bare-metal system with only one purpose: create plots and write them to ZFS pools of 3 HDDs each
- So for best compatibility and performance, I would install TrueNAS on that computer as well. Debian is known to be a good performer for Bladebit, and it supports ZFS
- The HDDs (in multiples of 3), once filled up, will be moved to the main farm and mounted for use by the Chia software
- If I run out of RAM (128 GiB) for the metadata cache and scrubbing, I can add an SSD as L2ARC (sketch after this list). Another option would be to rebuild the farm on server-grade hardware that supports a lot more RAM. I talked to someone who has a 9+ PiB (!!!) farm on a single TrueNAS node; they said the cache occupies about 280 GiB of RAM, plus scrubbing takes about 80 GiB. I'm not sure whether they used it for metadata read cache only
- Another possible bottleneck is CPU for scrubbing. The 9+ PiB guy said his 2x Xeon 6230R (26c each) were 50-80% busy while scrubbing
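
If RAM does become the limit, my understanding is the L2ARC route would look roughly like this (a sketch; the device name is a placeholder, and the pool name matches the earlier sketches):

```sh
# Add an SSD as L2ARC and restrict it to metadata, mirroring the ARC policy.
zpool add plotpool1 cache nvme1n1
zfs set secondarycache=metadata plotpool1
```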



Please poke holes in my assumptions. Any advice, any leads would be much appreciated.
I am new to ZFS and will go ahead and read/research the basics myself, but if any insights come to your mind, please share.
 
