Mirroring Specific Files onto Scratch Drive

Joined
Dec 23, 2022
Messages
2
I have not used ZFS before and am looking for some advice.

Our Ubuntu 22.04 server (2x Xeon Platinum 8352Y, 1TB RAM) has 2x 480GB SSDs in RAID, 4x 7.68TB NVMe drives (Scratch), and 12x 20TB HDDs (Storage).

For the scratch, I am planning to use ZFS without redundancy. If a drive fails, it's no big deal.

For the storage, I am planning to make 2 vdevs with 6 drives each in RAIDZ2 (allowing for 2 HDD failures per vdev). Both vdevs will be part of a single pool. These files will be backed up to another server simply using rsync (due to compliance/budget issues, we can't change our backup solution to something like rsync.net, which supports ZFS natively, but please tell me if I should be considering something other than rsync).
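For reference, here is my understanding of what that layout would look like at creation time (the device names below are placeholders; I gather real setups should use /dev/disk/by-id paths):

# Two RAIDZ2 vdevs of six drives each, in a single pool named "storage".
zpool create storage \
    raidz2 sda sdb sdc sdd sde sdf \
    raidz2 sdg sdh sdi sdj sdk sdl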


Some of our files are quite large and read frequently. Thus, I would like to simply mirror specific folders, within the pool, onto the Scratch. Additionally, I would like this process to be (1) automatic, (2) selective (i.e., I want to mirror specific folders only... not the entire pool), and (3) read-only on the mirror side (i.e., users may update certain files on the storage directory, those files will be mirrored onto the Scratch, and those files will be read/execute only for all users on Scratch).

Is there a ZFS solution that I should be considering?

Thank you!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Our Ubuntu 22.04 server

Welcome to the TrueNAS Community Forums.

These forums are for the discussion of TrueNAS, an appliance operating system based around ZFS. It is not a general ZFS support forum, and discussion of your Ubuntu server is off-topic in the General forum. You are welcome to discuss this sort of thing in the Off-Topic forum, so your post has been moved there.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I have not used ZFS before and am looking for some advice.

...

Some of our files are quite large and read frequently. Thus, I would like to simply mirror specific folders, within the pool, onto the Scratch. Additionally, I would like this process to be (1) automatic, (2) selective (i.e., I want to mirror specific folders only... not the entire pool), and (3) read-only on the mirror side (i.e., users may update certain files on the storage directory, those files will be mirrored onto the Scratch, and those files will be read/execute only for all users on Scratch).

Is there a ZFS solution that I should be considering?

Thank you!
Having 1TB of main memory is not quite enough to attach all 4 x 7.68TB NVMe drives to the Storage pool as L2ARC. But L2ARC does more or less do what you want.

Others might chime in and say 1TB of main memory and 4 x 7.68TB of NVMe as L2ARC might be fine. There ARE options in ZFS' datasets for both primary cache and secondary cache.


It is possible to make a specialized read cache. But that is more manual effort.
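Roughly, the pool- and dataset-level knobs look like this (pool and dataset names are just examples):

# Attach the NVMe drives to the Storage pool as L2ARC:
zpool add storage cache nvme0n1 nvme1n1 nvme2n1 nvme3n1

# Per-dataset cache tuning: keep only metadata in ARC for rarely read raw data,
# while a hot dataset may use both ARC and L2ARC fully:
zfs set primarycache=metadata storage/rawdata
zfs set secondarycache=all storage/products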
 
Joined
Oct 22, 2019
Messages
3,641
Some of our files are quite large and read frequently.
I thought this was one of the main issues the ARC addresses?

With 1TB of RAM, shouldn't that yield ample opportunity for the most requested (and re-requested) records to be held in the ARC? (Unless you're talking about some really big files, and quite a lot at that, which are requested way beyond what 1TB of RAM can gracefully handle?)

Others might chime in and say 1TB of main memory and 4 x 7.68TB NVME as L2ARC might be fine. Their ARE options in ZFS' datasets for both primary cache and secondary cache.
I think this would be a worthwhile approach.

However, can't it be tested with just a single 7.68TB NVMe? (You can always expand it later to stripe with additional NVMes.)
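Something along these lines (pool name and device names assumed):

# Start with a single cache device:
zpool add storage cache nvme0n1

# L2ARC devices stripe, so you can grow it later by adding more:
zpool add storage cache nvme1n1

# A cache device can also be removed at any time without harming the pool:
zpool remove storage nvme0n1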
 
Joined
Dec 23, 2022
Messages
2
With 1TB of RAM, shouldn't that yield ample opportunity for the most requested (and re-requested) records to be held in the ARC? (Unless you're talking about some really big files, and quite a lot at that, which are requested way beyond what 1TB of RAM can gracefully handle?)

The main purpose of this server is for computations unrelated to storage. Additionally, much of the storage on the server consists of 1-3GB raw files that will be accessed infrequently (like... every 3-6 months). However, the products of those raw files will be on the storage drive, rsync'd to an external backup solution, and I would like to mirror the products onto the Scratch directory. Overall, the files range from small (10 MB) to large (1-3GB) and likely won't exceed 3-5 TB on the scratch directory (and only a portion of those files will be read daily/hourly).

Most of the scratch directory is for users to run jobs, look at the results, delete, and rinse-and-repeat.

From my initial assessment of ZFS, I read that ZFS can be a memory hog (ARC is set to 50% of RAM by default), but it's fine to limit it, and it's unlikely there will be significant performance issues with storage that is essentially cold storage. Thus, I was planning to limit ZFS to 16GB of RAM. Am I wrong to do this?

I was hoping that I could define certain folders as their own ZFS filesystems and simply mirror those filesystems to another pool, and, in the process, change the permissions so the files on the scratch directory are read/execute only (no write).
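In other words, something like this (dataset names are placeholders, and I assume a cron job or a tool like syncoid would handle the "automatic" part):

# Initial copy of the "products" dataset to the scratch pool, then lock it down:
zfs snapshot storage/products@sync1
zfs send storage/products@sync1 | zfs recv scratch/products
zfs set readonly=on scratch/products

# Periodic update: send only the blocks that changed since the last snapshot.
# (readonly=on blocks users from writing, but does not block zfs recv.)
zfs snapshot storage/products@sync2
zfs send -i @sync1 storage/products@sync2 | zfs recv -F scratch/products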
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
With 1TB of RAM, shouldn't that yield ample opportunity for the most requested

This person is discussing a non-TrueNAS application, on an Ubuntu server. Please adjust your assumptions accordingly. :smile:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
From my initial assessment of ZFS, I read that ZFS can be a memory hog (ARC is set to 50% of RAM by default), but it's fine to limit it, and it's unlikely there will be significant performance issues with storage that is essentially cold storage. Thus, I was planning to limit ZFS to 16GB of RAM. Am I wrong to do this?

Linux has a stupid memory management design. It might be better if you could switch to FreeBSD, where memory demands are adjusted on the fly and ZFS is allowed to use whatever is not being used by the system for other purposes.

Limiting ZFS to 16GB of ARC on a system with 12 * 20 = 240TB of pool space is incredibly bad and will kill performance. Unlike your classic FAT, EXT3, UFS, or NTFS filesystems, where you are used to having a small amount of metadata on a filesystem that is probably only 1 or 2TB, ZFS has to maintain metadata awareness for all 240TB.

Now, as a thought exercise, let me run you through this. Your ZFS wants to write data to the pool, so it looks at the metaslabs to find out what's a good choice for this write. However, "someone" has limited it to 4GB of metadata (25% of the ARC size), which means that ZFS will now thrash around doing I/O to the pool to try to analyze the situation to find free disk space. How efficient do you think that will be?

There is a reason for the classic "have 1GB per TB of disk space" advice; many people choose to ignore it, and to a certain extent you can get away with bending it, even pretty significantly, but ZFS really does require ARC space. Not just 16GB for a 240TB pool. I'm pretty sure that you can get away with less than 240GB, and I wouldn't be shocked if 96GB ARC worked reasonably well, but you will find performance dipping as that number drops.
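For what it's worth, on Linux the cap lives in the zfs_arc_max module parameter, in bytes (the figure below is only an example; per the above, size it generously):

# Runtime (as root): cap the ARC at 256GiB.
echo 274877906944 > /sys/module/zfs/parameters/zfs_arc_max

# Persistent across reboots, in /etc/modprobe.d/zfs.conf:
# options zfs zfs_arc_max=274877906944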
 