ZFS layout for mixed use

LoftyGoals

Dabbler
Joined
Oct 18, 2023
Messages
15
I intend to use a single TrueNAS server for a variety of uses. I'm starting to think I'd be better served by splitting the storage according to use because the characteristics are so different. Does anyone have any advice or can anyone propose a layout?

My original plan was 6 x 8 TB WD Red Plus as a single RAIDZ2. The data is:
  • 5 TB cold storage of medium files (images, small videos, tarballs of small and medium projects)
  • 3 TB cold storage of large files (80-200 GB disk images, tarballs, and video editing projects)
  • 2 TB live Time Machine backups from two OSX laptops
  • 1 TB live Borg backups of running OSX laptops and Linux desktops
  • 1 TB (mostly read-only) database and several million small (~1k) blob files with high latency tolerance
  • 4 small Git repositories with high latency tolerance
  • boot drive
  • SLOG drive
There are no significant latency requirements. All access will be from a local network, generally over WiFi except for one-off large transfers and initial backups.
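
For a rough sense of scale, here is a minimal Python sketch of the capacity arithmetic for the plan above. It uses only the sizes listed in this post and treats RAIDZ2 usable space as (disks - parity) x disk size. That is an idealized assumption: it ignores ZFS metadata, padding, and free-space headroom, so the real figure will be lower. The function and variable names are hypothetical, and the Git repositories are left out because no size is given for them.

```python
TB = 10**12  # drive vendors quote decimal terabytes


def raidz_raw_usable(disks: int, disk_bytes: int, parity: int) -> int:
    """Idealized usable bytes of one RAIDZ vdev (assumption: no overhead)."""
    if disks <= parity:
        raise ValueError("a RAIDZ vdev needs more disks than parity devices")
    return (disks - parity) * disk_bytes


# Sizes in TB, taken from the list in the post.
workloads_tb = {
    "cold storage, medium files": 5,
    "cold storage, large files": 3,
    "Time Machine backups": 2,
    "Borg backups": 1,
    "database and small blobs": 1,
}

planned_tb = sum(workloads_tb.values())
usable_tb = raidz_raw_usable(disks=6, disk_bytes=8 * TB, parity=2) // TB
fill_fraction = planned_tb / usable_tb

print(f"planned data: {planned_tb} TB")
print(f"idealized RAIDZ2 space: {usable_tb} TB")
print(f"fill: {fill_fraction:.1%}")
```

Under these assumptions the listed data is about 12 TB against roughly 32 TB of idealized usable space.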

It seems like there might be value in breaking it up by file size or access frequency (RAIDZ2 vs. mirror?), but I don't know ZFS well enough. What say you, oh wise storage wizards?

-- Salvatore
 

probain

Patron
Joined
Feb 25, 2023
Messages
211
If you have the drive, then maybe use it for L2ARC instead of SLOG?
SLOG mainly benefits sync writes, and your list doesn't really give the impression that you'll have many of those.

L2ARC is also supposed to be a lot more intelligent in the new OpenZFS 2.2.
 

LoftyGoals

That's reasonable. I was mostly spec'ing SLOG instead of L2ARC because I've got a lot more space for RAM if I need it. I am really curious about the data drives, though. They are the ones that seem like they would benefit from a split configuration.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
With the information available (it would be helpful to know the rest of the system) I would probably go for purely RAIDZ2. Perhaps an SSD mirror in addition, if you discover that you need IOPS for some things all of a sudden.
 

LoftyGoals

Thank you. Good to know. I'm a little surprised that the RAIDZ2 can efficiently handle large numbers of small files (e.g. Time Machine, Git), but I don't know enough to be that surprised.

I don't know the rest of the system yet, but I'll update this thread once I do. I'm still trying to figure out if the board I spec'd is even viable given the currently available chips and RAM.
 

ChrisRJ

To be clear: I am not saying that RAIDZ2 is definitely the best possible solution for you. But you do not provide a lot of details on your requirements. Esp. criteria like "high latency tolerance" are pretty useless, because they mean totally different things to different people. And what exactly do you mean by "handle large numbers of small files efficiently"? What is a large number for you and what do you mean by efficiently?

I know I may come across as pedantic and too detail-oriented. But I have seen many project problems because such questions had not been asked until it was too late.

If you honestly cannot answer such questions, that is ok. But it will potentially impact the guidance you get in a negative way. So it's worth thinking long and hard about them. I know it's difficult, but give yourself a lot of time to sleep on things and reconsider them again and again, if possible. It will likely improve the outcome.

Good luck!
 

LoftyGoals

Meh, I'm a software engineer. I should know better than to say "high" and "low" but I also have no data. If I'm talking like a product manager (since, in this situation, I don't know enough to talk like an engineer):
  • "High latency tolerance": This is all basically user-facing, command-line-driven traffic. The expectation is that I will hit enter and then wait a few seconds before anything happens.
  • "Large numbers of small files": Roughly 10 million files that average 2 KB. The primary activity is retrieving a single file every few days. Again, if it takes a second to pull the file, that's fine. The efficiency concern is that it seems like RAIDZ2 doesn't function as well with files that are smaller than a block size, and mixing files that differ by 6 orders of magnitude would make block size choice hard.
  • "Git repository": A half dozen personal-use repositories with at most a thousand files each. These files are all also in the 2 KB to 10 KB range.
  • "Time Machine": This is a backup of a single OSX installation. Anecdotally many of these files are < 10 KB and change daily, so I expect this to be a significant load in terms of small file churn.
  • "Borg": 1 TB of data stored in 500 MB chunks, written once with multi-second latency allowed.
  • "Medium files": 10 - 100 MB
  • "Large files": About 100 in the 1 - 5 GB range, two or three (disk images) as large as 200 GB.
Thank you for keeping me honest!
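
To put a number on the small-file concern, here is a minimal Python sketch of how much disk space one small block takes on RAIDZ versus a mirror. It reflects my understanding of how OpenZFS sizes RAIDZ allocations (data sectors, plus parity sectors per stripe row, rounded up to a multiple of parity + 1) and should be treated as an estimate, not an authoritative model. It assumes 4 KiB sectors (ashift=12), assumes each small file occupies its own block, and ignores compression and metadata. The function names are hypothetical.

```python
def raidz_alloc_bytes(block_bytes: int, disks: int, parity: int, ashift: int = 12) -> int:
    """Estimated on-disk allocation for one block on a RAIDZ vdev."""
    sector = 1 << ashift
    data_cols = disks - parity
    data_sectors = (block_bytes - 1) // sector + 1  # round up to whole sectors
    stripe_rows = (data_sectors + data_cols - 1) // data_cols
    total = data_sectors + parity * stripe_rows
    # Allocations are padded to a multiple of (parity + 1) sectors.
    unit = parity + 1
    total = (total + unit - 1) // unit * unit
    return total * sector


def mirror_alloc_bytes(block_bytes: int, copies: int = 2, ashift: int = 12) -> int:
    """Estimated on-disk allocation for one block on an n-way mirror."""
    sector = 1 << ashift
    return copies * ((block_bytes - 1) // sector + 1) * sector


FILES = 10_000_000      # "roughly 10 million files"
FILE_BYTES = 2 * 1024   # "average 2 KB"

raidz2_per_file = raidz_alloc_bytes(FILE_BYTES, disks=6, parity=2)
mirror_per_file = mirror_alloc_bytes(FILE_BYTES)

print(f"RAIDZ2 (6 wide): {raidz2_per_file} B per file, "
      f"{FILES * raidz2_per_file / 1e9:.1f} GB total")
print(f"2-way mirror:    {mirror_per_file} B per file, "
      f"{FILES * mirror_per_file / 1e9:.1f} GB total")
print(f"128 KiB record on RAIDZ2: {raidz_alloc_bytes(128 * 1024, 6, 2)} B")
```

With these assumptions a 2 KB file costs 12 KiB on the 6-wide RAIDZ2 and 8 KiB on a two-way mirror, so 10 million such files come to roughly 123 GB versus 82 GB, both small next to the pool size. A full 128 KiB record allocates 192 KiB, which matches the ideal ratio of 4 data disks to 2 parity disks.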
 