ZFS and backup best practices

dalnew · Dabbler · Joined Dec 9, 2020 · Messages: 26
Hey all, I'm new to TrueNAS and I've just started using TrueNAS SCALE, which I'm loving so far. I just built a new custom NAS with TrueNAS SCALE that I'm planning to use as my primary NAS going forward (once it's "officially" released and stable). For now it's a test bed for me to understand the intricacies of ZFS and the TrueNAS system. I've been using Synology devices with BTRFS for a long time and will continue to use them as my NAS backup strategy. While I wait for the official TrueNAS release I'd like to start setting up and configuring everything, which includes:

- Docker containers for Plex, video editing and encoding, and software development
- VMs for Windows and other Linux distros
- A multitude of SMB shares for the family (media, backups for PCs, etc.)
- Automatic backups from the TrueNAS box to the Synology machines

I feel like I have a pretty good handle on the first three. It's the last one where I'm fuzzy on the best way to handle it.

To give you some understanding of the capability of the NAS I threw together:
- AMD Ryzen 9 3950X w/128GB DDR4 RAM
- 1x 1TB Samsung 980 Pro as the boot device (which seems like a bit of a waste at this point), rated at 7000 MB/s read and 5000 MB/s write
- 1x 1TB Sabrent Rocket 4.0 (currently unused), rated at 3300 MB/s read and 3300 MB/s write
- 5 ZFS datasets on a pool with a single RAIDZ1 vdev of 4 Seagate Exos X16 16TB drives (each rated for ~261 MB/s max sustained transfer), covering media, backups, software development storage, VMs, and scratch/tmp
- 2x 10GbE SFP+ NICs

I've tried to optimize each dataset according to its use (disabling atime for performance, setting recordsize to 1M for media, etc.). SMB shares expose these to various family members.
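For reference, that per-dataset tuning is just a handful of zfs set calls; the pool and dataset names below are placeholders for my actual layout:

    # 1M records suit large sequential media files ("tank" is a placeholder pool name)
    zfs set recordsize=1M tank/media
    # Skip access-time updates for a small read-side win
    zfs set atime=off tank/media tank/backups tank/vms
    # Verify what a dataset is actually using
    zfs get recordsize,atime,compression tank/media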

So far in my tests I've managed to get about 550 MB/s writing to the array and maybe 350 MB/s reading. From my rudimentary ZFS understanding that seems to be roughly what other folks are getting, so I guess it's configured optimally? I mostly care about read speed, since we'll be doing more streaming of large files than writing, but as far as I understand the only way to improve read speed would be to add more drives to the pool, or better yet add another vdev. Unfortunately I've about hit my maximum spend, at least for a little while :)
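For anyone who wants to compare numbers, a sequential-read test with fio along these lines is roughly how I'd sanity-check it (the path and sizes are arbitrary, and the test file needs to be much larger than ARC or you'll mostly be measuring RAM):

    # Sequential 1M reads against the media dataset
    fio --name=seqread --directory=/mnt/tank/media \
        --rw=read --bs=1M --size=200G --numjobs=1 --ioengine=psync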

As for backups, I understand ZFS has a nice snapshotting system, which I've already configured differently for each dataset; however, my Synology machines will be running BTRFS. My understanding is that simply doing a zfs send of a snapshot won't be very useful if I want to be able to log into the Synology machines and actually "do stuff" with the files, since the stream will be in a ZFS-specific format... is that right? If that's the case, what's the best way to perform backups? Something like:

  1. Create the baseline zfs snapshot
  2. Do stuff
  3. Create another snapshot
  4. zfs diff the snapshots to determine what changed, parse out what files need to be backed up
  5. rsync those changes to the Synology backup
Is that the best way to do it, or am I way overcomplicating this?
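For what it's worth, here's a rough sketch of that flow as a script; the dataset name, Synology host, and paths are all placeholders, and in practice the diff/parse step can be skipped by pointing rsync at the snapshot's stable directory view:

    #!/bin/sh
    # 1-2. Baseline snapshot taken earlier, then "stuff" happens:
    #   zfs snapshot tank/documents@base
    # 3. Take the new snapshot
    zfs snapshot tank/documents@latest

    # 4. List what changed between the two snapshots
    #    (+/-/M/R prefixes; -F adds the file type, -H makes it parseable)
    zfs diff -FH tank/documents@base tank/documents@latest

    # 5. It's easier to skip the parsing and let rsync compute the deltas
    #    itself from the snapshot's read-only directory view:
    rsync -a --delete /mnt/tank/documents/.zfs/snapshot/latest/ \
        backupuser@synology:/volume1/backup/documents/

    # Roll the baseline forward for the next run
    zfs destroy tank/documents@base
    zfs rename tank/documents@latest tank/documents@base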

If anyone has any suggestions on things to check to improve configuration or performance even more I'm all ears! Thanks!
 
Joined May 2, 2017 · Messages: 211
I don't know Synology, but snapshots are a great backup tool. I have a pretty simple approach, because my experience is that the more complex your backup is, the less you'll actually run it or check in on it. Here's what I do.

For critical stuff, like a dataset of irreplaceable documents, I take an hourly snapshot and set it to expire after one day. At any point you can recover to any point in time within the last day. So if you edit your resume and then decide you don't like it, you can restore to any state from before you started up to a day ago. Then I take a daily snapshot and keep it for a week, a weekly snapshot I keep for a month, and a monthly snapshot I keep for a year. The chance you'll need to recover to a specific time from two months ago isn't realistic; if you're going back a couple of months, you just recover from the monthly snapshot from then. This scheme gives you recovery for a full year if something horrible happens.
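In TrueNAS this is all configured as Periodic Snapshot Tasks (each tier gets its own schedule and lifetime, and expired snapshots are pruned automatically), but the equivalent by hand is just scheduled zfs snapshot calls; a rough sketch with a placeholder dataset:

    # crontab entries; "tank/documents" is a placeholder dataset
    0 * * * *  zfs snapshot tank/documents@hourly-$(date +\%Y\%m\%d-\%H\%M)   # keep 1 day
    0 0 * * *  zfs snapshot tank/documents@daily-$(date +\%Y\%m\%d)           # keep 1 week
    0 0 * * 0  zfs snapshot tank/documents@weekly-$(date +\%Y\%m\%d)          # keep 1 month
    0 0 1 * *  zfs snapshot tank/documents@monthly-$(date +\%Y\%m)            # keep 1 year
    # The "keep for X" part (pruning) is what the TrueNAS tasks handle for you
    # via the snapshot lifetime setting; by hand it would need its own script.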

I also use Backblaze, which integrates directly into "Cloud Sync Tasks" and automatically syncs an offsite copy to the cloud. It's encrypted before it leaves your machine, and the key isn't stored by them. Private, and fairly cheap. I have that service keep all versions for a year, and once a year I create a "snapshot" on their web site and download a zip file of everything to a portable drive you can keep in the house somewhere.
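For what it's worth, the Cloud Sync Tasks are built on rclone under the hood, so conceptually it's doing something like this (the "b2crypt" remote name and bucket path are placeholders for an encrypted B2 remote you'd configure):

    # Push local files to an encrypted Backblaze B2 remote
    rclone sync /mnt/tank/documents b2crypt:my-bucket/documents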

With this you can recover instantly for a year and keep yearly archives forever, with the added advantage of an offsite backup if your house collapses from the weight of world politics. LOL

There's also replication, if you look into that, which lets you send a snapshot to another machine.
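If the other machine also runs ZFS, replication boils down to zfs send piped into zfs receive; a minimal sketch with placeholder names:

    # Initial full replication of a snapshot to another ZFS box
    zfs send tank/documents@snap1 | ssh backup-host zfs receive backuppool/documents

    # Later runs only send the difference between two snapshots
    zfs send -i tank/documents@snap1 tank/documents@snap2 | \
        ssh backup-host zfs receive backuppool/documents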

Happy exploring!
 

dalnew · Dabbler · Joined Dec 9, 2020 · Messages: 26
Thanks for the info. I'm also snapshotting the heck out of my datasets, some more often than others. For undoing mistakes this seems like an extremely powerful and fast wrench in the ZFS "toolbox". In your case, are you syncing the raw ZFS snapshot streams or backing up the actual files to Backblaze? Is Backblaze running ZFS or some other OS?

Replicating and saving ZFS snapshots to a remote machine sounds great, except that in my case I won't be syncing to another ZFS server, so all I could do is store the replicated snapshots as blobs, I believe. That isn't ideal, since if something were to happen to my ZFS NAS I'd want to boot up my backup server and have immediate access to the files, without having to restore the stream onto another ZFS system first just to read them. Or maybe I'm missing something and there is a way to read them even on a different OS?
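For concreteness, by "store it as a blob" I mean something like this (host and paths are placeholders); the files inside aren't browsable until the stream is received back into a ZFS pool:

    # Save the raw send stream as an opaque file on the Synology
    zfs send tank/documents@base | ssh syno 'cat > /volume1/backup/documents@base.zfs'

    # Getting files back out means receiving it into a ZFS pool again
    ssh syno 'cat /volume1/backup/documents@base.zfs' | zfs receive tank/restored-documents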
 
Joined May 2, 2017 · Messages: 211
The cloud sync task copies the actual files to Backblaze. If you encrypt, though, you'll have to decrypt when you restore them, but that can also be done with a cloud sync in the opposite direction. So you "push" to Backblaze when you back up, and you can then create a new dataset and "pull" your backup down to it. Enter the encryption credentials in the pull task and what you pull lands in the new dataset unencrypted again.
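In rclone terms (which is what the Cloud Sync Tasks wrap), the pull is just the same sync with source and destination swapped, pointed at the new dataset; remote name and paths are placeholders:

    # Pull the backup back down into a fresh dataset
    rclone sync b2crypt:my-bucket/documents /mnt/tank/restored-documents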

I only consider that an emergency recovery path, though, and thankfully haven't needed it.
 