Want to zfs send a 7TB snapshot to S3, but S3 has a 5TB object size limit. Proper way to split?

Status: Not open for further replies.

Joined: Sep 5, 2017 · Messages: 8
Is there a proper way to split a zfs send data stream at the destination so that it isn't one gigantic file but several smaller files? I recently attempted a zfs send to S3 and discovered that there is a 5TB size limit for objects stored on S3.

Someone suggested I zfs send to a local disk first, split the file, and then upload the pieces to S3, but that seemed like a waste of disk space that could be used to store active data.
 

fracai
Guru · Joined: Aug 22, 2012 · Messages: 1,212
How are you sending the data to S3? If you don't want to write the split chunks to disk locally, you'd need some process that can talk to S3, read the snapshot stream, start uploading, stop when it reaches some size threshold, and start a new S3 object. That's certainly possible, but consider how usable it's going to be.
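If you do go that route, GNU split can do the chunking in-line: its --filter option hands each chunk to a command with the chunk name in $FILE, so nothing has to touch local disk. Very roughly, something like this (the bucket, prefix, snapshot name, and chunk size are all made up for the example):

Code:
# Hypothetical bucket and prefix; 500G chunks stay well under the 5TB object limit.
zfs send tank/home@seasonal | \
  split -b 500G --filter='aws s3 cp - "s3://my-backup-bucket/home-seasonal/$FILE" --expected-size 540000000000' - zfs-chunk-
# --expected-size (bytes, approximate) helps the AWS CLI size the multipart upload
# for a stream this large.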

In order to restore anything from that backup you'll need to bring down every piece, concatenate them together, and zfs receive that data into a new local dataset. If you ever want to update that snapshot you'll need to make another complete copy: you won't be able to make an incremental backup that references those split chunks, and you couldn't reference a single whole snapshot file either, even if S3's size limit were higher.
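A restore would look something like the following: pull every chunk back in order and pipe the concatenation into zfs receive (same hypothetical bucket and chunk prefix as the sketch above, target dataset name invented):

Code:
# List the chunk keys, sort them so split's aa, ab, ac... ordering is preserved,
# stream each object to stdout, and feed the combined stream to zfs receive.
aws s3 ls s3://my-backup-bucket/home-seasonal/ | awk '{print $4}' | sort | \
  while read -r key; do
    aws s3 cp "s3://my-backup-bucket/home-seasonal/$key" -
  done | zfs receive tank/home-restored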

I think you'd probably be happier with file-level backups using something like "duplicacy" or one of the other incremental backup tools.

But perhaps you have a different purpose in mind. If you state what that goal is, there's likely a better solution; I can't imagine putting a 7TB snapshot file in the cloud is it.
 
Joined: Sep 5, 2017 · Messages: 8
I've been piping zfs send <snapshot> into an aws s3 cp - s3://<s3bucket> command. I understand that restoring would mean bringing everything back down and concatenating the pieces. I'm using this mostly as a seasonal full backup.

My goal has been to back up my users' home directories somehow. I avoided file-level backups because my dozen or so users generate a ton of tiny files (in the tens of millions), and I'm sure S3 would kill me on its per-1,000-requests PUT costs. Perhaps tar archives of the home directories would be a better option, along the lines of the sketch below.
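Something like one tarball per home directory, streamed straight to S3, is what I have in mind (the paths, bucket, and prefix are placeholders):

Code:
# One compressed tarball per user keeps the object count (and PUT requests) tiny
# compared to tens of millions of individual files.
for dir in /mnt/tank/home/*/; do
  user=$(basename "$dir")
  tar -czf - -C /mnt/tank/home "$user" | \
    aws s3 cp - "s3://my-backup-bucket/home-tarballs/$user.tar.gz"
    # add --expected-size here if a single home directory is very large
done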
 