Cloud sync task with AWS S3 Deep Archive

Joined
Dec 26, 2021
Messages
20
I am looking for a cheap way to get all of my data into the cloud to protect it against a disaster, and I am currently eyeing AWS S3 Deep Archive. The documentation on setting up the cloud sync task in TrueNAS is clear for the most part. I plan to set up the task to sync my data to S3 - if I delete a file on my server, I want the file gone from the cloud too. However, the TrueNAS documentation says that TrueNAS is unable to delete files from AWS S3 Deep Archive, which makes sense given that files in that storage tier must remain there for at least 180 days to avoid extra fees.

With the above, I have two questions.
  1. If I delete files on my server, how can I have the cloud sync task delete them from the cloud too? Does anyone know of an S3 bucket lifecycle configuration that can work in conjunction with the cloud sync task (see the sketch after this list)? I know that a lifecycle configuration can expire noncurrent versions after x number of days. But does the cloud sync task keep the files that still exist on my server marked as current?
  2. When it comes to a complete restoration, I know that I can set the cloud sync task to PULL instead of PUSH. Does this also work with S3 Deep Archive? I read that one must move files from the Deep Archive tier to a more accessible tier in order to download them. Does the cloud sync task do this automatically?
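For question 1, the kind of lifecycle rule I have in mind looks roughly like the sketch below (untested boto3; the bucket name is just a placeholder, and it assumes versioning is enabled on the bucket so that deleted or overwritten files become noncurrent versions):

```python
# Untested sketch: expire noncurrent (deleted/overwritten) versions only once
# they are safely past the 180-day Deep Archive minimum.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-truenas-backup",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-noncurrent-after-deep-archive-minimum",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": 185},
            }
        ]
    },
)
```

Whether the cloud sync task plays nicely with a versioned bucket like this is exactly what I'm unsure about.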
And, if anyone knows of any other cheap methods for keeping an offsite backup of all my data, please let me know. I run a RAIDZ2 pool with 8x 20 TB drives. I'm currently only using 38 TB, but I can potentially store up to 103 TB on this pool. S3 Deep Archive only charges about $1 per TB per month, so it seems like the obvious candidate, given the level of redundancy the pool has. Restoring all this data in the event of a failure or disaster is what scares me, with the estimated data egress cost hovering around $3,000. Both Google Cloud and Microsoft Azure have similar costs for both storage and data egress.

Any advice on this is much appreciated.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
It's a good question... no easy solution that I know of.

Deep Archive/Glacier is great if you have some static data that you can upload, put into Glacier status, and never read. However, syncing is more difficult.

One of the reasons we created the iX-Storj service was its economics: $4/TB per month for storage and $7/TB for download. Restoration costs would drop from roughly $3,000 to about $300. It's also easier to sync, much cheaper than standard S3, and much easier to use than Glacier.

A middle ground is to find datasets that are static and move them to Glacier....
 
Joined
Dec 26, 2021
Messages
20
A middle ground is to find datasets that are static and move them to Glacier....
That makes sense. Probably over 80% of the data on my pool consists of movies, videos, music, and pictures for Plex. Those are the most static of all the files I have, and I could probably get away with storing those in S3 Deep Archive. As for other things that change more often, it would make sense to save those somewhere else. But then again, if I learn that I can use a combination of lifecycle rules and a cloud sync task configuration to only upload new versions of files and delete noncurrent versions once they're past the early deletion window, I think I would just prefer to stick entirely with S3 Deep Archive. The $1/TB per month price point is very attractive. The restoration process is still scary.
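Just to spell out why it scares me: from what I've read, every object would first have to be restored out of Deep Archive before a PULL could download anything, roughly along these lines (untested boto3; the bucket name is a placeholder, and my understanding is that the cheap "Bulk" tier can take up to about 48 hours):

```python
# Untested sketch: ask S3 to stage each Deep Archive object back into a
# readable tier before attempting any download.
import boto3

s3 = boto3.client("s3")

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-truenas-backup"):  # placeholder bucket
    for obj in page.get("Contents", []):
        s3.restore_object(
            Bucket="my-truenas-backup",
            Key=obj["Key"],
            RestoreRequest={
                "Days": 14,  # how long the restored copy stays readable
                "GlacierJobParameters": {"Tier": "Bulk"},  # cheapest, slowest option
            },
        )
```

Whether the cloud sync task would do that staging step for me is question 2 from my first post.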

$4/TB for iX-Storj is quite good. Storing all 38 TB would bring me to about $152 per month, which isn't all that bad. It just sucks since I am a college student trying to save up to buy a house. This prospect would become even grimmer if I happen to fill up the rest of the space on my server, which would bring me to around $400/month.

I read that the cloud sync tasks are just a wrapper around rclone. I will also hop over to their forums to get advice from the folks there.
 
Joined
Dec 26, 2021
Messages
20
One of the reasons we created the iX-Storj service was its economics. $4 per TB for storage and $7/TB for download. Restoration costs would drop from $3000 to $300. It's also easier to sync. Much cheaper than standard S3. Much easier to use than Glacier.
I got a file count of everything on my 38 TB pool, and that number came out to 1,316,256. If I pushed everything to AWS S3 Deep Archive, I would imagine that a bimonthly cloud sync task would require at least twice that number of HTTP requests to the S3 API. This would push my bill to almost $170/month. Does iX-Storj charge per HTTP request made to the APIs? If it doesn't, I think my best bet would be to push all my large media to S3 Deep Archive and everything else to iX-Storj, which would be a good middle ground between storage cost and the number of requests needed to maintain the backups.
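For what it's worth, my back-of-envelope math (the request pricing is my reading of the S3 pricing page, so please correct me if it's off):

```python
# Rough estimate only. Assumes Deep Archive upload/list requests cost about
# $0.05 per 1,000 and storage about $1 per TB-month.
files = 1_316_256
requests = 2 * files                    # at least two requests per file per month
request_cost = requests * 0.05 / 1000   # ~ $132
storage_cost = 38 * 1.0                 # 38 TB at ~$1/TB-month
print(f"~${request_cost:,.0f} requests + ~${storage_cost:,.0f} storage "
      f"= ~${request_cost + storage_cost:,.0f}/month")
```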
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
For iX-Storj, there is some reading of metadata which has a price, but there is no direct price per API request.
You can run a trial quite cheaply and see the results.

If you can partition your large media this way, it might work. It would be useful to know whether S3 Deep Archive works this way... worth another test. It's often too awkward to use easily.
 

nihil2041

Cadet
Joined
Jan 11, 2024
Messages
2
I use TrueNAS for all of my on-prem storage needs. I use the AWS CLI for S3 remote backups. I've got close to ~50 TB in AWS Glacier, where I pay less than $40 a month. I initially used an AWS Snowball device to get my data into the AWS cloud, and I have a sync job that runs twice a week to upload the changes. For disaster recovery, I would use a Snowball device to get access to my data again.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I tested iX-Storj with a fairly difficult set of data. It worked, but the number of objects was silly (and costly).
Source material was approx (it changes):
  • Size: 56GB
  • Files: 180,000 ish
  • Folders: 87,000 ish
[aka lots and lots of small files in lots of folders]

The issue seems to be that rclone (used under the hood) just uploads each file as its own object, as far as I can tell, or something similar; that made the backup impractical on Storj.

I did successfully test using Duplicati, managing to reduce the number of objects to a sensible number.
 