Strategy for dealing with archive dataset?

Apollo · Apr 30, 2015

I need to create a dataset to archive files I never want to delete.
Basically, I currently have several datasets that I use to maintain my personal documents, but they are always in a state of flux. For instance, I have photos I can access anytime, and I move some of them back and forth between the computers on my network. With backups and all, I end up with redundant files spread across several datasets. I have backups of those files in compressed form, and I want to be able to set them as read-only so that I cannot delete them through user error.

With this in mind, I have created a dataset that will contain those backup files, or any files/documents I have no need or desire to modify.

My question is more on the management side of things.
Ideally, I want to keep access to the files in the new dataset through a share, but prevent them from being erased. I also do not want write access except when I need to move new files into the archive. Once the files have been copied, the dataset has to revert to its "no delete" and "no write" permissions. What is the best strategy to accomplish this?

Along the same lines, I would like to run some kind of script that shows me whether a file has been added or removed. It is possible to query the snapshots to see the differences. Is this the proper way of doing it?

Apollo · May 10, 2015

Echo... anyone out there?
I am surprised there has been no reply to this thread.
With all the IT people around here, I expected to get some feedback, but there has been none whatsoever.

Should I rephrase my request?

Bidule0hm · May 10, 2015

The first thing I thought of is to have a subdirectory in the dataset with write rights, plus a cron-executed script that copies the files from this directory into the dataset. I don't know if it's the best solution, but at least it's a start :)
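A minimal sketch of this first idea, assuming the cron job runs as root and the writable drop folder is a subdirectory of the archive; the function name `archive_inbox` and the paths are hypothetical. Each file is copied only if nothing with the same name already exists in the archive, its write bits are cleared, and it is removed from the inbox only after the copy succeeds. It does not guard against a file that is still being written when cron fires. Note that on Unix, deleting or renaming a file is governed by the write permission of the directory containing it, so the archive directories themselves should belong to an account the everyday user does not log in as.

```shell
#!/bin/sh
# Sketch of a cron job: move files from a writable drop folder ("inbox")
# into the archive. Existing archive files are never overwritten; each
# copied file has its write bits cleared and is removed from the inbox
# only after the copy succeeded. Assumes no newlines in file names and
# an inbox path given without a trailing slash.
archive_inbox() {
    inbox=$1
    archive=$2
    find "$inbox" -type f -print | while IFS= read -r src; do
        rel=${src#"$inbox"/}
        dest=$archive/$rel
        if [ -e "$dest" ]; then
            echo "skipped, already archived: $rel" >&2
            continue
        fi
        mkdir -p "$(dirname "$dest")" || continue
        if cp -p "$src" "$dest"; then
            chmod a-w "$dest" && rm -f "$src"
        fi
    done
}

# Example (hypothetical paths), run as root from cron:
#   archive_inbox /mnt/tank/Archives/inbox /mnt/tank/Archives
```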

The second solution would be to have write rights on the dataset, with a cron-executed script that changes the rights on the files to read-only. But I think it's less safe than the first solution, and it's heavier on resources.
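The second idea could look roughly like this: a periodic job that clears the write bits on every regular file under the archive that still has one. The name `lock_archive` and the path are hypothetical. Only files that still have a write bit are passed to `chmod`, although the whole tree is still walked on every run. Directories are deliberately left alone so new files can still be added, which is also why this is weaker than the first idea: anyone who can write to a directory can still delete or rename the read-only files inside it.

```shell
#!/bin/sh
# Sketch: clear all write bits on regular files under the given directory.
# Only files that still have some write bit set are passed to chmod.
# Directories are not touched, so new files can still be added to them.
lock_archive() {
    find "$1" -type f \( -perm -200 -o -perm -020 -o -perm -002 \) \
        -exec chmod a-w {} +
}

# Example (hypothetical path), run periodically from cron:
#   lock_archive /mnt/tank/Archives
```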

I'm not sure I understand the added/removed part, since you won't remove files if they are read-only. Also, what do you want to compare to what? The backup to a user directory? The backup to an older backup? ...?

Apollo · May 10, 2015

Bidule0hm said:
The first thing I thought of is to have a subdirectory in the dataset with write rights, plus a cron-executed script that copies the files from this directory into the dataset. I don't know if it's the best solution, but at least it's a start :)

I don't think the cron approach will fit my needs. As I understand it, your approach copies the data in one specific folder to the "Archives" dataset on a regular schedule, possibly adjusting the permissions along the way.

Bidule0hm said:
The second solution would be to have write rights on the dataset, with a cron-executed script that changes the rights on the files to read-only. But I think it's less safe than the first solution, and it's heavier on resources.

By default, I should not be able to write to the Archives dataset with my current user account. I only want to have read access.
If I create a user with write privileges, how can I use it over CIFS, given that it seems impossible to access a FreeNAS CIFS share as a different user from the same Windows session? And if I were able to access a share with write privileges, how could I then access the Archives dataset while preventing unintentional delete or move actions?
Ideally, I would like a means to have write access only for the files that need to be saved, while the ones already present on the Archives dataset all stay read-only. There should be no way for me to move, rename, or delete the existing files unless I do it from an SSH command line, and even then I don't know if this is the best strategy.

Bidule0hm said:
I'm not sure I understand the added/removed part, since you won't remove files if they are read-only. Also, what do you want to compare to what? The backup to a user directory? The backup to an older backup? ...?

This was more of an open question about safeguarding my data on the Archives dataset. As I will have snapshots taken on a regular basis on the Archives dataset, I want to be able to monitor whether a file on the Archives dataset has been modified, i.e. newly added to the archive, or removed or renamed (intentionally or not).
To do this, I can use the following command:

Code:

zfs diff Archives@snap1 Archives@snap2

If I find a difference I didn't expect, I can use the snapshot to recover the files. Also, since I will not be accessing the Archives dataset with write access on a regular basis, this gives me a way to validate that the files in the archive have not been tampered with since my last write access. This is mostly to safeguard myself against user error.
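A minimal sketch of such a check, assuming the `-H` flag of `zfs diff`, which prints tab-separated lines without the `->` arrows: a change type (`+` added, `-` removed, `M` modified, `R` renamed) followed by the path, and for renames the old and new paths. The function name `summarize_zfs_diff` and the `tank/Archives` dataset name are hypothetical. It lists removals and renames, which should not normally happen in a write-once archive, and exits with status 1 when it finds any, so a cron job could use the status to raise an alert.

```shell
#!/bin/sh
# Sketch: summarize `zfs diff -H` output read from stdin.
# Assumed line format (tab-separated): TYPE<TAB>PATH[<TAB>NEWPATH]
#   +  file created    -  file removed
#   M  file modified   R  file renamed (old path, then new path)
# Prints removed/renamed paths and a one-line summary; exits 1 if any
# file was removed or renamed, 0 otherwise.
summarize_zfs_diff() {
    awk 'BEGIN { FS = "\t" }
        $1 == "+" { added++ }
        $1 == "M" { modified++ }
        $1 == "-" { removed++; print "REMOVED: " $2 }
        $1 == "R" { renamed++; print "RENAMED: " $2 " -> " $3 }
        END {
            printf "added=%d modified=%d removed=%d renamed=%d\n",
                added, modified, removed, renamed
            if (removed + renamed > 0) exit 1
            exit 0
        }'
}

# Example use (hypothetical pool/dataset and snapshot names):
#   zfs diff -H tank/Archives@snap1 tank/Archives@snap2 | summarize_zfs_diff
```

A directory typically also shows up as `M` when a file is added inside it, so a nonzero `modified` count on its own is not necessarily a problem.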

I have in mind several means to accomplish more or less this kind of operation, but I am not sure which one is the most robust. There must be procedures people have put in place for such a scenario.

Bidule0hm · May 10, 2015

AFAIK you can't have RO rights and then R/W rights only when you want, unless you change the rights of course :D

I think your constraints are overly complicated: if you have snapshots on this dataset then you don't need RO rights because if you delete a file you can recover it :) Also I have never deleted a file by mistake, let alone a file on a backup, so I don't think the benefits outweigh the complexity. I mean, if you treat your backup as RO and only copy from it (which should be pretty rare anyway), I don't see how you could delete a file by mistake. Maybe that's not enough for you, though I'm rather paranoid with my data and I think it's sufficient.

"I have in mind several means to accomplish more or less this kind of operation, but I am not sure which one is the most robust." Then list them here and ask for advice on which is best ;)

Apollo · May 10, 2015

Bidule0hm said:
AFAIK you can't have RO rights and then R/W rights only when you want, unless you change the rights of course :D

This is one approach. If the folder is writable but the existing files are RO, then it is just a matter of copying the new files and folders to the Archives dataset.

Bidule0hm said:
I think your constraints are overly complicated: if you have snapshots on this dataset then you don't need RO rights because if you delete a file you can recover it :) Also I have never deleted a file by mistake
If the file is RO, then there is no way for it to be deleted. But by default my account has write access to most, if not all, of my datasets.
There have been occasions (on my main desktop) when I dragged a folder or a file because I clicked the mouse button when I wasn't supposed to. The same can happen with mounted network shares.

Bidule0hm said:
"I have in mind several means to accomplish more or less this kind of operation, but I am not sure which one is the most robust." Then list them here and ask for advice on which is best ;)

1. The one stated above.
2. Creating a share that is not advertised on the network and is used only to write to the dataset. This doesn't prevent deleting or moving files and folders.
3. Creating a user account with write access to the dataset, while my current account has no write access to it. It should be accessible from within my own Windows session by logging in under the other user name.
4. Logging in over SSH with the account that has write access to the dataset and performing the copy from there.
5. Using my current user account to access the dataset like any other dataset, but having to change the permissions on the Archives dataset whenever needed.

These are a few basic procedures I have in mind. They can, and will need to, be improved; I just don't know which one can be made the most robust. I am sure some users already have a similar setup, and I don't want to reinvent the wheel if a reliable implementation already exists.

Bidule0hm · May 10, 2015

But you said that the folder needs to be RO (the whole dataset, actually)? And that's roughly what I proposed (with a script to ensure the file rights are set to RO). I don't follow.

I'd say from best to worst: 1, 3, 4, 2, 5 ;)

Apollo · May 10, 2015

Bidule0hm said:
But you said that the folder needs to be RO (the whole dataset, actually)? And that's roughly what I proposed (with a script to ensure the file rights are set to RO). I don't follow.

I'd say from best to worst: 1, 3, 4, 2, 5 ;)

It is getting confusing.
What I meant (and I was just throwing some ideas around) is that I want my Archives dataset to be RO for everybody except maybe one user account that can write to the dataset in order to archive data. I don't want that account to be able to delete existing files or folders unless required, and for that to happen, maybe a chmod change could work.
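A sketch of the "change in chmod" idea, assuming Unix-style permissions on the dataset and a hypothetical helper `with_write_access`: the owner's write bit is added to one directory, a single command runs, and the bit is removed again even if the command fails. Because delete and rename rights come from the directory rather than the file, the directory is what has to be unlocked to remove an existing file. If the script is killed between the two `chmod` calls, the directory is left writable.

```shell
#!/bin/sh
# Sketch: temporarily give the directory owner write access, run one
# command, then revoke write access again and return the command's status.
# Assumes Unix permissions; run as the directory's owner (or root).
with_write_access() {
    dir=$1
    shift
    chmod u+w "$dir" || return 1
    "$@"
    status=$?
    chmod u-w "$dir"
    return "$status"
}

# Example (hypothetical path): deliberately remove one archived file.
#   with_write_access /mnt/tank/Archives/photos rm /mnt/tank/Archives/photos/old.zip
```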

Robert Trevellyan · May 11, 2015

Why not just have the dataset owned by root/wheel but readable by all? Then when you want to copy to it, use scp or sftp as root. Am I missing something?
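This suggestion could be applied with something like the following, assuming Unix-style permissions on the dataset (a share using Windows ACLs may not behave the same way under `chmod`) and a hypothetical mountpoint `/mnt/tank/Archives` and host name `freenas`. The hypothetical `set_archive_modes` gives the owner read/write access and everyone else read-only access, with search (execute) permission on directories; the ownership change needs root and is shown as a comment.

```shell
#!/bin/sh
# Sketch: owner may read and write; group and others may only read.
# "X" grants execute (directory search) only to directories and to files
# that are already executable by someone.
set_archive_modes() {
    chmod -R u=rwX,go=rX "$1"
}

# As root, on the real dataset (hypothetical path):
#   chown -R root:wheel /mnt/tank/Archives
#   set_archive_modes /mnt/tank/Archives
# Later copies from another machine, e.g.:
#   scp -r ./new-backups root@freenas:/mnt/tank/Archives/
```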

Bidule0hm · May 11, 2015

This is solution 4 :)

Robert Trevellyan · May 11, 2015

Bidule0hm said:
This is solution 4 :)

Oh, right...

Seems like the simplest solution to me.

Apollo · May 11, 2015

Robert Trevellyan said:
Oh, right...
Seems like the simplest solution to me.

Simplest, but not the most robust either.
I was hoping for some feedback from users that have direct experience with this kind of approach.

Robert Trevellyan · May 12, 2015

Apollo said:
Simplest, but not the most robust either.

In what way?

Apollo · May 12, 2015

Robert Trevellyan said:
In what way?

Root has all the rights.

Robert Trevellyan · May 12, 2015

Well it doesn't have to be root that owns or writes to the dataset, but it does have to be a user with full rights, doesn't it?

Apollo · May 12, 2015

Robert Trevellyan said:
Well it doesn't have to be root that owns or writes to the dataset, but it does have to be a user with full rights, doesn't it?

This is where I am not entirely sure. I don't know if it is wise to give it all the rights all the time, or if it is possible to have only write access but no delete or move rights, or to have those rights only on the occasions when they are needed.

depasseg · May 12, 2015

What you are looking for is called WORM (Write Once, Read Many). Nexenta used to offer it as a plugin for their enterprise ZFS systems, but discontinued it (https://community.nexenta.com/message/1560#1560), which leads me to believe that the market was too small to support it.

Robert Trevellyan · May 12, 2015

Apollo said:
I don't know if it is wise to give it all the rights all the time, or if it is possible to have only write access but no delete or move rights, or to have those rights only on the occasions when they are needed.

You would access the dataset with the privileged account only when you need to, i.e. "on needed occasions." I also think the fact that a suitable snapshot schedule gives you a way to recover from mistakes makes this a viable solution.

depasseg said:
What you are looking for is called WORM (Write Once Read Many).

If you're thinking WORM, then DVDs are an obvious choice, but there are practical issues: capacity is limited, they don't last forever (burn two copies of everything to enable recovery with ddrescue), and they don't fit nicely into the FreeNAS environment.

depasseg · May 12, 2015

Similar to DVDs, but for a disk-based system. Nexenta offers a ZFS-based SAN similar to FreeNAS. They used to have a plugin that did exactly what the OP requested.

Robert Trevellyan · May 12, 2015

depasseg said:
Similar to DVDs, but for a disk-based system. Nexenta offers a ZFS-based SAN similar to FreeNAS. They used to have a plugin that did exactly what the OP requested.

Yes, I understand, though it's not clear to me that the OP wants true WORM behavior.
