Memory Disk or Zpool for archived data?

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
I have data that is stored for infrequent archival purposes only. This data grows once each year as the new year of data gets moved into it. Because this data is primarily for archival purposes, I would prefer that it be encrypted at rest even while the FreeNAS box is running with other, non-archival data that is not encrypted at rest. I simply want to decrypt (attach/mount) the archival data only when needed and then encrypt (unmount/detach) it when not in use.

My understanding is that I can host this archival data in a memory disk or a Zpool, both of which can be encrypted with GELI. The idea is not to attach/decrypt the memory disk/Zpool until the data contained within is needed.

I intend to use replication tasks to move this data to another FreeNAS box. I'm fairly certain either a memory disk or a Zpool would work the same here, since a memory disk would live in a dataset and a potential archival Zpool is made of datasets. I could be wrong on this front, and maybe this is where the difference lies, because I intend to replicate while the GELI provider is detached and the data is encrypted.

My understanding above may be incorrect, or maybe it is correct and there isn't any pro or con to using a memory disk versus a Zpool for archived data that needs to be encrypted when not in use.

I basically want to protect rarely used data by keeping it encrypted at rest and decrypting it only once or twice a year when I need something out of it, all while other data on the FreeNAS box is in constant use; hence the idea of either a memory disk or another Zpool.

Comments?
 
Joined
Oct 22, 2019
Messages
3,641
When you say "memory disk", are you referring to a hard drive or a file (.iso, .img)?

I would second HoneyBadger's suggestion of creating a VeraCrypt container and accessing it via SMB or NFS, so long as each client PC has VeraCrypt installed. It really depends on where and how the archives are being delivered from.

However, you might find it more streamlined if you simply encrypted the entire pool (using the FreeNAS GUI), and added a passphrase to the Encryption Key (userkey0). Any time you want it encrypted (inaccessible) at rest, you simply "lock" the pool via the padlock icon in the GUI.

If you're going to use this approach, I highly, highly, highly recommend creating, downloading, and safely storing a Recovery Key (userkey1), just in case you make a mistake during a system update or migration, or forget the passphrase.

I furthermore highly, highly, highly recommend that you get a large external USB drive that can store the entire contents of such archives, and simply "dump" everything into a VeraCrypt container (or fully encrypted disk) as a minimal backup, considering these are invaluable files that you cannot take a chance with losing permanently.

If you decide to go with this approach just remember YOUR DATA IS LOST FOREVER if you don't have a verified backup, and...
  • ...you forget your passphrase and don't have a Recovery Key (userkey1)
  • ...something goes wrong during an update such that your passphrase no longer works, and you don't have a Recovery Key (userkey1)
  • ...any of the above, and you lost (or never created) a Recovery Key (userkey1).
Don't skim over this. It's very important.
 

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
Have you considered using something like Veracrypt file containers?
I thought about VeraCrypt but figured it wasn't the best solution with FreeNAS, given that FreeBSD has GELI and I'm trying to keep the FreeNAS box clean of any third-party applications (nothing additional installed from ports, the pkg manager, etc.). See my response to winnielinnie below, though, for a potential solution to this using Windows/macOS.

When you say "memory disk", are you referring to a hard drive or a file (.iso, .img)?
Yes: creating a "container" file, using mdconfig to attach it as a memory disk at /dev/md0, running geli on it to get /dev/md0.eli, and then mounting that whenever I need to access or add to the data in the container file.

I would second HoneyBadger's suggestion of creating a VeraCrypt container, and accessing it via SMB or NFS, so long as each client PC has VeraCrypt installed. It really depends where and how archives are being delivered from.
Since I don't want to install any third-party software on FreeNAS, is your suggestion to create a dataset, say "archives", share it via SMB or NFS, and then have a Windows/macOS user create the VeraCrypt container and subsequently share that password with whomever may need to access the encrypted contents? Or are you thinking of having FreeBSD create the VeraCrypt container? I would think having a Windows/macOS system create the container is the safest bet to keep the FreeNAS box clean of third-party software.

However, you might find it more streamlined if you simply encrypted the entire pool (using the FreeNAS GUI), and added a passphrase to the Encryption Key (userkey0). Any time you want it encrypted (inaccessible) at rest, you simply "lock" the pool via the padlock icon in the GUI.
This was precisely the root of my question about creating an ADDITIONAL zpool. Right now I have a zpool where the entire pool is encrypted using the FreeNAS GUI. The problem is that for the data to be useful it has to be decrypted, so while the system is on, the data is always decrypted at rest. For the archived data, where access may only be needed three or four times a year, I would rather it be encrypted at rest and only decrypted when needed. This is why I started the thread about memory disk vs. zpool and best practices.

....... YOUR DATA IS LOST FOREVER if you don't have a verified backup ......
Thank you for the caution. My understanding is to back up the Recovery Key (userkey1) as well as the passphrase OFFLINE and, just as important, TEST IT on a backup of the zpool or file (memory disk); even VeraCrypt, I believe, has the ability to export a recovery key.
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
Yes, creating a "container" file and then using mdconfig to have it as a memory disk located at /dev/md0 and then running geli on it to get /dev/md0.eli and then mounting that whenever I need to access or add to the data in the container file.
That’s not how it works


Since I don't want to install any third party software into FreeNAS is your suggestion creating a dataset, say "archives" and then sharing that via SMB or NFS and then having a Windows/MacOS user go in and create the Veracrypt container and subsequently share that password to whomever may need to access the encrypted contents?
Yes
This is why I started the thread about memory disk vs zpool and best practices.
Depending on your delta each year, take a look at other media. I wouldn't trust a non-redundant pool that is spooled up a few times a year to store my long-term data. If we are talking hundreds of GB, then M-Discs are probably the best long-term solution. Drives really need to be spinning and regularly scrubbed to be reliable, and tape is just too expensive. If cloud or a second live pool isn't an option, and the data delta is manageable with 100 GB discs, I would go that route for cold storage.
export a recovery key
It's way too dangerous to encrypt cold storage. If you die and your relatives are to be able to access the content, then you need to leave such thorough instructions that the encryption is pointless anyway. It's much better to lock the media in a fireproof box.
 

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
That’s not how it works
Can you help me understand (or link to good article/description) of how it works?

Depending on your delta each year, take a look at other media. I wouldn’t trust a non redundant pool that is spooled up a few times a year to store my long term data. If we are talking hundreds of GB then MDisks are probably the best long term solution. Drives really need to be spinning and regularly scrubbed to be reliable. And tape is just to expensive.. if cloud or a second live pool isn’t an option, and the data delta is manageable with 100 GB disks, I would go that route for cold storage.
I'm not understanding what you mean. I have hundreds of GB of data (increasing by about 100 GB per year) that I would like to keep encrypted at rest. It doesn't need to be, but I believe security-wise it would be better to encrypt data that isn't being used. Wouldn't scrubbing take place, just as with the normal scrub task, on the memory disks, zvols, VeraCrypt single-file containers, etc., and be just as good as having the files raw and unencrypted on the drive being scrubbed?

It’s way to dangerous to encrypt cold storage, if you die and your relatives are to be able to access the content then you need to leave such thorough instructions that the encryption is pointless any way.. it’s much better to lock the media in a fire proof box..
My plan is to keep the encryption key and passphrase with an I.T. professional who knows how to properly store these credentials and the instructions using encryption. Sure, it adds another layer of complexity (i.e., what happens if this I.T. professional loses the key to the encryption container holding my instructions and key/passphrase, etc.), but I think that is a better risk to take than simply having cold-storage-type data sitting unencrypted at rest in case the FreeNAS box gets compromised.



I'm open to suggestions.
I basically have yearly data that I want encrypted at rest. The projects that get completed for the year, say 2018, get tossed into the archive container that contains the 2015, 2016, and 2017 folders (and now a 2018 folder). Let's say that it is 100 GB per year, so in 10 years it will be 1 TB worth of data. This data will only be accessed three to four times a year (to grab something from the 2016 folder, for example) or to add a year (such as 2019's).

The concern I have is that the FreeNAS box will be online 24/7/365. My thought is to encrypt this infrequently used archived data in a container (memory disk, Zpool, zvol) and then only decrypt it to access or add contents a few times a year. I'm trying to prevent a situation where a compromised FreeNAS box releases raw, unencrypted data spanning many years rather than just the current year, thereby reducing whatever a hacker gets to only the current, non-archived years. I'm not saying this is the appropriate solution, and I'm open to suggestions.
 
Joined
Oct 22, 2019
Messages
3,641
I basically have yearly data that I want encrypted at rest. The projects that get completed for the year (...snip...) thereby reducing whatever data a hacker gets to only the current non-archived years. I'm not saying this is the appropriate solution and open to suggestions.

My fear is that over-complexity might cause an unforeseen issue, whether in the near future or much later down the road, but you seem to understand the risks and are comfortable taking on the responsibility.

Like garm posited above, the pool should have redundancy (even a simple mirror vdev of two disks, say 4 TB each, to compensate for many years to come). You could consciously give this pool a sole purpose with a single dataset underneath (archives only, nothing else, infrequent access), and add a passphrase so that it can be "locked" manually, i.e., inaccessible even while the FreeNAS system is live and running, without affecting your other pools and data. Any time you wish to dump a yearly archive into the pool, you would first "unlock" it, save the yearly archive (e.g., 2021), then "lock" it again.

Since speed isn't much of an issue, you could purchase a couple of decent-sized USB drives, encrypt them with VeraCrypt with a simpler filesystem underneath (exFAT, ext4, NTFS, etc.), and simply dump the same archives onto them. You can use a passphrase, a keyfile, or both, and give them to trusted people, so that they can access the archives on the USB drives if anything were to ever happen to your pools.

If you decide to create an unencrypted pool (with redundancy, of course!) and slap a VeraCrypt container within that is to be accessed via SMB or NFS, please consider the precautions from one of the developers of using such an approach:

Mounir said:
If different users are accessing the same container file, there can be a data corruption if they write data to it. So, either make it read-only or have only a unique user accessing the container at a time.

(...snip...)

If the connection to the NAS is lost, the handles opened on the remote file become invalid and thus we can't continue using it. Even if the connection comes back "quickly", we can't reconnect manually as everything is mapped using the old file handles.

(...snip...)

Moreover, if data were being written to the VeraCrypt volume when the connection was lost, there is a risk of data corruption because there is no guarantee that the data written by VeraCrypt arrived to the server before the connection was cut.

There are so many risks when sharing encrypted container over the network in Read/Write mode



Drives really need to be spinning and regularly scrubbed to be reliable.

I'm far from an expert on long-term storage and hard drive reliability, but couldn't he manually invoke scrubs after unlocking the pool, and then, after a successful scrub, lock the pool again? I have an old PATA drive (I believe from around 2000 or 2001) that I've kept "alive" by attaching it via a PATA-to-USB adapter, running a non-destructive read-write badblocks test, and then running an "extended" SMART test every couple of years or so. So far, to this day, the data is still accessible and no errors have been logged. (Disclaimer: There is nothing valuable on this drive; I've only been keeping it "healthy" as a personal experiment for my own entertainment.)

So from what I understand, as long as you force every sector to be re-written and let the drive spin for a few days, it has a significantly better chance of survival. This obviously does not replace the principles of accessible and verified backups.
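For reference, the refresh routine I described boils down to two tools; this is a sketch run on the Linux side of the adapter, as root, and /dev/sdX is a placeholder for whatever device node the drive gets:

```shell
# Non-destructive read-write test: reads each block, writes test patterns,
# then restores the original data (-n), showing progress (-s) verbosely (-v).
badblocks -nsv /dev/sdX

# Kick off an extended ("long") SMART self-test, then review the results
# and error log once it finishes (it can take hours on large drives).
smartctl -t long /dev/sdX
smartctl -a /dev/sdX
```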

In onlineforums' case, the drives in the archive pool will always be spinning as long as the FreeNAS box is powered on, and there is nothing to prevent him from running manual scrubs on the pool.


With that said, nothing supersedes the importance of the principles of backups, verification, accessibility, and trust. I'm personally not a fan of how native encryption is implemented in the FreeNAS GUI (especially the inconsistent terminology used in older versions, which has caused confusion for me and others), as well as the fragility of permanently losing all of your data from a small mistake in the order of steps needed when importing, upgrading, resilvering, etc. I've already ranted about this before, comparing it to LUKS on Linux, which I find to be more sensible and intuitive in keyfile/passphrase management for an encrypted container or block device.

This might change in the future, as TrueNAS 12+ with OpenZFS 2.0 may allow per-dataset encryption rather than being limited to encryption of the entire pool. But I'm not sure, and I'm waiting for someone with more insight and knowledge to chime in.
 
Last edited:

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
Just doing some experiments tonight with a file based container using GELI for encryption using memory disk method.

Steps taken were:
1. Create a 1G "archive" file using 'dd'
2. Use mdconfig to associate the archive file with /dev/md0
3. Create a random key using 'dd'
4. Initialize /dev/md0 with the key above using 'geli init'
5. Attach geli with the key to /dev/md0 using 'geli attach'
6. Create a new filesystem on /dev/md0.eli using 'newfs'
7. Mount /dev/md0.eli to the 'archive-share' folder
8. Set up an SMB share for 'archive-share'
9. Put a bunch of random files in the folder through SMB (to test that aspect)
10. Create a snapshot of the dataset holding the 1G archive file, as well as, for testing purposes, of the 'archive-share' folder to see what happens
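For anyone following along, steps 1-8 (and the later teardown in steps 12-13) look roughly like this. This is a sketch rather than my exact session: the paths, the md unit number, and the key location are example values (FreeBSD, run as root):

```shell
# 1. Create a 1G container file
dd if=/dev/zero of=/mnt/tank/archive/archive.img bs=1m count=1024
# 2. Attach it as a vnode-backed md device at /dev/md0
mdconfig -a -t vnode -f /mnt/tank/archive/archive.img -u 0
# 3. Create a random key
dd if=/dev/random of=/root/archive.key bs=64 count=1
# 4-5. Initialize GELI on /dev/md0 and attach it (creates /dev/md0.eli)
geli init -K /root/archive.key -s 4096 /dev/md0
geli attach -k /root/archive.key /dev/md0
# 6-7. Create a UFS filesystem and mount it
newfs /dev/md0.eli
mkdir -p /mnt/tank/archive-share
mount /dev/md0.eli /mnt/tank/archive-share

# Teardown (steps 12-13): lock everything up again
umount /mnt/tank/archive-share
geli detach /dev/md0.eli
mdconfig -d -u 0
```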

The snapshot of 'archive-share' (not a dataset) did NOT capture the unencrypted random files I had put in the 'archive-share' folder (as expected, and as I would have hoped).

The snapshot of 'archive' (the dataset holding the 1G archive file) did indeed capture the 'archive' file.

I then did the following:
11. Delete a bunch of the files I put in the 'archive-share' folder
12. Unmounted the 'archive-share'
13. Detached geli /dev/md0.eli
14. Ran a snapshot of both the 'archive-share' and 'archive' folder

Of course, the 'archive-share' folder was blank, as expected again. The 'archive' folder containing the 1G 'archive' file had indeed changed; at least the sha1 had (the file size was still 1G, as that is static).

I then did two experiments, if you are still following along:
For the first experiment, I went into the hidden .zfs folder to get the original snapshot from before item 11 above, where I deleted a bunch of files. I used 'cp' to copy the 1G 'archive' file from the snapshot over the current 'archive' file. The sha1 matched the original archive file from before the deletions (item 11 above). I subsequently ran 'geli attach', then mounted to the 'archive-folder', and all of the original files were there. Yay! Then I did item 11 above again, where I deleted some files from the 'archive-folder', unmounted, and ran 'geli detach'.

For the second experiment, the starting point is that the sha1 calculation no longer matches the original pre-deletion snapshot. I then did a rollback of the exact same snapshot I had used for the 'cp' of the 'archive' file above (which has the original sha1 calculation I'm looking for). The snapshot rollback was successful; I ran a sha1 calculation on the 1G 'archive' file, and the sha1 calculation MATCHES as expected. I then ran 'geli attach', which worked fine, BUT WHEN I WENT TO MOUNT /dev/md0.eli to the 'archive-folder' it gave an error: "mount: /dev/md0.eli: R/W mount of /mnt/tank/testing/archive-folder denied. Filesystem is not clean - run fsck.: Operation not permitted"

So this is a long post, and if you got this far, congrats; hopefully this helps someone, or some comments/knowledge can be shared on why this would be. In the second experiment, the file had the exact same sha1 hash as with the 'cp' method and as the 'archive' file in the actual .zfs/snapshot/&lt;name&gt;, so it would have to be an identical copy, unless there are somehow some permission bits that can differ while still producing the same sha1 calculation. I don't know.
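Update: my best guess, and it is only a guess, is that the snapshot captured the image while its UFS filesystem was still mounted (so the image carries a "dirty" flag), and/or the md device was still attached during the rollback and held stale buffers. A sketch of a recovery path, with the mountpoint taken from the error above and the image path, unit number, and key path being example values (FreeBSD, run as root):

```shell
# Tear down any stale geli/md state left over from before the rollback
geli detach /dev/md0.eli 2>/dev/null
mdconfig -d -u 0 2>/dev/null

# Re-attach the rolled-back image cleanly, repair the UFS filesystem
# as the error message suggests, then mount
mdconfig -a -t vnode -f /mnt/tank/testing/archive -u 0
geli attach -k /root/archive.key /dev/md0
fsck -t ufs -y /dev/md0.eli
mount /dev/md0.eli /mnt/tank/testing/archive-folder
```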
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
I still don't understand why you would put the container in memory. You said you have hundreds of GB each year to archive; do you have hundreds of GB of RAM?
Why not just use tar and gpg to store files in your existing pool? If you really, really want a block device, you can use geli on top of a zvol, but why mix in md?
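The tar-and-gpg route can be as simple as the following sketch. The paths, the year, and the passphrase are made-up examples; in practice you would let gpg prompt for the passphrase rather than putting it on the command line:

```shell
# Made-up demo data, purely for illustration
mkdir -p /tmp/archive-demo/2018
echo "project data" > /tmp/archive-demo/2018/report.txt

# Pack the finished year into a tarball and symmetrically encrypt it
tar -C /tmp/archive-demo -czf /tmp/2018.tar.gz 2018
gpg --batch --yes --pinentry-mode loopback --symmetric \
    --passphrase "example-passphrase" \
    --output /tmp/2018.tar.gz.gpg /tmp/2018.tar.gz

# Years later: decrypt and unpack
gpg --batch --yes --pinentry-mode loopback --decrypt \
    --passphrase "example-passphrase" \
    --output /tmp/2018-restored.tar.gz /tmp/2018.tar.gz.gpg
tar -C /tmp -xzf /tmp/2018-restored.tar.gz
```

The encrypted .gpg file then just sits in the pool like any other file, gets snapshotted and replicated normally, and only needs gpg (not geli or md) to open.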
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
A file-backed vnode device is not in memory, apart from the buffer cache, of course. The OP probably got confused by the name mdconfig. There used to be a separate vnconfig years ago, but the two commands for memory disks and file-backed block devices were eventually merged.

@onlineforums why not use a zvol?
 

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
A file backed vnode device is not in memory - apart from the buffer cache of course. Probably the OP got confused by the name mdconfig. There used to be a separate vnconfig years ago but the two commands for memory disks and file backed block devices were merged, eventually.

@onlineforums why not use a zvol?
Correct, I was a bit confused about mdconfig. A zvol would make more sense; I agree.

So the question becomes: why use a zvol with geli over a dataset (or zvol) with a single VeraCrypt container inside? Actually, now thinking about it, I'm not sure you can do a separate zvol with geli.
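Edit: from what I can tell, a zvol shows up as an ordinary block device under /dev/zvol, so geli on top of it should be possible. A sketch (FreeBSD, run as root; the pool name, zvol name, size, and mountpoint are made up):

```shell
# Create a 500G zvol, then layer GELI and UFS on top of it
zfs create -V 500G tank/archive-zvol
geli init -s 4096 /dev/zvol/tank/archive-zvol    # prompts for a passphrase
geli attach /dev/zvol/tank/archive-zvol          # creates /dev/zvol/tank/archive-zvol.eli
newfs /dev/zvol/tank/archive-zvol.eli
mkdir -p /mnt/archive-share
mount /dev/zvol/tank/archive-zvol.eli /mnt/archive-share
```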
 
Last edited:

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I don't know VeraCrypt, and the answer is, as always: it depends on what you feel most comfortable with.

For archival purposes I personally prefer to have "images" of some sort and store them in a file system, i.e., your mdconfig approach makes sense to me. Advantage: you can copy these images to removable media with a FAT32 filesystem. Why FAT32? Because if, years from now, somebody has a look at the contents of that drive/stick/whatever, he/she will see that there is something possibly valuable on there, even without the key/passphrase to unlock the image. With ZFS directly on removable media, a "naive" Windows or Mac user might conclude that there is nothing on the drive and format it.
For archival I prefer the lowest common denominator, i.e., FAT32 as the lowest layer.

HTH,
Patrick
 

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
For archival purposes I personally prefer to have "images" of some sort and store them in a file system, i.e. your mdconfig approach makes sense to me. Advantage: you can copy these images to removable media with a FAT32 filesystem.
I'll have to read up a bit more on mdconfig. I understand the basic high level (a single file basically mounted as /dev/mdX) but don't know the nuances in terms of risks, RAM usage, buffering, multiple people using it at the same time when decrypted, how well it works being scrubbed while "locked", and exactly how snapshots and snapshot replications work with it (you can see my previous post from last night where I was playing around with snapshots, and after doing a rollback I could no longer mount the /dev/md0 due to R/W issues).

Some people recommended VeraCrypt, which to me seems extremely similar, just not native to FreeBSD, and I'm sure it has its own slew of nuances.
 