The backup throw down

Status
Not open for further replies.

DaPlumber

Patron
Joined
May 21, 2014
Messages
246
So I was having a discussion with a colleague of mine the other day, extolling the benefits of FreeNAS as a home/prosumer NAS solution.* Her comment, though, caused me to stop dead in my tracks and think: "It sounds like a good NAS, especially with ZFS, but it still doesn't solve the backup problem on the non-enterprise scale."

Allow me to elucidate: Backups are a "good thing". "Duh," I hear you say, and a quick search of "backup" on these forums will confirm that. But consider: What makes for a cost-effective backup solution for a modern user? At the Enterprise scale it's relatively easy, if non-trivial, to design a backup system that appears expensive at first blush but is actually quite inexpensive when amortized over a large number of systems and a large amount of protected storage. These days, at the back end, for the final cold copy you are still likely to find good old tape. Tape, when considered on a $/Byte basis, is ridiculously cheap, especially when ordering it by the pallet-load. The SAN/network transport needed to get the data to the tape drives, and the tape drives themselves, are expensive up-front costs (don't forget maintenance!), but depreciated over 3-5 years and at Enterprise discount levels they are still much cheaper than even cheap disk-based solutions.

None of this applies down here at the thin end of the wedge, where we're running free, open-source software with Enterprise NAS functionality (and then some) on "high end consumer" hardware at home.

So, let's set up a scenario: Let's presume a current, common home setup: a system with 4x4TB drives (WD Red, ST, etc.), 16GB of ECC RAM, and a GbE connection. The backup needs to be portable and easy to detach, since backups sitting next door to the Primary are a disaster waiting to happen. The backup needs to complete "in a few hours" (let's limit it to 2 for incrementals, 12 for the initial copy). Most importantly, the cost of the backup can't exceed ~40% of the cost of the NAS in the first place.
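As a rough sanity check on those time limits, here is a back-of-the-envelope sketch. The 110 MB/s sustained GbE rate and the 60% fill level are assumptions for illustration, not figures from the scenario; plug in your own.

```python
# Back-of-the-envelope check of the backup windows in the scenario above.
# Assumptions (not from the post): ~110 MB/s sustained over GbE and a pool
# that is 60% full. Adjust both to taste.

TB = 10**12  # drive vendors quote decimal terabytes

def hours_to_copy(num_bytes: float, mb_per_s: float) -> float:
    """Hours needed to move num_bytes at a sustained rate of mb_per_s (MB/s)."""
    return num_bytes / (mb_per_s * 10**6) / 3600

def bytes_in_window(hours: float, mb_per_s: float) -> float:
    """Bytes that fit through the link in the given number of hours."""
    return hours * 3600 * mb_per_s * 10**6

raw = 4 * 4 * TB      # 4 x 4TB drives, before any RAID-Z overhead
used = 0.6 * raw      # assumed fill level
rate = 110            # assumed sustained GbE throughput, MB/s

print(f"initial copy of {used / TB:.1f} TB: {hours_to_copy(used, rate):.1f} h")
print(f"12 h window moves at most {bytes_in_window(12, rate) / TB:.2f} TB")
print(f" 2 h window moves at most {bytes_in_window(2, rate) / TB:.2f} TB")
```

Under those assumptions the initial copy takes roughly 24 hours, and the 12-hour target only holds over GbE if there is less than about 4.75 TB to move. That is one argument for a locally attached backup set rather than one that sits across the network.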

The Enterprise Architect in me wants an LTO drive, but those cost 400% of the cost of the NAS, never mind the interface, the cost of a few tapes, and the question of what software to use to back up to the drive. :eek:

The best answer that I could come up with is 4x the non-NAS version of the drives in an eSATA box with a port multiplier interface. Duplicate using "zfs send | zfs recv", disconnect, unplug, and keep somewhere else (work? Neighbor/friend/family's house?). Ideally I'd like two sets, one being backed up to and one for rotation that's always "off-site", but now we've just doubled the cost of the NAS; we might as well have built a mirror NAS and a VPN to the other site...
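For anyone who hasn't done the "zfs send | zfs recv" dance, here is a minimal dry-run sketch of the sequence. The pool names ("tank", "offsite") and snapshot names are hypothetical, and the script only prints the commands rather than running them; treat it as an outline under those assumptions, not a tested procedure.

```python
# Dry-run sketch: builds the shell commands for a snapshot-based copy to a
# detachable backup pool. Pool and snapshot names are hypothetical; nothing
# is executed, so this is safe to run anywhere.
from typing import List, Optional

def replication_plan(src_pool: str, dst_pool: str, new_snap: str,
                     prev_snap: Optional[str] = None) -> List[str]:
    """Return the commands for a full (prev_snap=None) or incremental send."""
    send = f"zfs send -R {src_pool}@{new_snap}"
    if prev_snap is not None:
        # -i sends only the changes between the two snapshots
        send = f"zfs send -R -i {src_pool}@{prev_snap} {src_pool}@{new_snap}"
    return [
        f"zpool import {dst_pool}",                     # attach the backup set
        f"zfs snapshot -r {src_pool}@{new_snap}",       # recursive snapshot
        f"{send} | zfs recv -F {dst_pool}/{src_pool}",  # replicate
        f"zpool export {dst_pool}",                     # export before unplugging
    ]

for cmd in replication_plan("tank", "offsite", "weekly-02", prev_snap="weekly-01"):
    print(cmd)
```

The first run would omit prev_snap to send everything; later runs send only the delta, which is what makes the 2-hour incremental window plausible.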

So here's the throwdown: There has to be a better, more cost effective way of doing backups of modern FreeNAS home systems, but darned if I can figure out what it is. :oops:

Is anyone up for the challenge? :p

------
* Full disclosure: we're both IT architects with many years' experience designing large-scale UNIX infrastructure, including NAS, databases, IaaS, SaaS, backup, you name it. Bottom line: We know our **** but not on the home scale.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
My $dayjob is similar to yours, btw. :)

I'm sure people are doing $cloud_backup_provider, but I have enough data I can't afford to get it back out of the cloud if I have a disk failure.

Personally, the best I can do on my home budget is take snapshots on my main NAS and rsync the really important stuff to another FreeNAS box on the other side of my basement. That way if the dishwasher falls through the floor, it won't get both NASes. :P

I suppose if I ever get to the point I'm spending $2k on a backup NAS, I'll look at a small LTO5 library from ebay.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok, for starters, if it's not good enough for a reliable FreeNAS box, it's not good enough for your backups. PERIOD. You'd be crazy to consider port multipliers in a FreeNAS box, so why you'd consider trusting them with your backups is totally beyond my understanding.

Yes, if you have to do a 20 disk mirrored pool to get the I/O you need, your backups are still fine on a RAIDZ2, so you can save some money there.

Other than that, you're basically right. There is no 'cheap' backup. Virtually every backup solution is either too slow to ever be used (think DVD/Blu-ray disc backups), outrageously expensive (think up-front costs for an LTO drive), or inexpensive in the big picture but still expensive for the home user. The backups typically can't be significantly cheaper than your main server unless you decide not to back up all of your data. But if watching this forum is any indicator of reality, let me put a few things in light:

1. Plenty of people have big servers because they are hoarding pirates. It cost you nothing to download it, so why would you then pay to back it up? If you lost that movie you ripped off and you want to get upset, there's a solution. Buy the F'in movie/song/etc. I know, I'll probably be castrated for saying it, but that's a fundamental truth. Which leads me into the next part...
2. Some people (like myself) have tubs of movies on Blu-ray and DVD in their basement. If I had to re-rip them all I'd probably cry. It has taken me about 10 years or so to rip all of the stuff I have, and I can't even fathom the idea of having to go through it again. I also don't have backups, but I'm doing RAIDZ3 and I took very conservative approaches to everything, including snapshots, so I'm not particularly worried about the potential loss of my movie collection. I still technically have them on Blu-ray or DVD, and I'd probably slowly re-rip some and ignore the rest since many of them I don't watch frequently anyway.
3. Sorry, but your pr0n collection is not so important that it needs to be backed up in 3 places. Sorry, but it's just not that important.
4. Most people are absolutely terrible at organizing their data! (I'm one of those poor souls too.) If you actually kept your crap sorted you'd probably find you don't need nearly the quantity of data that you think you do.


Noticing a trend yet? To put it succinctly, people aren't willing to prune their data usage, but cry about the cost. The reality of it is that the cost you will incur for backups is directly proportional to the quantity of data you value enough to back up. If you don't like the cost, stop and ask yourself if the stuff is really that important. I'd bet most of us could delete 80-95% of our data and we'd go on with our lives.

Most people do NOT need all of the data that they keep. If you simply say "4TB is the largest single drive on the market, I refuse to keep more than 4TB of personal data" you suddenly have a very simple requirement for backups. They must fit on a single disk. All of your thoughts are immediately void because you've got a very solid limit to how much you are going to spend for the backup and how much data you actually need to store. If you simply buy a disk and say "it all must fit on this" you'll suddenly reevaluate how important your data is. The most important stuff will go on the disk, the rest will be ignored.
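The "it all must fit on this" rule can be sketched as a simple greedy pick: rank your data by how much losing it would hurt, then fill the one disk in that order. The dataset names, sizes and priorities below are made up for illustration, and the 4TB capacity ignores filesystem overhead.

```python
# Sketch of the "it all must fit on one disk" rule: rank datasets by how much
# losing them would hurt, then fill a single backup drive in that order.
# The dataset names, sizes and priorities are made up for illustration.
from typing import List, Tuple

TB = 10**12
GB = 10**9

def pick_for_backup(datasets: List[Tuple[str, int, int]],
                    capacity: int) -> Tuple[List[str], int]:
    """datasets: (name, size_bytes, priority); lower priority number = more important.
    Returns the names that fit, most important first, and the bytes used."""
    chosen, used = [], 0
    for name, size, _prio in sorted(datasets, key=lambda d: d[2]):
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen, used

datasets = [
    ("movies-ripped", 9 * TB, 3),     # can be re-ripped from the discs
    ("family-photos", 400 * GB, 1),   # irreplaceable
    ("documents", 50 * GB, 1),
    ("time-machine", 2 * TB, 2),
]
chosen, used = pick_for_backup(datasets, capacity=4 * TB)
print(chosen, f"{used / TB:.2f} TB used")
```

With these made-up numbers the photos, documents and Time Machine data fit with room to spare, and the re-rippable movies are what gets left off.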

What sucks is stuff like I had to deal with last summer... Some guy contacted me, hysterical, because he had lost his family pictures. Wedding pictures, pictures of his 2 children, etc. Is this a big deal? Well, life would go on, except it didn't. His 2 kids had died in a house fire, and those digital pictures he retrieved from his server were all he had left. But he didn't do RAIDZ2, so when one disk failed his pool panicked. So after retrieving his data from his old server in a fire, moving to FreeNAS, and then losing his data months later, he couldn't believe this was happening. He never did get his pictures back, and both he and his wife did some serious crying on Skype when I gave them the bad news. When I asked about backups he said that since the server had cost him $XXXX he couldn't afford backups. I asked him why he didn't keep a single disk with his absolutely most important data on it. Most of us do NOT have more than 4TB of data that would be world-shattering for us if we lost it. He thought about it for a second and realized that if he had spent just $200 on a 4TB drive he would still have the pictures of his long-lost kids. I'm not sure if I did him a favor or not by pointing out this very simple reality.

So when people complain about the cost of backups I always remind them it's not the backups that are expensive, it's your inability to separate what you need from what you want... and THAT is the reality of backups.
 

DaPlumber

Patron
Joined
May 21, 2014
Messages
246
Very reality-based assessment there cyberjock, and I agree with almost everything you said. My pr0n collection is very precious to me... I kid, I kid! :D

Sorry to hear about the kid-photo-hard-case too.:(

@c32767a: I have a native distrust of Public Cloud Services, but very vital information I keep in an encrypted image there.

You're correct that what "we in the biz" call ILM (Information Lifecycle Management) is a manual and tedious chore in the home environment. Oddly, some of the best at it that I've seen ARE digital pack-rats. Ironic. I'm in the same boat with the tubs of DVDs and BDs, except that a few years ago I switched to iTunes (and the odd torrent of things that aren't available on iTunes). Like you having to re-rip, my problem is that I'm unlikely to get maximum download speed if I had to get all of my collection again, and I'd probably blow out whatever cap Comcast is denying it's using this month. The BTs are probably no longer available because they're obscure. My primary use for my current FreeNAS box, though, is backups (Time Machine in my case, Mac household). Since TM actually lets one back up to multiple destinations these days, like a civilized backup tool, I have multiple copies there too. I also have the usual suspects: photos, documents, etc., and there are multiple copies of those around the place. In case of house fire, my MacBook is practically welded to me (yes, I do sleep with it next to the bed; I'm a mobile professional at home too!), and it has a $$$ 750GB SSD for a reason. So I'm pretty happy that I'm personally covered. However, there is no "easy button" for doing this kind of ILM at the consumer level, as you pointed out. I was hoping for something better at the prosumer end of the market that I was unaware of.

Maybe this is the next great OSS tool? Something like a "Super" rsync that keeps track of where/when/how many copies of data files you collectively have. Something like an aggressively proactive CrashPlan/ownCloud? Talking of which, I need to spend some quality time with the ownCloud plugin...

Bottom line: Just like the RAID-Z1/5 problem, this "backup gap" is only going to widen as more of everyone's life is digital. Public Cloud isn't the answer as the data is growing exponentially and Internet connections are growing (at best!) linearly. DARPA-level challenge anyone?
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
Agreed on the cloud storage.. I sometimes wonder why nobody seems to remember the adage "possession is 9/10s of the law".

I'm still waiting for a reliable HSM that doesn't come with a per-TB price tag.

On the flip side, there are a lot of object stores of varying levels of capability; some are even free. Most of the media you are talking about could technically be considered objects. The problem is that today's playback interfaces (Plex, VLC, QuickTime, whatever) expect a file interface.

Maybe we need a kickstarter to adapt Plex to use Hadoop as a media store?
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
Hi

Is there an advantage to using a 1-drive ZFS pool for backups compared to a "simple" drive (NTFS or something else)? I suppose it can be scrubbed, and thus it's possible to find errors. I can do file checksumming on a simple drive, and then I know which file is corrupt. I fear that a 1-drive ZFS volume is lost when it finds an error, or am I mistaken?

I would use multiple copies of the drive, i.e. 3 drives with the same data on them.

BTW, it's easy when the backup can fit on 1 drive, or maybe 2 for different needs.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You get all the advantages of ZFS without any redundancy. As for "simple" drives, the only good long-term option is UFS. As of the next release of FreeNAS, UFS will be removed, so there will be no other options.
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
Considering:

"When a VDev can no longer provide 100% of its data using checksums or mirrors, the VDev will fail. If any VDev in a zpool is failed, you will lose the entire zpool with no chance of partial recovery. (Read this again so it sinks in)"

--> if an error occurs on a 1-drive zpool, the complete zpool (drive) is lost.

For the comparison, I was thinking of using a Windows machine to do the backups to a local eSATA NTFS drive, not backing up from FreeNAS.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It could be. If the error is on metadata and the pool cannot function then yes.

NTFS can have the same fault. If you trash the NTFS file system data you end up with some broken volume with data missing. Or worse, you click the drive letter and it asks if you want to format the disk!
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
Am I right in reading that a 1-drive zpool will survive a scrub if only "file data" is lost (just like NTFS)?
Is it possible to know which file?
 

DaPlumber

Patron
Joined
May 21, 2014
Messages
246
Actually, yes, there are benefits. If we ignore catastrophes that could take out the entire disk*, the issue with checksums/RAID/mirroring is always: "OK, something went wrong and I have two chunks of data (blocks, files, whatever) that are different. Which (if any) is the correct copy?" Because ZFS generates the checksum at the time of write, it mathematically "knows" which is the good copy or how to rebuild it.*

BTW, FreeNAS doesn't seem to have an interest since it's not what it does, but newer ZFS versions support multiple copies on the same device (the "copies" property). Effectively it's mirroring on the same device. When I ran this under Solaris on a laptop (don't ask**), it "caught" bad blocks that had experienced "bit rot" and corrected them in spite of only having the single disk. With only the regular single copy on a single disk, you can still tell for certain which file(s) have been corrupted and need to be restored from backup. Of course, WHEN you find out about the corruption depends on when you next access that data or do a scrub.

Short answer: if it's a disk error, only the affected file is lost and ZFS will tell you which one it is. Unaffected files are unaffected but should, as a best practice, be evacuated from the degraded pool ASAP. Statistically speaking, disk errors come not single spies but in battalions, as the advance guard of total failure. Do you feel lucky?

On your FreeNAS box, a pool with no redundancy (1 disk or many) won't protect your data, but you WILL know something has gone wrong (no silent corruption) and you will know what data is bad. If that fits your usage profile (for backups or whatever) then good for you. Personally, for backup pools I prefer at least a mirrored pair. For the price of a high-end drive I'd rather get a pair of cheap ones and mirror.

--------

* Down Cyberdog! :D I'm including non ECC RAM failures here in the same kind of "total failure" odds bracket. Use ECC with ZFS kids, just do it.
** ECC laptops aren't complete Unicorns, but those of us wanting our ZFS to-go are usually left with just having to live with the best quality, burned in, RAM we can find.
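To make the "which copy is good?" point concrete, here is a toy model. It is not ZFS's on-disk format or code, just the idea: a checksum recorded at write time and kept apart from the data lets a reader accept the copy that verifies and rewrite the one that doesn't. SHA-256 and the function names are my choices for the illustration.

```python
# Toy model of why a checksum stored at write time settles "which copy is
# good?". This is not ZFS's on-disk format, just the idea: keep the checksum
# apart from the data, and on read accept the first copy that matches it.
import hashlib
from typing import List, Optional

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def read_with_repair(copies: List[bytearray], expected: str) -> Optional[bytes]:
    """Return verified data and overwrite any bad copies, or None if all are bad."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        return None                  # detected but unrecoverable: restore from backup
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good              # "self-heal" the damaged copy
    return good

data = b"wedding-photo-0001"
expected = checksum(data)            # recorded when the block was written

two_copies = [bytearray(data), bytearray(data)]   # like two copies on one device
two_copies[0][3] ^= 0x01                          # simulate bit rot in the first copy
print(read_with_repair(two_copies, expected) == data)   # True: second copy verifies
print(bytes(two_copies[0]) == data)                     # True: first copy was rewritten

one_copy = [bytearray(data)]
one_copy[0][3] ^= 0x01
print(read_with_repair(one_copy, expected))             # None: detected, not repairable
```

The single-copy case is the plain 1-disk backup pool: the damage is noticed and named, but fixing it means going back to another copy of the backup.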
 

DaPlumber

Patron
Joined
May 21, 2014
Messages
246
Yes and Yes. The odds of the 1 disk zpool surviving a single hit are better than NTFS because zpool metadata is more spread out and somewhat more redundant. A "zpool status -v" will tell you exactly which files (or nodes if the file's been deleted) are bad.

If you're the experimental type get a thumb drive or two, format as a zpool, and then use something like "dd" to put block writes into various places on the device to simulate data corruption, then do a scrub or a read and see the results. By and large (exception, non ECC RAM) ZFS zpools do not die quietly, they scream loudly in the logs and status messages about where the pain is... :eek:
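If you don't have a spare thumb drive, the same experiment should work with a file-backed pool. The sketch below only prints the commands for one run; the pool name, image path, size and offsets are all hypothetical, and whether a given offset lands on file data, metadata or free space is down to luck, so expect results to vary between runs. Run the printed commands by hand, as root, on a scratch machine only.

```python
# Dry-run sketch: prints the commands for the experiment, using a file-backed
# pool rather than a thumb drive. Pool name, image path, size and offsets are
# hypothetical. Nothing is executed here.
import os
from typing import List

def corruption_experiment(pool: str, image: str, size_mb: int,
                          hit_offsets_mb: List[int]) -> List[str]:
    """Return the shell commands, in order, for one corrupt-and-scrub run."""
    cmds = [
        f"truncate -s {size_mb}M {image}",   # sparse file standing in for the disk
        f"zpool create {pool} {image}",      # single vdev, no redundancy
        f"cp -R /etc /{pool}/",              # any throwaway test data will do
        f"zpool export {pool}",              # flush everything before damaging it
    ]
    for off in hit_offsets_mb:
        # Overwrite 1 MiB with random bytes in place, `off` MiB into the image.
        cmds.append(f"dd if=/dev/urandom of={image} bs=1048576 count=1 "
                    f"seek={off} conv=notrunc")
    cmds += [
        f"zpool import -d {os.path.dirname(image)} {pool}",
        f"zpool scrub {pool}",               # read and verify every used block
        f"zpool status -v {pool}",           # lists files with unrecoverable errors
        f"zpool destroy {pool}",             # clean up the scratch pool
    ]
    return cmds

for cmd in corruption_experiment("scratch", "/tmp/zfs-test/vdev0.img", 256, [100, 180]):
    print(cmd)
```

The scrub runs in the background, so on anything bigger than a toy image you may need to re-run "zpool status -v" until it reports the scrub as finished.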
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
Thanks

For extra backups it's a trade-off: have 2 external sets with mirroring or 4 without mirroring ;-)

Having the possibility of running a scrub on "cold" storage from time to time is very useful. Read: take a disk, mount it, and scrub it.

I use hashdeep for generating checksums and checking them on Windows NTFS, but it takes time and thoughtful effort.
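For readers who haven't seen this kind of workflow, here is a minimal sketch of the idea: record a SHA-256 per file, then re-hash later and report what changed. It is not hashdeep's file format or command line, and the function names are made up for the example.

```python
# Minimal sketch of what a hashdeep-style workflow does: record a SHA-256 per
# file, then re-hash later and report what changed. Not hashdeep's file format.
import hashlib
import os
import tempfile
from typing import Dict, List

def file_sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):   # 1 MiB at a time
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest

def audit(root: str, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose contents changed."""
    bad = []
    for rel, digest in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_sha256(full) != digest:
            bad.append(rel)
    return bad

with tempfile.TemporaryDirectory() as root:
    for name, body in [("a.jpg", b"photo A"), ("b.jpg", b"photo B")]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(body)
    manifest = build_manifest(root)
    with open(os.path.join(root, "b.jpg"), "wb") as f:
        f.write(b"photo b")          # simulate silent corruption
    damaged = audit(root, manifest)
    print(damaged)                   # the changed file is reported
```

The manifest has to be stored and re-checked by hand, which is the "time and thoughtful effort" part; a scrub does the equivalent check with checksums the filesystem already keeps.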


Alain
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
Thanks

BTW, I use ECC RAM in my FreeNAS box.
 

DaPlumber

Patron
Joined
May 21, 2014
Messages
246
Eh, bear in mind that if you're putting together a zpool for backup use only, backups have a very different performance profile from "regular" NAS duties. Aggressive sleep and spin-down, slightly mismatched disks, and non-"NAS" disks can all start to come into consideration again, PROVIDING your backup routine allows it.

Fun fact: a scrub is essentially just an end-to-end read of every used block on the disks. (and a bit of note-taking and retry when things don't work.) It uses the exact same mechanism as a regular read to check for errors.
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
I know both ;-)

I also know that disks need to be read end to end from time to time, even if it's just static data on cold storage. It seems that disks can recover from slight problems (ECC on the disk), which otherwise tend to get worse over time. (That's if the disks are kept in "good" storage; otherwise it's probably not the case.)
 

DaPlumber

Patron
Joined
May 21, 2014
Messages
246
If people only knew what goes on with ECC down at the disk head/media level. :eek: The noise and error rates are horrendous, but ECC plucks the data from it all! That's literally what's meant by a disk block "going bad": the error rate is starting to creep up toward a point where the ECC can't handle it. :eek: Yet for all that, the modern hard disk is an amazing piece of precision engineering. Like a bear on a unicycle, one is amazed that it works at all, not that it occasionally fails. :D Thermal cycling (even small amounts), changes in pressure, yada, yada, all affect the error rate. I've seen disks do crazy things... :rolleyes:
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I dunno, the bear has a tiny amount of angular momentum that's conserved helping him stay upright. If there was a scale for such things, HDDs would be closer to the top than bears riding unicycles. Mass-produced unicycle-riding bears on the other hand...
 

DaPlumber

Patron
Joined
May 21, 2014
Messages
246
Like my Mech. Eng. prof. used to say a long time ago and in a galaxy far, far away: "You're confusing accuracy with precision. Again." :D
 

grep137

Dabbler
Joined
Mar 21, 2014
Messages
36
I'm sorry, I know that this may be a little off topic (I wasn't sure if this is worth its own thread), but in case people aren't aware of it, I just wanted to throw it out there as an option for backing up small amounts of really important stuff.

This convinced me to invest in a recorder (I think you can get one for around $60) and some mDiscs:
Millenniata commissioned the Navy to stress-test its mDisc technology. The Navy complied. The US Department of Defense Naval Air Warfare Center Weapons Division (NAWCWD) in China Lake, CA put the mDisc up against six leading archival DVD makers in three series of demanding stress tests. ... According to the NAVAIR report, of all discs tested, only the mDiscs survived.

http://www.networkcomputing.com/storage/mdisc-review-a-thousand-years-of-storage/d/d-id/1099593?

They make a Blu-ray version now too.

I wonder if it would be possible to make something similar to a zpool across several of these in case one of the discs gets damaged?
 