Can I trust my (zfs replicated) backups?

Status
Not open for further replies.

saurav · Contributor · Joined Jul 29, 2012 · Messages: 139
I have snapshot replication set up from a FreeNAS Mini to an HP N36L. It has been running _apparently_ fine for the last couple of months. But since I'm an idiot, it took me that long to realise that I hadn't done any burn-in or memory testing after I upgraded the N36L with an additional 4GB of RAM and rebuilt the box with ZFS.

So I'm doing it now. I started yesterday, and two passes of memtest 5.1 have completed with zero errors so far. The third pass should be running right now.

What I want to know is: can I trust the backups made before I did any testing? Or should I just re-create them?

The reason I'm asking is that if I *can* trust those backups, I will just use jgreco's script for the disk burn-in, which doesn't require me to destroy my pool.
 

Attachments

  • IMG_20141111_065259.jpg (207.8 KB)

Ericloewe · Server Wrangler · Moderator · Joined Feb 15, 2014 · Messages: 20,194
Not absolutely, but you can't dismiss them as useless either.

The problem with backups is that they're only useful if they work on demand, even if they haven't been touched in years. That means testing them regularly and eliminating as many sources of problems as possible. Hence, ECC.
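One way to "test" a backup beyond trusting the replication job is to compare file contents end to end. The sketch below is a minimal, hypothetical helper (`compare_trees` and the other names are made up for illustration): it hashes every file under two directory trees with SHA-256 and reports files that are missing, extra, or different. It assumes both copies are reachable as local paths on one machine, for example the same snapshot as seen under each dataset's hidden read-only `.zfs/snapshot/<name>/` directory, with the remote one mounted however you prefer.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            if full.is_file():
                digests[full.relative_to(root).as_posix()] = file_digest(full)
    return digests


def compare_trees(source: Path, backup: Path) -> dict[str, list[str]]:
    """Hypothetical check: list files missing from, extra in, or differing in the backup."""
    src = tree_digests(source)
    bak = tree_digests(backup)
    return {
        "missing": sorted(src.keys() - bak.keys()),
        "extra": sorted(bak.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & bak.keys() if src[k] != bak[k]),
    }
```

Usage would look something like `compare_trees(Path("/mnt/tank/.zfs/snapshot/<snap>"), Path("/mnt/backup-copy/<snap>"))`, where both paths are placeholders. Reading every byte on both sides is slow on a large pool, so this is more of an occasional spot check than something to run daily.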
 

saurav

The box has ECC, which probably does its job, though there is no way to verify that, this being an AMD box. I say "it does its job" based on HP's literature (e.g. I read a post a few days back on how HP emphasises ECC in its MicroServer literature). PassMark's version of memtest 5.1 seems to have ECC-related features, but they appear to be supported only on limited hardware, and I don't think mine would qualify. In any case, PassMark's memtest 5.1 requires UEFI, and this box doesn't have it.
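For context on what "ECC doing its job" means: the memory controller stores extra check bits alongside the data and uses them to correct a flipped bit on read. Standard ECC DIMMs typically use a SECDED code with 8 check bits per 64 data bits; the sketch below is only a toy Hamming(7,4) code (4 data bits, 3 parity bits) showing the same single-bit-correction idea. It says nothing about whether ECC is actually enabled on a given board, which is the part that's hard to verify.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    b = [0] + list(word)  # pad so that b[i] is codeword position i
    s1 = b[1] ^ b[3] ^ b[5] ^ b[7]
    s2 = b[2] ^ b[3] ^ b[6] ^ b[7]
    s3 = b[4] ^ b[5] ^ b[6] ^ b[7]
    syndrome = s1 | (s2 << 1) | (s3 << 2)
    if syndrome:
        b[syndrome] ^= 1  # the syndrome is the position of the bad bit
    return [b[3], b[5], b[6], b[7]], syndrome
```

Flipping any single bit of an encoded word and decoding it gives back the original 4 data bits; two flipped bits would be miscorrected here, which is why real ECC memory adds an extra parity bit to at least detect double-bit errors.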

But all that notwithstanding, can the backups from before the testing be trusted? They were made with the same ECC RAM, but without having run memtest and without any disk burn-in. The more I think about it, the more likely it seems that the answer is "yes": if the RAM is good now, it was good earlier too, so the backups are good. The disks _may_ not be good (which I will find out with jgreco's script), but if something goes wrong with them after the backups were made, ZFS would alert me during scrubs and also fix those errors if possible.

Btw, with zfs, scrubs are basically "regular testing", right? Scrubs have not given me any errors (yet). I also have weekly SMART tests scheduled, without errors so far.
 

Ericloewe

saurav said:
Btw, with zfs, scrubs are basically "regular testing", right? Scrubs have not given me any errors (yet). I also have weekly SMART tests scheduled, without errors so far.

Scrubs just read the entire pool, which automatically triggers ZFS's correction mechanisms where needed. They do depend on good data having been written to disk in the first place, but if there is an unbroken chain of error-corrected operations (which there is, with ECC), ZFS will be able to detect any issues (and fix them if enough redundancy exists).
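A toy model may make both halves of that point concrete. The sketch below is not ZFS's on-disk format; it just assumes a two-way mirror and uses SHA-256 for convenience. ZFS keeps each block's checksum in its parent block pointer, so a scrub can read every copy, tell good copies from bad ones, and rewrite the bad ones. The names (`MirroredBlock`, `write_block`, `scrub`) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class MirroredBlock:
    """One logical block stored on two disks, plus the checksum kept by its parent."""
    stored_checksum: str
    copies: list[bytes] = field(default_factory=list)


def write_block(data: bytes) -> MirroredBlock:
    # The checksum is computed from whatever is in RAM at write time,
    # so data corrupted before this point is checksummed as if it were correct.
    return MirroredBlock(stored_checksum=checksum(data), copies=[data, data])


def scrub(blocks: list[MirroredBlock]) -> dict[str, int]:
    """Read every copy, verify it, and repair bad copies from a good one."""
    stats = {"repaired": 0, "unrecoverable": 0}
    for blk in blocks:
        good = [c for c in blk.copies if checksum(c) == blk.stored_checksum]
        if not good:
            # No copy matches: the error is detected but cannot be fixed.
            stats["unrecoverable"] += 1
            continue
        for i, c in enumerate(blk.copies):
            if checksum(c) != blk.stored_checksum:
                blk.copies[i] = good[0]
                stats["repaired"] += 1
    return stats
```

The case this cannot catch is the "good data written originally" caveat: if the bytes were already wrong in RAM when `write_block` ran, the stored checksum matches the bad data and the scrub reports nothing. That gap is what ECC covers upstream of ZFS.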
 
Status
Not open for further replies.
Top