NAS migration with data integrity checks?


jimmy_1969

Cadet
Joined
Sep 27, 2015
Messages
7
Hi,

I am researching how to migrate data from my old, existing ReadyNAS NV+ (Debian based) to my newly built FreeNAS (FreeNAS-9.3-STABLE-201602031011). I have been searching the FreeNAS forum as well as doing a fair bit of googling, but haven't been able to find a solid method.

Background
On my old NAS there is about 3 TB of data including about 100 GB of original content media (documents, family pictures, videos, etc). ReadyNAS is running a forked version of Debian Sarge which was discontinued in March 2008. It has tar, gzip and md5sum to play around with.

FreeNAS-9.3 comes with md5, which uses a different output format than md5sum, making comparison very fiddly. Which raises the question: how do you do md5 hashing in a mixed Linux and BSD environment?
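
To illustrate the mismatch (file name and hash value are just examples):

Code:
$ md5sum photo.jpg                # GNU coreutils on the ReadyNAS
d41d8cd98f00b204e9800998ecf8427e  photo.jpg
$ md5 photo.jpg                   # BSD md5 on FreeNAS
MD5 (photo.jpg) = d41d8cd98f00b204e9800998ecf8427e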

Migration
My original plan was to use rsync with CRC checks. However, practical tests proved that to be very slow (<1 Mbps). Using an NFS share and cp yields ~2 Mbps in throughput, which sadly is the best the old NAS can manage.

My objective is to take selected directories from my old NAS (EXT2 file system), migrate them individually to FreeNAS (ZFS) and be able to validate data integrity afterwards. The intended workflow would be:
  1. Initial bulk migration from old to new NAS
  2. Periodic updates using rsync for delta updates
  3. Repeat step 2 until the new NAS has passed its stability monitoring phase
But I can't get my head around how to design an md5 check that works at file level across Linux and BSD platforms. So I would be very interested in suggestions from the community on how to address the data integrity question.

What would be the preferred method to migrate the data to the new FreeNAS and validate the data integrity?

Best Regards

//Jimmy
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Doesn't rsync already have the ability to do checksums (--checksum)? As well, if you were connecting to the shares from a Windows machine you could use RoboCopy... Perhaps one of the plugins in FreeNAS (BTSync, Syncthing, etc.) would help.
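
Something along these lines perhaps, assuming the ReadyNAS is reachable over SSH (host name and paths are just placeholders):

Code:
rsync -avh --progress root@readynas:/c/media/ /mnt/tank/media/      # initial bulk copy; each file is checksum-verified on arrival
rsync -avh --checksum root@readynas:/c/media/ /mnt/tank/media/      # later passes: re-hash both sides and resend anything that differs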

Just some ideas.
 

jimmy_1969

Cadet
Joined
Sep 27, 2015
Messages
7
@Mirfster

rsync with checksums is ok for doing small delta updates, but way too slow for any heavy lifting.
I don't have any Windows machines in my environment; only Linux and BSD OSes in my home network.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
rsync with checksums is ok for doing small delta updates, but way too slow for any heavy lifting.
When rsync initially transfers a file, or updates an existing file, it verifies the checksum. With a properly functioning ZFS filesystem, you don't need to checksum previously transferred files that are unchanged on the source. Am I missing something?
 

jimmy_1969

Cadet
Joined
Sep 27, 2015
Messages
7
@Robert Trevellyan

The concern is how to migrate data files from the old Linux-based NAS to the new FreeNAS and be able to prove that there is no data corruption during the process.

This could potentially mean archiving, splitting files, transferring them across platforms and storing them on the FreeNAS whilst maintaining time stamps, directory structure and file contents throughout the process. The point is that I want to validate that the data arrives without any corruption.

Let us imagine for a second that it wasn't my private cat pictures being migrated, but a few TB of an enterprise's financial backup records. At the end of the day, whoever delivered the migration service would be asked to verify the integrity of the migrated data on the new storage platform.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
The concern is how to migrate data files from the old Linux-based NAS to the new FreeNAS and be able to prove that there is no data corruption during the process.
I understand.

Like I said, rsync validates each transferred file. From the man page:
Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred...
How does this not address your concern?
 

jimmy_1969

Cadet
Joined
Sep 27, 2015
Messages
7
@Robert Trevellyan

rsync performance is way too slow for this volume of data. I did some rsync transfer tests both with and without CRC checks, and based on my estimates it would take weeks to transfer the data this way.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Why don't you md5sum the files before you transfer them (FTP for speed) and again after they've landed on the ZFS system? If you find one that doesn't match, resend it. On a stable LAN the chance of corruption during transfer is very, very small.
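
Roughly like this (paths are placeholders; FreeBSD's md5 -r prints "hash filename", so after squeezing md5sum's double space the two lists can be diffed directly once you copy the source list over):

Code:
# on the ReadyNAS, from the top of the share:
cd /c/media
find . -type f -print0 | xargs -0 md5sum | sed 's/  / /' | sort > /tmp/src.md5

# on the FreeNAS box, after the transfer:
cd /mnt/tank/media
find . -type f -print0 | xargs -0 md5 -r | sort > /tmp/dst.md5
diff /tmp/src.md5 /tmp/dst.md5 && echo "all checksums match"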
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
FreeNAS-9.3 comes with md5, which uses a different output format than md5sum, making comparison very fiddly. Which raises the question: how do you do md5 hashing in a mixed Linux and BSD environment?
shasum may be your answer then...
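
If it happens to be installed on both boxes (it ships with newer Perl's Digest::SHA, so I'm not sure the old ReadyNAS has it), the output format is identical on Linux and FreeBSD:

Code:
shasum -a 256 file1 file2 > sums.sha256     # on one machine
shasum -a 256 -c sums.sha256                # verify on the other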
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
This is from a different forum but the OP was having the same md5 vs md5sum problem.



"... I came up with this small sed script to create the checksums on FreeBSD that match the Linux md5sum output:

Code:
md5 file [file ...] | sed -e 's#^MD5 [(]\(.*\)[)] = \(.*\)$#\2 \1#' > md5sums.txt


This will use the FreeBSD md5 command and rearrange the output to look like the GNU md5sum.

Then on Linux I can just use md5sum --check md5sums.txt

You can also use the above sed script with an existing file produced by FreeBSD's md5 command.

I also put this alias in my FreeBSD .cshrc file:

Code:
alias md5sum "md5 \!* | sed -e '"'s#MD5 [(]\(.*\)[)] = \(.*\)$#\2 \1#'"'"


now on FreeBSD I can just say md5sum file1 file2 file3 ... and it just works."
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
rsync performance is way too slow for this volume of data.
I saw that in your first post. It's important to understand the impact of using the --checksum option, which checksums every file on both sides to determine whether it has changed and should be transferred again. This is unnecessary in normal use, because rsync always verifies each transfer. Given that this will be a one-time migration, I would go with the simple solution.
 

jimmy_1969

Cadet
Joined
Sep 27, 2015
Messages
7
Hi Again,

I spent a few days researching and measuring my old Netgear ReadyNAS NV+ v1.

The NAS itself originates from 2006. Under the hood there is a SPARC-based IT3107 CPU running at 280 MHz. My unit's RAM has been upgraded from the original 256 MB to 1 GB. It runs a Netgear fork of Debian Sarge (3.1) with a 2.6.17 kernel. If you have never heard of it, that is no wonder, as it went out of maintenance circa 2008.

Even with all services disabled except NFS, and with no other users accessing the disks, the maximum sustained throughput for a cp copy over an NFS share is in the ~500 KB/s range. Using rsync gives throughput in the ~200 KB/s range, with CPU usage reaching up to 80%.

After some googling it is clear that this model had reported issues with throughput performance. Netgear even published a guide on how to troubleshoot it, which includes "if all else fails - do a factory restore". I verified the NIC links to be at the expected 1 Gb/s, and the MTU working without fragmentation at 1500 bytes. This is the same MTU setting used in my switch and in the new FreeNAS server.
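
(For anyone wanting to repeat the MTU check: one common way is a don't-fragment ping with the largest payload that fits in a 1500-byte frame, i.e. 1500 minus 28 bytes of IP/ICMP headers. Host names below are placeholders.)

Code:
ping -M do -s 1472 -c 4 freenas       # on the Linux/ReadyNAS side
ping -D -s 1472 -c 4 readynas         # on the FreeBSD/FreeNAS side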

My assumption is that the slow transfer performance is CPU related. A quick back-of-the-envelope calculation suggests that network transfers would take months for the full 3 TB data set on my old NAS.

For anyone planning a NAS data migration, I suggest starting off by reading this guide, which gave me many good pointers.

I have now switched to using a 2 TB external USB 2.0 disk to copy the data across. After reviewing and cleaning up my original data, I only have 1.8 TB left to migrate.

My weapon of choice for data integrity is md5deep, which can run MD5 hashes over entire directory structures. It wasn't part of my ReadyNAS repository, but thanks to its limited dependencies it was easy to build from source on my old NAS. At the time of writing, the latest source package is hashdeep-release-4.4. Even though the name is different, compiling it produces an md5deep executable.
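
For reference, building it was roughly the usual configure/make routine (the exact directory name inside the tarball may differ from what I show here):

Code:
tar xzf hashdeep-release-4.4.tar.gz
cd hashdeep-release-4.4
./configure
make
# the md5deep binary ends up under src/; 'make install' puts it on the PATH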

For FreeNAS there is an existing md5deep package that can be installed in a jail. Just be sure to use the -j0 switch to disable multi-threading and get a deterministic order of file hashes; otherwise the processing order might differ between machines, resulting in false positives in the integrity check.

The data integrity check is done in two steps:
1. Create baseline md5 hash on original ReadyNAS directory
Code:
md5deep -rel -j0 orig_directory > dirhash_orig.txt

2. Validate the copied data and write failed checks in log file
Code:
md5deep -X dirhash_orig.txt -r -j0 dest_directory > Diffhash.txt


Data integrity validation is done on the final target NAS, i.e. my new FreeNAS.
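
If I read the md5deep matching modes correctly, -X lists every file under dest_directory whose hash is not present in the baseline, so an empty Diffhash.txt means no mismatches. It will not flag files that never made it across, so a simple file count on each side is a cheap extra sanity check:

Code:
test -s Diffhash.txt && echo "hash mismatches found" || echo "no hash mismatches"
find orig_directory -type f | wc -l     # on the old NAS
find dest_directory -type f | wc -l     # on the new NAS; counts should match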

Thanks to all of you who took the time to answer my questions.

Best Regards

//Jimmy
 