First FreeNAS restore weirdness

Status
Not open for further replies.

DAXQ

Contributor
Joined
Sep 5, 2014
Messages
106
Sorry in advance for the long-winded question, but I believe it needs the background.

I have been running FreeNAS for a couple of months now on older hardware [HP ProLiant ML370 G6 - not a slouch of a server by any means, but certainly older, and not the greatest RAID config with its 16 drives set up individually]. Last night I was thrust into a restore with my HP server rather unexpectedly: one of my main production servers [IBM System x3650 M3] lost a RAID controller, so I had to restore its VMs from the HP FreeNAS server. I use Veeam B&R to take nightly backups of the servers for just this kind of thing.

For my set up, I am using:
  • Veeam running on a PC
  • Veeam connects to a Linux datastore that is using NFS
  • The Linux datastore connects to FreeNAS via an NFS mount
  • This setup is how I back up and restore the VMs, and the backup part has worked very well (a rough sketch of the mount is just below this list)
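
For reference, the middle link in that chain looks roughly like this on the Linux datastore (the IP address, export path, and mount point below are made-up placeholders, not my actual config):

    # /etc/fstab on the Linux datastore - a hard NFS mount of the FreeNAS export
    # hard: I/O blocks and retries if FreeNAS goes away, instead of returning an error
    192.168.1.50:/mnt/tank/veeam   /mnt/veeam   nfs   rw,hard,intr,rsize=65536,wsize=65536   0   0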
The backup (using CBT, dedup, etc.) can really cut down on the amount of data being sent to FreeNAS; however, as all of you surely know, restoration is a different animal entirely. There is no way around a slow, painful restore for larger files/VM drives.

The restore worked great on the smaller servers (DCs and small file servers, 60 GB - 150 GB). However, restoring one of my application-specific servers (699 GB) was a different thing entirely, and I am not sure I understand how it worked. By all rights I would have expected the restoration to fail, but it didn't, and I wonder if one of you guys may be able to shed some light on it.

During the restore of the larger VM (the Veeam PC restoring from the FreeNAS datastore via NFS on a Linux computer to an ESXi server), at about 50% and roughly four hours in, I heard the FreeNAS server fans kick on as if the server was rebooting. When I tried to browse to its web UI, I verified it was indeed rebooting. The Veeam process did not cut out, but it did stop advancing its progress percentage. I watched that server reboot for 10 minutes (the logs even show the gap); when it finally came back up, the Veeam process began ticking away again as if nothing had happened. This happened once more during the night at around 80% - talk about sick-to-your-stomach stress. The second time I had a monitor on the FreeNAS box [normally headless], and granted it was about 1 AM so I didn't catch much, but I did see the server go through its entire reboot process [another 10 minutes], and Veeam waited until the stream was available again and carried on as if nothing had happened.

I am sure CIFS would have crapped out completely - so my question(s) are:

How did the NFS mount manage to keep things going between the Veeam PC, the Linux server with the NFS mount, and FreeNAS?

Secondly, I sit and listen to these servers daily, and I have never seen/heard the FreeNAS box just randomly reboot like that, which makes me think it must have had something to do with the restore process that was going on. Any suggestions to help me determine why the reboots happened so I can prevent them in the future?

Thirdly, would the best way to speed up these larger restores be to move to a 10G network between the servers and the backup? Can FreeNAS use those NICs?

(Edit: OK, 10G networking was a stupid thought; I figured it was more "readily available" and widely used than it appears to be in its current state.)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
First answer: the NFS mounts were probably "hard mounts," meaning they hang the client process until the server comes back. Since the server did eventually return in this case, that worked.
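
To illustrate the difference (server name and paths below are just placeholders):

    # hard (the default): the client retries indefinitely, so the restore simply pauses
    mount -t nfs -o hard,intr freenas:/mnt/tank/veeam /mnt/backup

    # soft: the client gives up after "retrans" attempts and returns an I/O error,
    # which would likely have killed the restore mid-stream
    mount -t nfs -o soft,timeo=600,retrans=3 freenas:/mnt/tank/veeam /mnt/backup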

Second and third combined: You aren't going to like this answer, but here goes:

HP ProLiant ML370 G6 - not a slouch of a server by any means, but certainly older, and not the greatest RAID config with its 16 drives set up individually

The age of the server doesn't matter; I've used older and had it work fine. But that "RAID config" is a horrible mess for both reliability and performance, and it has no easy migration path. ZFS on top of hardware RAID is bad because the RAID controller interferes with ZFS's ability to directly address the disks for things like cache flushing. That impacts performance, and the controller may still be holding data in its write cache that ZFS thinks is safely on disk. If that was crucial metadata and you have a power outage or battery failure on the RAID card, that can toast the whole pool right there.

It also means ZFS can't give a pre-fail warning for a degraded disk, since the HP P410 isn't going to pass SMART values through. The P410 itself is also a pretty bad RAID card (like most HP ones) from a reliability perspective - IIRC it liked to overheat under heavy load, which might be why the box died repeatedly under the restore, since the controller was furiously seeking all over the disks.
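
As a quick illustration (device names are examples, and whether the passthrough works at all depends on the controller and firmware): querying the logical drive the controller presents hides the physical disk's SMART data, and you have to go through the Smart Array passthrough by hand, which FreeNAS's automatic SMART monitoring won't do for you.

    # Queried as an ordinary disk, the logical RAID0 volume hides the drive behind it
    smartctl -a /dev/da0

    # smartmontools' Smart Array passthrough can sometimes reach the physical disk
    # behind the controller, one index at a time (FreeBSD ciss driver)
    smartctl -a -d cciss,0 /dev/ciss0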

Unfortunately there's no migration path that keeps your data intact, because each RAID0 won't be readable on anything other than an HP RAID card. The only way out is "destroy pool, install supported HBA, recreate pool."

...

On the upside though, FreeNAS supports most Intel 10Gbps cards, so that's an option ...
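
If you pick one up, a quick sanity check that FreeNAS/FreeBSD sees the card is to look for the Intel 10GbE driver (interface name below is an example; these cards typically attach as ix devices):

    # Intel 10GbE cards using the ixgbe driver show up as ix0, ix1, ...
    ifconfig ix0
    # Or list PCI devices and look for the 10GbE controller
    pciconf -lv | grep -B3 -i ethernet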
 

DAXQ

Contributor
Joined
Sep 5, 2014
Messages
106
Yup - I was sure the RAID card was an issue going into this, but I don't think I can replace it and still use the same SAS disks and cage; if I need to replace everything, I may as well get new servers. I did not, however, expect it would bring things to a stop like that - that was a really long night, and NFS really was the hero in that restoration. I'm disappointed that I only recently started using it: CIFS just decided to stop working for me with Veeam, which forced me to learn the NFS workaround. It has been pretty stable - and really impressive.

After the crash on the System x server, I think we will be replacing that pair of servers with new ones, which will make the two 3650 M3s available for use as a FreeNAS server. Maybe that will work out better.
 