[Strategy] Backup ZFS indirectly


MtK

Patron
Yes, the title is not the best, so let me explain:
I have 2 servers, both obviously running ZFS. One server hosts XenServer VMs over NFS, the other is a backup server. For the sake of this discussion, all VMs are Linux web servers, running some sort of control panel (DirectAdmin, cPanel, etc.).

Sounds simple at first, but now for the complication:
leaving aside the (direct) ZFS backups, snapshots, replication, etc., the backup server also acts as an internal backup for the user files. This is done by letting the (above-mentioned) control panel back up each account's (user's) files/DB and put them in a specific /directory. From there, we have an external script rsync-ing the files each day, from each VM's /directory into the backup server.

Now for the real complication, the actual structure and layout (example for a single VM):
  • Control panel - runs its internal backup script, creating a tar file for each user.
    This is a very read-intensive task, especially because users' files are mostly small PHP files, causing a lot of random reads.
  • Temp backup location - the CP script stores the tar files in a /directory, which is actually an NFS mount from the backup ZFS server. This is done for 2 reasons:
    1. avoid the writes on the main ZFS server.
    2. have more space, and prevent the space from running out on the main ZFS.
  • RSYNC script - takes the tar files from each VM's /directory and puts them in the right (internal) place. This is done to keep a history of backups, saving up to 3 months of those tar files for each user (a simplified sketch of what I mean is right below).
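To make that last bullet concrete, here is a stripped-down sketch of the kind of rsync wrapper I mean - hostnames, mount points and the retention logic are illustrative, not the real script:

Code:
#!/bin/sh
# Runs daily on the rsync VM; both paths below are NFS mounts served by the
# backup ZFS server (names and paths are made up for this example).
VM="vm01"
DATE=$(date +%Y-%m-%d)
SRC="/mnt/cp-temp/${VM}/"                   # where the control panel drops its tar files
DEST="/mnt/backup-history/${VM}/${DATE}"    # dated history directory on the backup pool

mkdir -p "${DEST}"
rsync -a "${SRC}" "${DEST}/"

# prune history older than roughly 3 months
find "/mnt/backup-history/${VM}" -mindepth 1 -maxdepth 1 -type d -mtime +92 -exec rm -rf {} +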
Now for the real/actual problem:
Apart from the redundant network overhead, the RSYNC script reads from and writes to (eventually) the same server, causing a bit of "stress" on the ZFS, and even though those are tar files, I see a lot of "little" read/write bursts instead of big sequential IO.


I've tried several strategies, including:
  • tar.gz instead of tar
  • dedup on the backup server, because in the end the sum of all those PHP files is practically the same (WordPress, Joomla, Drupal, etc.), but I guess with tar files it's not really doing any good (see just below for how that can be checked).
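For reference, one way to check whether dedup is actually buying anything on the backup pool (the pool name below is just a placeholder):

Code:
# dedup ratio actually achieved on the pool
zpool list -o name,size,allocated,dedupratio backup-pool

# simulate dedup over the existing data to estimate the potential ratio
# and DDT size (can take a while on a full pool)
zdb -S backup-pool

A ratio that stays around 1.00x would confirm that the tar/tar.gz files simply don't dedup well.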

so... after the long introduction, I'd be more than happy to hear any suggestions on a better strategy for this.

Just to be clear, while I could change the ZFS layout (RAIDZ, mirror, SSD, etc.), the point here is to first have a better strategy. For the moment, I am fine with less than optimal performance from the pools.
Meaning, if possible, let's avoid a conversation about the actual ZFS layout and server specs, and focus more on how to do the backups and where to put things.

And just one last tiny reminder, again: this is not a pure ZFS replication/snapshot discussion. Those are working, and they are going to keep working... :)

thanks in advance!
 

depasseg

FreeNAS Replicant
You kinda lost me, but it sounds like you have user info in your XenServer VMs that you are trying to back up. You are doing that via scripts and rsync to pull data from a dataset on your ZFS server into the VM and then write it back to the ZFS server. Is that close?

Does (or could) each user have an NFS share on the ZFS server, mounted in the VMs? That way their data resides on the ZFS server and you could use snapshots and replication at the per-user dataset level.
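Something along these lines per user - dataset, host and snapshot names are just placeholders:

Code:
# one dataset per user on the ZFS box, shared to the VMs over NFS
zfs create tank/users/alice

# snapshot and replicate at the per-user level
# (the incremental send assumes the previous snapshot exists on both sides)
zfs snapshot tank/users/alice@daily-2015-09-16
zfs send -i tank/users/alice@daily-2015-09-15 tank/users/alice@daily-2015-09-16 \
    | ssh backupserver zfs recv backup/users/alice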
 

MtK

Patron
You kinda lost me, but it sounds like you have user info in your XenServer VMs that you are trying to back up. You are doing that via scripts and rsync to pull data from a dataset on your ZFS server into the VM and then write it back to the ZFS server. Is that close?
Pretty much, with the difference that we are talking about 2 different ZFS/NFS servers:
  1. Server #1: XenServer VMs (NFS)
  2. Server #2: one backup directory (NFS) per VM + the endpoint destination of the rsync.
Does (or could) each user have an NFS share on the ZFS server, mounted in the VMs? That way their data resides on the ZFS server and you could use snapshots and replication at the per-user dataset level.
I could, in theory, but that would cause a lot more network traffic, which I'm not sure is a good idea for the moment.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Is server 1 providing NFS shares, or using the NFS shares provided by server 2?
 

MtK

Patron
  • Server 1 provides NFS to the (various) XenServer hosts.
  • Server 2 provides NFS shares to each of the VMs on those hosts, for the temp location of the backups, so the VM itself "feels" as if it's doing the backup internally (from the control panel's point of view) - see the example mount after this list.
  • There is actually another VM in the picture, running the actual rsync script. Server 2 provides the NFS share for the end destination of this script.
    It's not the same share, but it is the same pool, which causes the simultaneous read+write on that (backup) pool.
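To illustrate the second point, inside each VM the temp backup location is just an NFS mount from the backup ZFS server, along these lines (server name and paths are made up; normally it sits in /etc/fstab, shown here as a one-off mount):

Code:
# inside a VM: the control panel writes its tar files to /backup,
# which is really a share on the backup ZFS server
mount -t nfs backupzfs:/mnt/backup-pool/cp-temp/vm01 /backup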
 

depasseg

FreeNAS Replicant
Now for the real/actual problem:
Apart from the redundant network overhead, the RSYNC script reads from and writes to (eventually) the same server, causing a bit of "stress" on the ZFS, and even though those are tar files, I see a lot of "little" read/write bursts instead of big sequential IO.

Just to be clear, while I could change the ZFS layout (RAIDZ, mirror, SSD, etc.), the point here is to first have a better strategy. For the moment, I am fine with less than optimal performance from the pools.
Meaning, if possible, let's avoid a conversation about the actual ZFS layout and server specs, and focus more on how to do the backups and where to put things.

So now that the environment is clearer, what exactly is the problem? The first quote sounds like it's a performance problem, but the second seems to say that you are fine with the less than optimal performance.
 

MtK

Patron
So now that the environment is clearer, what exactly is the problem? The first quote sounds like it's a performance problem, but the second seems to say that you are fine with the less than optimal performance.
Well, what I mean is, I would be OK if the transfer bandwidth of the backups (a.k.a. the rsync described above) was 80% of "full speed".
The problem is that, right now, it's more like bursts of 10%, with pauses of a few long seconds in between.
 

depasseg

FreeNAS Replicant
So either look up how to improve performance of small-file rsync backups, or look at ways to improve your ZFS read/write performance. It seems like an amazingly convoluted process, and I'm not sure I understand why you aren't just running rsync directly on either ZFS server (in a jail) - a rough sketch of what I mean is below. And NFS uses sync writes, but you don't want to discuss ZFS pool strategies, so pretend that I didn't tell you that a RAIDZ vdev has the IOPS of its single slowest drive, that every sync write must hit stable storage before moving on, and that a SLOG would likely be very helpful.
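i.e. something like a nightly job on the backup box itself (or in a jail on it), so the copy stays local instead of going NFS-out and NFS-back-in - paths below are placeholders:

Code:
# run on the backup ZFS server (or a jail with both datasets mounted);
# same data movement as the current script, minus the double NFS round trip
rsync -a /mnt/backup-pool/cp-temp/vm01/ /mnt/backup-pool/history/vm01/$(date +%Y-%m-%d)/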
 

MtK

Patron
I know the mirror/RAIDZ/SLOG pros & cons, thanks.
The reason I don't want to discuss this is that it's already implemented, and since I have real data on the servers, changes are a bit out of scope right now.

Back to the subject: yes, moving the VMs' data onto a more direct ZFS/NFS share is one option I'm considering.
 

solarisguy

Guru
@MtK, I think the problem is your load being a particularly evil mix of reads & writes. Faster storage would help..., at least for parts of it.

P.S.
Just out of curiosity, how much of the available pool space is utilized?
 

MtK

Patron
@solarisguy, far less than 80% on both.
But you probably meant the actual usage, which is somewhere in the range of 2 TB for the data that needs to be backed up.
 

solarisguy

Guru
Less than 50% ? However, that was only tangential...

There is a reason 10K and 15K disks are still on the market, and they tend to have capacities distinctly below 1 TB. Large multimedia files, streamed sequentially - that is the market for the new large (1 TB and up) hard drives. Lots of small files spread all over the disk? Move to SSDs, the manufacturers appear to say.

Latency, seek time (more than one type of seek time), etc. are now gone from hard drive specifications. We are buying new disks, but not all parameters are better. Simultaneous reads and writes in different parts of a disk were always challenging, and nowadays even more so.

Lots of useful statistics can be extracted from ZFS, but I think they would all point towards SSDs.

P.S.
The pauses you are observing suggest to me that some cache is too small.
 

MtK

Patron
@solarisguy, the main ZFS server already has 10K drives and an SSD cache.
Those (control panel) backups work fine.

For now I'm more concerned about the secondary backup ZFS.
Any wild guess which cache it could be, or how I can check/monitor this?
 

solarisguy

Guru
Some ZFS guru might know which cache is starved.

In my experience, the behaviour you are observing is typical of cache problems (in general).
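If you want to watch it live while the rsync runs, a rough starting point (these tools ship with FreeNAS; exact names and flags may vary between versions):

Code:
# ARC size, hit rates and general tuning summary
arc_summary.py

# ARC hits/misses per interval while the backup is running
arcstat.py 5

# per-vdev IOPS and throughput on the backup pool (pool name is a placeholder)
zpool iostat -v backup-pool 5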
 

MtK

Patron
Some ZFS guru might know which cache is starved.

In my experience, the behaviour you are observing is typical of cache problems (in general).
On the backup server, right?
 
