How long should it take to do zfs snapshot?

Status
Not open for further replies.

kjstech

Dabbler
Joined
Feb 27, 2014
Messages
15
I have two FreeNAS 9.2.0-RELEASE boxes built from two older Dell R200 1U servers. Since only two drives fit in the chassis, I need a large volume, and they don't make drives bigger than 4 TB, I have them striped in a RAID-0 configuration. I know, I know, RAID-0... gasp! Well, that's why there are two of these boxes: one can replicate to the other, which has an identical config.

After formatting, the volume is about 7.1 TB. The primary has 3.94 TB of used space on it. ZFS is replicating to the other box over a dedicated 1 Gbps NIC. The Reporting tab shows a fairly constant 500 Mbps of traffic. After an entire weekend I still don't think it's done replicating.

The reason I say that is that on the second box, under Storage > Active Volumes, it shows 2.0 KB used, 756 KB available, and only 823.5 KB in size.

On the main unit it shows 3.9 TB used, 2.1 TB available, and 6.1 TB in size. It's set to snapshot once a day at 8 AM and keep 2 snapshots. Under ZFS Snapshots I have two:
auto-20140315.1448-2d
auto-20140316.1448-2d

So either the used space is never calculated on the receiving side, or the replication just never completes. But when I look at the interface traffic for the NIC that replication is forced over, it seems like it may be completing nightly. I'll attach a screenshot. The low point on the graph was when I was trying rsync, which I abandoned because it was so slow. Before that was ZFS replication, when I gave up on it the first time.

Memory shows 6 GB used out of 8 GB installed. The guideline is 1 GB of RAM for every TB of storage, and formatted, minus snapshots, it's about 7.1 TB, so I have an extra gig. I'm not doing anything crazy like dedup or compression, just Samba and ZFS replication.
 

Attachments

  • fnbackup2report.PNG (61.8 KB)

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358
Try running
Code:
zfs list -o space
on the second box. This will list all the datasets and their used space. If replication is working properly, you should see the datasets from the first box in this list.
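For what it's worth, the columns it prints break the usage down like this:
Code:
zfs list -o space
# Columns, left to right:
#   NAME           dataset name
#   AVAIL          space still available to the dataset
#   USED           total space consumed
#   USEDSNAP       space held only by snapshots
#   USEDDS         space used by the dataset's own data
#   USEDREFRESERV  space held by a refreservation, if any
#   USEDCHILD      space used by child datasets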
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Yeah, those weird <1 MB numbers appear because replication is still in progress (for various definitions of "in progress"). Nothing is wrong; the FreeNAS GUI just doesn't understand what's going on. I'd look at network traffic and at "zpool status 1" (see the commands below). If there's no network traffic and no zpool activity, then something is wrong on your destination box. I will say that 8 GB of RAM is a bit small, and if you can afford going to 16 GB I'd definitely recommend it. The 1 GB of RAM per TB of storage is a rule of thumb, and it doesn't specify raw storage versus total available user storage. It also doesn't mention a minimum (such as 8 GB of RAM plus 1 GB per TB). 8 GB keeps the system from crashing; more RAM only makes things run more smoothly. You may have an issue where your ZFS cache just isn't big enough for the load it's being handed.
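To check whether the destination box is actually doing anything, something along these lines should do it (the 5-second interval is arbitrary):
Code:
# on the destination box: repeat pool I/O stats every 5 seconds; steady
# write throughput means the incoming stream is still being written out
zpool iostat 5

# repeat pool status once a second (the "zpool status 1" mentioned above)
zpool status 1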

Transferring 7 TB of data will take a while. Your first replication will take a very long time; subsequent replication tasks only take as long as the amount of data that has changed since the previous snapshot.
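If your ZFS version supports dry-run sends, you can estimate how big each incremental will be before it runs. "tank" below is just a placeholder for your source dataset, with the auto-snapshot names from your list:
Code:
# -n = dry run (send nothing), -v = print the estimated stream size
zfs send -nv -i tank@auto-20140315.1448-2d tank@auto-20140316.1448-2d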

A 2-day lifetime is a short time frame. That might not go well if the replication takes more than 2 days, since the backup server will always end up out of sync (resulting in a resync from scratch).
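A quick way to see whether the two boxes ever end up with a matching snapshot is to compare the snapshot lists on each side ("backups" here is the dataset name from your second box; substitute the source dataset name on the primary):
Code:
# run on both boxes; replication is caught up when the newest snapshot
# name appears on both sides
zfs list -t snapshot -o name,creation -r backups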
 

kjstech

Dabbler
Joined
Feb 27, 2014
Messages
15
OK, on the second box receiving the replicated data, I do see used space:
NAME     AVAIL  USED   USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
backups  1.33T  5.81T  0         5.12T   0              700G
 

kjstech

Dabbler
Joined
Feb 27, 2014
Messages
15
Sounds plausible about the snapshots being too close together. The snapshot interval is 1 day, beginning at 08:00 and ending at 18:00, with a 2-day lifetime.
The next possible interval after 1 day is 1 week. I guess I could do 1 week, then change it to 1 day once it's all replicated, if you think that would work.

It's just interesting to note that the bandwidth graph for interface bge1 shows around 500 Mbps of traffic from 08:00 sharp until around 19:45, then nothing until 08:00 sharp the next day. If it's not "done", why is it stopping? The replication task window is 00:00 to 23:59, with "Recursively replicate and remove stale snapshot on remote side" and "Initialize remote side for once" both enabled.

The problem with RAM is that there are only 4 slots and the board only recognizes up to 2 GB per slot, so 8 GB is maxed out. It's about 7.1 TB, so I thought OK, 1 GB per TB of ZFS storage, great. And the physical memory utilization graph in the Reporting tab still shows 1.3 GB free, so I thought I was OK. But I see what you're saying about the added overhead. Since the hardware can't handle more RAM, if I continue to have issues I may have to go with UFS instead, and I guess rsync is my only option at that point.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Sounds plausible about the snapshots being too close together. The snapshot interval is 1 day, beginning at 08:00 and ending at 18:00, with a 2-day lifetime.
The next possible interval after 1 day is 1 week. I guess I could do 1 week, then change it to 1 day once it's all replicated, if you think that would work.

That's exactly what I'd do to start.. the systems will work it out when you change it later.

It's just interesting to note that the bandwidth graph for interface bge1 shows around 500 Mbps of traffic from 08:00 sharp until around 19:45, then nothing until 08:00 sharp the next day. If it's not "done", why is it stopping? The replication task window is 00:00 to 23:59, with "Recursively replicate and remove stale snapshot on remote side" and "Initialize remote side for once" both enabled.

It could be that the data transfer was complete but the PULL server still had more work to do. I'm not too familiar with the work needed to actually "close out" a snapshot after replication finishes.
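One way to tell the difference is to check the PULL box directly once the graph goes quiet; something like:
Code:
# is a zfs receive still running on the destination box?
ps auxww | grep "[z]fs receive"

# watch the system log while the transfer window closes; replication
# errors, if any, tend to show up here (exact wording varies by version)
tail -f /var/log/messages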

The problem with RAM is that there are only 4 slots and the board only recognizes up to 2 GB per slot, so 8 GB is maxed out. It's about 7.1 TB, so I thought OK, 1 GB per TB of ZFS storage, great. And the physical memory utilization graph in the Reporting tab still shows 1.3 GB free, so I thought I was OK. But I see what you're saying about the added overhead.


First: Building a system whose maximum RAM is also the minimum for FreeNAS was not the world's smartest decision.
Second: UFS is going away. 9.2.1.3 (a bug-fix release for 9.2.1.2) will be the last release to support UFS. So again, not the best long-term choice.
Third: rsync is not a record-setting performer. It is extremely slow, and with multiple TB of data, the comparisons and so on can take 24+ hours for a single run. So again, not the best long-term choice. Remember, rsync was designed to minimize the traffic needed between source and destination and to ensure the data on the destination isn't corrupted in transit; "ultra-fast with 5 TB+ of data" was not one of the design considerations. rsync was first created in 1996, when even a 25 GB server was immensely expensive. It just doesn't scale up very well to 5 TB+ of data.
Fourth: Don't let the free RAM fool you. ZFS can only use so much on a given system without causing problems. You actually don't want to see a system with only 100 MB of RAM free because of ZFS; the first time a service needed more RAM, the system would likely crash. ZFS won't grab more even when it needs it, because it expects the admin to right-size the amount of RAM for the task at hand (which most people aren't able to do from a few CLI commands). ZFS's defaults work for common design criteria, but the onus is still on the server admin to make sure everything is appropriate for the situation. 8 GB of RAM makes the server stable for 99.9999% of people; more RAM only makes it more efficient. Generally, if nothing is going wrong and the system is plenty fast, adding more RAM won't buy you anything.
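If you want to see how much of that RAM ZFS is actually getting to use for its cache, the FreeBSD sysctls below should show it (values are in bytes):
Code:
# current ARC size versus its configured ceiling
sysctl kstat.zfs.misc.arcstats.size
sysctl vfs.zfs.arc_max

# fuller picture: hits, misses, and memory-pressure counters
sysctl kstat.zfs.misc.arcstats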

You're looking at a bunch of crappy options, with the "best" option being hardware that's better suited to the task (which I'm sure you are just ecstatic to hear).
 