We use replication from one server to a backup server. It's not much data to
replicate, around 1.9 TiB in 36 datasets (4400 snapshots).
Transmitting the first snapshot of each dataset maxed out the network connection
(1 Gbit/s up/down). After that the transfer stalled: each subsequent snapshot
takes several minutes, with only a few KiB to a few MiB transferred every few
minutes. It took more than 24 hours to transmit 2000 snapshots (and it's
still doing its thing).
To test, I manually transferred all the data to the remote side:
Code:
zfs snapshot -r pool@foo
zfs send -Rv pool@foo | ssh freenas-backup zfs receive -Fdu newpool
This transfer, including all 4400 snapshots, took around 6 hours, which seems
much more reasonable. The FreeNAS replication, however, transferred only half
of the snapshots in 24 hours. Something seems wrong, and I suspect it has
something to do with the “zfs list” process (on PUSH), which “ps” shows
almost all the time:
Code:
/sbin/zfs list -t snapshot -H
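To see whether that listing really runs near-continuously, something like this can be run on PUSH during replication (a quick sketch, not part of FreeNAS; the `[z]fs` pattern just keeps grep from matching itself):

```shell
# Rough check (my assumption about where the time goes): sample "ps" a few
# times and count how often the snapshot listing is running. If it shows up
# in nearly every sample, the replication script is spending its time
# re-listing all ~4400 snapshots instead of sending data.
hits=0
for i in 1 2 3; do
    if ps ax | grep '[z]fs list -t snapshot' > /dev/null; then
        hits=$((hits + 1))
    fi
    sleep 1
done
echo "zfs list seen in $hits of 3 samples"
```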
But since I don't know the nitty-gritty details of the replication, I might be wrong. Furthermore, the log file is filled with thousands of lines like these:
Code:
Aug 26 21:48:20 freenas autosnap.py: [tools.autosnap:58] Popen()ing: /sbin/zfs get -H freenas:state tank/foo/bar@auto-20140703.0700-2m
Aug 26 21:48:20 freenas autosnap.py: [tools.autosnap:58] Popen()ing: /sbin/zfs get -H freenas:state tank/foo/bar@auto-20140719.0700-2m
Aug 26 21:48:20 freenas autosnap.py: [tools.autosnap:58] Popen()ing: /sbin/zfs get -H freenas:state tank/foo/bar@auto-20140717.0700-2m
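With ~4400 snapshots, one Popen() per snapshot means thousands of forked "zfs get" processes per pass, which would fit the low CPU/network/disk picture. As a sanity check, the repeated lines can be counted per timestamp; here against a sample file built from the three lines above (the path /tmp/autosnap-sample.log is just for illustration):

```shell
# Count "zfs get" forks logged by autosnap.py within one second.
# All three sample lines share the timestamp 21:48:20, i.e. one fork
# per snapshot in rapid succession.
cat <<'EOF' > /tmp/autosnap-sample.log
Aug 26 21:48:20 freenas autosnap.py: [tools.autosnap:58] Popen()ing: /sbin/zfs get -H freenas:state tank/foo/bar@auto-20140703.0700-2m
Aug 26 21:48:20 freenas autosnap.py: [tools.autosnap:58] Popen()ing: /sbin/zfs get -H freenas:state tank/foo/bar@auto-20140719.0700-2m
Aug 26 21:48:20 freenas autosnap.py: [tools.autosnap:58] Popen()ing: /sbin/zfs get -H freenas:state tank/foo/bar@auto-20140717.0700-2m
EOF
forks=$(grep -c 'Popen()ing: /sbin/zfs get' /tmp/autosnap-sample.log)
echo "$forks zfs get forks at 21:48:20"
```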
CPU, network, and disk usage are low on both PUSH and PULL during the
replication. The SSH cipher is disabled, and I left replication stream
compression on the default setting (lz4).
Am I alone with this, or is it a known issue? Should I file a bug report? It's
a little hard to reproduce in a VM, since you need quite a lot of data,
including a bunch of snapshots.
Specs PUSH:
FreeNAS-9.2.1.7-RELEASE-x64
32 GiB RAM, 2 mirrored vdevs
Specs PULL:
FreeNAS-9.2.1.7-RELEASE-x64
8 GiB RAM, RAIDZ-1