Replication size

Status
Not open for further replies.

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
I'm just looking for some clarification on a replication task I have. I've set up my jails to have a 'fixed' dataset, so that if I nuke a jail its data remains intact (i.e. my Plex metadata). My jails are on a mirrored pool using SSDs. I was then rsyncing the 'data' dataset to my main spinner pool (tank) for redundancy purposes.

So, my structure is as follows:

/mnt/tank/jails-backup/data
/mnt/jails/data


Code:
[root@freenas] ~# zpool list
NAME          SIZE   ALLOC  FREE   EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
freenas-boot  111G   1.70G  109G   -         -     1%   1.00x  ONLINE  -
jails         254G   7.26G  247G   -         7%    2%   1.00x  ONLINE  /mnt
scratch       3.62T  3.02T  621G   -         51%   83%  1.00x  ONLINE  /mnt
tank          43.5T  26.4T  17.1T  -         11%   60%  1.00x  ONLINE  /mnt


The 'data' dataset is 3.8GB in the GUI; the 'data' dataset on jails-backup is 5.9GB. I could not get the two amounts to match, even using rsync's sparse-file and hard-link switches, so I instead moved to replication (which seems quicker and easier anyway).
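For reference, this is roughly the rsync invocation I mean (paths as above; exact options from memory, so treat it as a sketch):

Code:
# Mirror the jail data onto the spinner pool:
# -a preserves permissions/times, -S handles sparse files,
# -H preserves hard links, --delete keeps the copy in sync
rsync -aSH --delete /mnt/jails/data/ /mnt/tank/jails-backup/data/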

Nonetheless, there is still a difference in size. I did an md5 check on the two Plex directories, both of which came back with the same hash. I've also checked the recordsize (128K) and ashift (12), which are the same everywhere (except the boot pool, where ashift=9).
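In case it matters, this is roughly how I checked those (dataset names as in my layout above; the zpool.cache path is the FreeNAS default and may differ elsewhere):

Code:
# recordsize is a per-dataset property
zfs get recordsize jails/data tank/jails-backup/data
# ashift is a per-vdev property, visible via zdb
zdb -U /data/zfs/zpool.cache | grep ashift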

I cannot figure out why the replicated dataset is 1.1GB bigger than the source. Has anyone run into this before? It's not a massive space issue at the moment, but I'm planning on having a large amount of data on the jails dataset shortly and will then be more concerned about preserving space.

Any ideas?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I assume you have snapshots on the replicated dataset. Any snapshots will be holding onto data that has been superseded in newer snapshots. You need at least one old snapshot to do a delta replication, but you can remove older ones.

There is even an option in the replication configuration to remove old/stale snapshots.
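If you want to see what the snapshots are pinning, something along these lines will show the space each one holds (the dataset and snapshot names below are just placeholders):

Code:
# Space held by each snapshot on the destination dataset
zfs list -t snapshot -r -o name,used,refer pool/dataset
# Destroy a stale snapshot once it's no longer needed as a replication base
zfs destroy pool/dataset@old-snapshot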
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
I assume you have snapshots on the replicated dataset. Any snapshots will be holding onto data that has been superseded in newer snapshots. You need at least one old snapshot to do a delta replication, but you can remove older ones.

There is even an option in the replication configuration to remove old/stale snapshots.

This is a brand-new replication, so the initial replication is 5.9GB; there are no snapshots on the replicated dataset at present.

As per my top post, the size comes out exactly the same if I run rsync as well. It mostly relates to the plexdata folder; see here:

# sha256 /mnt/tank/jails-backup/data/plex /mnt/jails/data/plex
SHA256 (/mnt/tank/jails-backup/data/plex) = 287a3387a6a65e0b5483d02992ed2649ef4a9f48d79c7c1d3484acfaec7367d3
SHA256 (/mnt/jails/data/plex) = 287a3387a6a65e0b5483d02992ed2649ef4a9f48d79c7c1d3484acfaec7367d3


# du -sh /mnt/tank/jails-backup/data/plex
5.1G /mnt/tank/jails-backup/data/plex


# du -sh /mnt/jails/data/plex
3.2G /mnt/jails/data/plex


Exact same dataset, exact same hash, 1.9GB difference.
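One cross-check that might be worth doing (if I understand FreeBSD's du correctly) is to compare apparent sizes instead of allocated blocks, since those shouldn't depend on the pool layout:

Code:
# -A reports apparent (logical) size rather than blocks allocated on disk
du -Ash /mnt/tank/jails-backup/data/plex
du -Ash /mnt/jails/data/plex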
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
So, I did a bit more testing. If I set up a replication job to another pool made of SSDs, both source and destination show 3.2GB. Sent to a different pool of spinning drives, it reports 5.1GB.

It appears that going from an SSD pool to a standard platter hard-drive pool increases the reported size. I assume this has something to do with recordsize/ashift, but a 59% difference?

Intriguing.
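One more data point I can probably gather: a dry-run send, which should estimate the stream size independently of whichever pool it lands on (the snapshot name below is just a placeholder):

Code:
# -n = dry run (nothing is sent), -v = print the estimated stream size
zfs send -nv jails/data@some-snapshot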
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Let's have a look at zfs get all for each dataset. Please use CODE tags to preserve formatting, not CMD tags.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
tank/data does not appear to be a backup of jails. Maybe it's a backup of jails/data? In any case, I can't match it up with what you described in terms of mismatched storage usage.

It would be simpler if you were to post the output of zfs get all for only the individual datasets that you're seeing unexpected results for, e.g. zfs get all jails or zfs get all tank/data.
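If you'd rather not paste the full property dump, the space-accounting subset would probably be enough, e.g. something like:

Code:
zfs get used,usedbydataset,usedbysnapshots,compressratio,logicalreferenced jails/data tank/data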
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
tank/data does not appear to be a backup of jails. Maybe it's a backup of jails/data? In any case, I can't match it up with what you described in terms of mismatched storage usage.

It would be simpler if you were to post the output of zfs get all for only the individual datasets that you're seeing unexpected results for, e.g. zfs get all jails or zfs get all tank/data.

Here we go:

Code:
# zfs get all tank/data
NAME	   PROPERTY					  VALUE						 SOURCE
tank/data  type						  filesystem					-
tank/data  creation					  Fri Jun  2 15:55 2017		 -
tank/data  used						  6.79G						 -
tank/data  available					 9.94T						 -
tank/data  referenced					6.10G						 -
tank/data  compressratio				 1.42x						 -
tank/data  mounted					   yes						   -
tank/data  quota						 none						  default
tank/data  reservation				   none						  default
tank/data  recordsize					128K						  default
tank/data  mountpoint					/mnt/tank/data				default
tank/data  sharenfs					  off						   default
tank/data  checksum					  on							default
tank/data  compression				   lz4						   inherited from tank
tank/data  atime						 off						   inherited from tank
tank/data  devices					   on							default
tank/data  exec						  on							default
tank/data  setuid						on							default
tank/data  readonly					  on							local
tank/data  jailed						off						   default
tank/data  snapdir					   hidden						default
tank/data  aclmode					   passthrough				   inherited from tank
tank/data  aclinherit					passthrough				   inherited from tank
tank/data  canmount					  on							default
tank/data  xattr						 off						   temporary
tank/data  copies						1							 default
tank/data  version					   5							 -
tank/data  utf8only					  off						   -
tank/data  normalization				 none						  -
tank/data  casesensitivity			   sensitive					 -
tank/data  vscan						 off						   default
tank/data  nbmand						off						   default
tank/data  sharesmb					  off						   default
tank/data  refquota					  none						  default
tank/data  refreservation				none						  default
tank/data  primarycache				  all						   default
tank/data  secondarycache				all						   default
tank/data  usedbysnapshots			   706M						  -
tank/data  usedbydataset				 6.10G						 -
tank/data  usedbychildren				0							 -
tank/data  usedbyrefreservation		  0							 -
tank/data  logbias					   latency					   default
tank/data  dedup						 off						   default
tank/data  mlslabel													-
tank/data  sync						  standard					  default
tank/data  refcompressratio			  1.20x						 -
tank/data  written					   0							 -
tank/data  logicalused				   6.42G						 -
tank/data  logicalreferenced			 4.85G						 -
tank/data  volmode					   default					   default
tank/data  filesystem_limit			  none						  default
tank/data  snapshot_limit				none						  default
tank/data  filesystem_count			  none						  default
tank/data  snapshot_count				none						  default
tank/data  redundant_metadata			all						   default
tank/data  org.freenas:description									 received


Code:
# zfs get all jails/data
NAME		PROPERTY				 VALUE					SOURCE
jails/data  type					 filesystem			   -
jails/data  creation				 Mon May 29  9:03 2017	-
jails/data  used					 4.66G					-
jails/data  available				237G					 -
jails/data  referenced			   3.97G					-
jails/data  compressratio			1.53x					-
jails/data  mounted				  yes					  -
jails/data  quota					none					 default
jails/data  reservation			  none					 default
jails/data  recordsize			   128K					 default
jails/data  mountpoint			   /mnt/jails/data		  default
jails/data  sharenfs				 off					  default
jails/data  checksum				 on					   default
jails/data  compression			  lz4					  inherited from jails
jails/data  atime					off					  inherited from jails
jails/data  devices				  on					   default
jails/data  exec					 on					   default
jails/data  setuid				   on					   default
jails/data  readonly				 off					  default
jails/data  jailed				   off					  default
jails/data  snapdir				  hidden				   default
jails/data  aclmode				  passthrough			  inherited from jails
jails/data  aclinherit			   passthrough			  inherited from jails
jails/data  canmount				 on					   default
jails/data  xattr					off					  temporary
jails/data  copies				   1						default
jails/data  version				  5						-
jails/data  utf8only				 off					  -
jails/data  normalization			none					 -
jails/data  casesensitivity		  sensitive				-
jails/data  vscan					off					  default
jails/data  nbmand				   off					  default
jails/data  sharesmb				 off					  default
jails/data  refquota				 none					 default
jails/data  refreservation		   none					 default
jails/data  primarycache			 all					  default
jails/data  secondarycache		   all					  default
jails/data  usedbysnapshots		  710M					 -
jails/data  usedbydataset			3.97G					-
jails/data  usedbychildren		   0						-
jails/data  usedbyrefreservation	 0						-
jails/data  logbias				  latency				  default
jails/data  dedup					off					  default
jails/data  mlslabel										  -
jails/data  sync					 standard				 default
jails/data  refcompressratio		 1.21x					-
jails/data  written				  18.4M					-
jails/data  logicalused			  6.78G					-
jails/data  logicalreferenced		4.63G					-
jails/data  volmode				  default				  default
jails/data  filesystem_limit		 none					 default
jails/data  snapshot_limit		   none					 default
jails/data  filesystem_count		 none					 default
jails/data  snapshot_count		   none					 default
jails/data  redundant_metadata	   all					  default
jails/data  org.freenas:description						   local

Also:
Code:
SHA256 (/mnt/tank/data) = 146bc6323d5414537affb1290db59609c16acab94e7d09c3bd36a773c2e12ebf
SHA256 (/mnt/jails/data) = 146bc6323d5414537affb1290db59609c16acab94e7d09c3bd36a773c2e12ebf


This is just to show that the above two datasets have identical checksums, but obviously differ in size on disk.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
:confused:
The main difference appears to be in the usedbydataset property:
Read-only property that identifies the amount of disk space that is used by a dataset itself, which would be freed if the dataset was destroyed ...
There's also a difference in the compression ratios.

The underlying pools are massively different in size, which could affect storage allocation efficiency, but I think the key is probably that the jails pool has no redundancy, while the tank pool does. Obviously there's overhead when storing data with redundancy. This would imply that the tank pool has 33% redundancy, e.g. a 6-disk RAIDZ2, which means the data occupies about 1.5x as much raw disk space there.
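As a rough back-of-the-envelope check (assuming tank really is a 6-wide RAIDZ2):

Code:
# Observed ratio of usedbydataset, tank/data vs jails/data
echo "scale=2; 6.10 / 3.97" | bc    # about 1.5
# Expected RAIDZ2 accounting factor: total disks / data disks
echo "scale=2; 6 / 4" | bc          # 1.50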
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
:confused:
The main difference appears to be in the usedbydataset property:

There's also a difference in the compression ratios.

The underlying pools are massively different in size, which could affect storage allocation efficiency, but I think the key is probably that the jails pool has no redundancy, while the tank pool does. Obviously there's overhead when storing data with redundancy. This would imply that the tank pool has 33% redundancy, e.g. a 6-disk RAIDZ2, which means the data occupies about 1.5x as much raw disk space there.

That's an interesting explanation. The jails pool doesn't have redundancy per se; it's just a mirror. I didn't realise that the space taken up for redundancy would be shown like that. You learn something new every day.

Thanks!
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
The jails pool doesn't have redundancy per se; it's just a mirror
If it's a mirror, it has 50% redundancy, which means my explanation only works if ZFS reports disk usage differently on mirrors. Looking at my pool, that does appear to be the case:
Code:
root@T20:~# zfs get usedbydataset pool0/media 
NAME		 PROPERTY	   VALUE   SOURCE
pool0/media  usedbydataset  118G	-
root@T20:~# du -sh /pool0/media/
118G	/pool0/media/
root@T20:~# 
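For completeness, the whole space breakdown is available in one shot as well:

Code:
# Shows used, usedbysnapshots, usedbydataset, usedbychildren, etc.
zfs list -o space pool0/media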
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Do keep in mind that du is mostly useless with ZFS and that zfs list and zpool list report space differently.
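To see the difference for yourself, compare the two for the same pool: zpool list counts raw space including parity/redundancy, while zfs list reports usable space after redundancy.

Code:
zpool list tank
zfs list tank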
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
Do keep in mind that du is mostly useless with ZFS and that zfs list and zpool list report space differently.

Thanks, but these dataset sizes are directly reported in the FreeNAS GUI and match zfs list and zpool list.
 