Replication size

Status
Not open for further replies.

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
I'm just looking for some clarification on a replication task I have. I've set up my jails to have a 'fixed' dataset, so that if I nuke a jail its data remains intact (i.e. my Plex metadata). My jails are on a mirrored pool using SSDs. I was then rsyncing the 'data' dataset to my main spinner pool (tank) for redundancy purposes.

So, my structure is as follows:

/mnt/tank/jails-backup/data
/mnt/jails/data


Code:
[root@freenas] ~# zpool list
NAME          SIZE   ALLOC  FREE   EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
freenas-boot  111G   1.70G  109G   -         -     1%   1.00x  ONLINE  -
jails         254G   7.26G  247G   -         7%    2%   1.00x  ONLINE  /mnt
scratch       3.62T  3.02T  621G   -         51%   83%  1.00x  ONLINE  /mnt
tank          43.5T  26.4T  17.1T  -         11%   60%  1.00x  ONLINE  /mnt


The 'data' dataset is 3.8GB in the GUI; the 'data' dataset on jails-backup is 5.9GB. I could not get the two amounts to match, even using rsync's sparse-file and hard-link switches, so I instead moved to replication (which seems quicker and easier anyway).
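For reference, this is roughly the rsync invocation I mean (paths as above; exact options from memory, so treat it as a sketch):

Code:
# Mirror the jail data onto the spinner pool:
# -a preserves permissions/times, -S handles sparse files,
# -H preserves hard links, --delete keeps the copy in sync
rsync -aSH --delete /mnt/jails/data/ /mnt/tank/jails-backup/data/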

Nonetheless, there is still a difference in size. I did an md5 check on the two Plex directories, both of which came back with the same hash. I've also checked the recordsize (128K) and ashift (12), which are the same everywhere (except the boot pool, where ashift=9).
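In case it matters, this is roughly how I checked those (dataset names as in my layout above; the zpool.cache path is the FreeNAS default and may differ elsewhere):

Code:
# recordsize is a per-dataset property
zfs get recordsize jails/data tank/jails-backup/data
# ashift is a per-vdev property, visible via zdb
zdb -U /data/zfs/zpool.cache | grep ashift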

I cannot figure out why the replicated dataset is 1.1GB bigger than the source. Has anyone run into this before? It's not a massive space issue at the moment, but I'm planning on having a large amount of data on the jails dataset shortly and will then be more concerned about preserving space.

Any ideas?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I assume you have snapshots on the replicated dataset. Any snapshots will be holding onto data that has been superseded in newer snapshots. You need at least one old snapshot to do a delta replication, but you can remove older ones.

There is even an option in the replication configuration to remove old/stale snapshots.
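If you want to see what the snapshots are pinning, something along these lines will show the space each one holds (the dataset and snapshot names below are just placeholders):

Code:
# Space held by each snapshot on the destination dataset
zfs list -t snapshot -r -o name,used,refer pool/dataset
# Destroy a stale snapshot once it's no longer needed as a replication base
zfs destroy pool/dataset@old-snapshot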
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
I assume you have snapshots on the replicated dataset. Any snapshots will be holding onto data that has been superseded in newer snapshots. You need at least one old snapshot to do a delta replication, but you can remove older ones.

There is even an option in the replication configuration to remove old/stale snapshots.

This is a brand-new replication, so the initial replication is 5.9GB; there are no snapshots on the replicated dataset at present.

As per my top post, the size comes out exactly the same if I run rsync as well. It mostly relates to the plexdata folder; see here:

# sha256 /mnt/tank/jails-backup/data/plex /mnt/jails/data/plex
SHA256 (/mnt/tank/jails-backup/data/plex) = 287a3387a6a65e0b5483d02992ed2649ef4a9f48d79c7c1d3484acfaec7367d3
SHA256 (/mnt/jails/data/plex) = 287a3387a6a65e0b5483d02992ed2649ef4a9f48d79c7c1d3484acfaec7367d3


# du -sh /mnt/tank/jails-backup/data/plex
5.1G /mnt/tank/jails-backup/data/plex


# du -sh /mnt/jails/data/plex
3.2G /mnt/jails/data/plex


Exact same dataset, exact same hash, 1.9GB difference.
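One cross-check that might be worth doing (if I understand FreeBSD's du correctly) is to compare apparent sizes instead of allocated blocks, since those shouldn't depend on the pool layout:

Code:
# -A reports apparent (logical) size rather than blocks allocated on disk
du -Ash /mnt/tank/jails-backup/data/plex
du -Ash /mnt/jails/data/plex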
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
So, I did a bit more testing. If I set up a replication job to another pool made of SSDs, both source and destination show 3.2GB. Sent to a different pool of spinning drives, it reports 5.1GB.

It appears that going from an SSD pool to a standard platter hard-drive pool increases the reported size. I assume this has something to do with recordsize/ashift, but a 59% difference?

Intriguing.
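One more data point I can probably gather: a dry-run send, which should estimate the stream size independently of whichever pool it lands on (the snapshot name below is just a placeholder):

Code:
# -n = dry run (nothing is sent), -v = print the estimated stream size
zfs send -nv jails/data@some-snapshot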
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Let's have a look at zfs get all for each dataset. Please use CODE tags to preserve formatting, not CMD tags.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
tank/data does not appear to be a backup of jails. Maybe it's a backup of jails/data? In any case, I can't match it up with what you described in terms of mismatched storage usage.

It would be simpler if you were to post the output of zfs get all for only the individual datasets that you're seeing unexpected results for, e.g. zfs get all jails or zfs get all tank/data.
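If you'd rather not paste the full property dump, the space-accounting subset would probably be enough, e.g. something like:

Code:
zfs get used,usedbydataset,usedbysnapshots,compressratio,logicalreferenced jails/data tank/data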
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
tank/data does not appear to be a backup of jails. Maybe it's a backup of jails/data? In any case, I can't match it up with what you described in terms of mismatched storage usage.

It would be simpler if you were to post the output of zfs get all for only the individual datasets that you're seeing unexpected results for, e.g. zfs get all jails or zfs get all tank/data.

Here we go:

Code:
# zfs get all tank/data
NAME	   PROPERTY					  VALUE						 SOURCE
tank/data  type						  filesystem					-
tank/data  creation					  Fri Jun  2 15:55 2017		 -
tank/data  used						  6.79G						 -
tank/data  available					 9.94T						 -
tank/data  referenced					6.10G						 -
tank/data  compressratio				 1.42x						 -
tank/data  mounted					   yes						   -
tank/data  quota						 none						  default
tank/data  reservation				   none						  default
tank/data  recordsize					128K						  default
tank/data  mountpoint					/mnt/tank/data				default
tank/data  sharenfs					  off						   default
tank/data  checksum					  on							default
tank/data  compression				   lz4						   inherited from tank
tank/data  atime						 off						   inherited from tank
tank/data  devices					   on							default
tank/data  exec						  on							default
tank/data  setuid						on							default
tank/data  readonly					  on							local
tank/data  jailed						off						   default
tank/data  snapdir					   hidden						default
tank/data  aclmode					   passthrough				   inherited from tank
tank/data  aclinherit					passthrough				   inherited from tank
tank/data  canmount					  on							default
tank/data  xattr						 off						   temporary
tank/data  copies						1							 default
tank/data  version					   5							 -
tank/data  utf8only					  off						   -
tank/data  normalization				 none						  -
tank/data  casesensitivity			   sensitive					 -
tank/data  vscan						 off						   default
tank/data  nbmand						off						   default
tank/data  sharesmb					  off						   default
tank/data  refquota					  none						  default
tank/data  refreservation				none						  default
tank/data  primarycache				  all						   default
tank/data  secondarycache				all						   default
tank/data  usedbysnapshots			   706M						  -
tank/data  usedbydataset				 6.10G						 -
tank/data  usedbychildren				0							 -
tank/data  usedbyrefreservation		  0							 -
tank/data  logbias					   latency					   default
tank/data  dedup						 off						   default
tank/data  mlslabel													-
tank/data  sync						  standard					  default
tank/data  refcompressratio			  1.20x						 -
tank/data  written					   0							 -
tank/data  logicalused				   6.42G						 -
tank/data  logicalreferenced			 4.85G						 -
tank/data  volmode					   default					   default
tank/data  filesystem_limit			  none						  default
tank/data  snapshot_limit				none						  default
tank/data  filesystem_count			  none						  default
tank/data  snapshot_count				none						  default
tank/data  redundant_metadata			all						   default
tank/data  org.freenas:description									 received


Code:
# zfs get all jails/data
NAME		PROPERTY				 VALUE					SOURCE
jails/data  type					 filesystem			   -
jails/data  creation				 Mon May 29  9:03 2017	-
jails/data  used					 4.66G					-
jails/data  available				237G					 -
jails/data  referenced			   3.97G					-
jails/data  compressratio			1.53x					-
jails/data  mounted				  yes					  -
jails/data  quota					none					 default
jails/data  reservation			  none					 default
jails/data  recordsize			   128K					 default
jails/data  mountpoint			   /mnt/jails/data		  default
jails/data  sharenfs				 off					  default
jails/data  checksum				 on					   default
jails/data  compression			  lz4					  inherited from jails
jails/data  atime					off					  inherited from jails
jails/data  devices				  on					   default
jails/data  exec					 on					   default
jails/data  setuid				   on					   default
jails/data  readonly				 off					  default
jails/data  jailed				   off					  default
jails/data  snapdir				  hidden				   default
jails/data  aclmode				  passthrough			  inherited from jails
jails/data  aclinherit			   passthrough			  inherited from jails
jails/data  canmount				 on					   default
jails/data  xattr					off					  temporary
jails/data  copies				   1						default
jails/data  version				  5						-
jails/data  utf8only				 off					  -
jails/data  normalization			none					 -
jails/data  casesensitivity		  sensitive				-
jails/data  vscan					off					  default
jails/data  nbmand				   off					  default
jails/data  sharesmb				 off					  default
jails/data  refquota				 none					 default
jails/data  refreservation		   none					 default
jails/data  primarycache			 all					  default
jails/data  secondarycache		   all					  default
jails/data  usedbysnapshots		  710M					 -
jails/data  usedbydataset			3.97G					-
jails/data  usedbychildren		   0						-
jails/data  usedbyrefreservation	 0						-
jails/data  logbias				  latency				  default
jails/data  dedup					off					  default
jails/data  mlslabel										  -
jails/data  sync					 standard				 default
jails/data  refcompressratio		 1.21x					-
jails/data  written				  18.4M					-
jails/data  logicalused			  6.78G					-
jails/data  logicalreferenced		4.63G					-
jails/data  volmode				  default				  default
jails/data  filesystem_limit		 none					 default
jails/data  snapshot_limit		   none					 default
jails/data  filesystem_count		 none					 default
jails/data  snapshot_count		   none					 default
jails/data  redundant_metadata	   all					  default
jails/data  org.freenas:description						   local

Also:
Code:
SHA256 (/mnt/tank/data) = 146bc6323d5414537affb1290db59609c16acab94e7d09c3bd36a773c2e12ebf
SHA256 (/mnt/jails/data) = 146bc6323d5414537affb1290db59609c16acab94e7d09c3bd36a773c2e12ebf


This is just to show that the above two datasets have identical checksums, but obviously differ in size on disk.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
:confused:
The main difference appears to be in the usedbydataset property:
Read-only property that identifies the amount of disk space that is used by a dataset itself, which would be freed if the dataset was destroyed ...
There's also a difference in the compression ratios.

The underlying pools are massively different in size, which could affect storage allocation efficiency, but I think the key is probably that the jails pool has no redundancy, while the tank pool does. Obviously there's overhead when storing data with redundancy. This would imply that the tank pool has 33% redundancy, e.g. a 6-disk RAIDZ2, which means the data occupies about 1.5x as much raw disk space there.
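As a rough back-of-the-envelope check (assuming tank really is a 6-wide RAIDZ2):

Code:
# Observed ratio of usedbydataset, tank/data vs jails/data
echo "scale=2; 6.10 / 3.97" | bc    # about 1.5
# Expected RAIDZ2 accounting factor: total disks / data disks
echo "scale=2; 6 / 4" | bc          # 1.50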
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
:confused:
The main difference appears to be in the usedbydataset property:

There's also a difference in the compression ratios.

The underlying pools are massively different in size, which could affect storage allocation efficiency, but I think the key is probably that the jails pool has no redundancy, while the tank pool does. Obviously there's overhead when storing data with redundancy. This would imply that the tank pool has 33% redundancy, e.g. a 6-disk RAIDZ2, which means the data occupies about 1.5x as much raw disk space there.

That's an interesting explanation. The jails pool doesn't have redundancy per se; it's just a mirror. I didn't realise that the space taken up for redundancy would be shown like that. You learn something new every day.

Thanks!
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
The jails pool doesn't have redundancy per se; it's just a mirror
If it's a mirror, it has 50% redundancy, which means my explanation only works if ZFS reports disk usage differently on mirrors. Looking at my pool, that does appear to be the case:
Code:
root@T20:~# zfs get usedbydataset pool0/media 
NAME		 PROPERTY	   VALUE   SOURCE
pool0/media  usedbydataset  118G	-
root@T20:~# du -sh /pool0/media/
118G	/pool0/media/
root@T20:~# 
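For completeness, the whole space breakdown is available in one shot as well:

Code:
# Shows used, usedbysnapshots, usedbydataset, usedbychildren, etc.
zfs list -o space pool0/media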
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Do keep in mind that du is mostly useless with ZFS and that zfs list and zpool list report space differently.
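To see the difference for yourself, compare the two for the same pool: zpool list counts raw space including parity/redundancy, while zfs list reports usable space after redundancy.

Code:
zpool list tank
zfs list tank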
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
Do keep in mind that du is mostly useless with ZFS and that zfs list and zpool list report space differently.

Thanks, but these dataset sizes are directly reported in the FreeNAS GUI and match zfs list and zpool list.
 