Encryption size overhead of 10-13% expected?

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
Hello!

Background Info:
  • I'm running on TrueNAS-12.0-U2.1
  • I just got 6x8TB disks to make a new RaidZ2 pool with.
    • I wanted to use encryption for the new pool, so I enabled it while creating the pool using the GUI
  • I plan to migrate my existing 5x4TB RaidZ1 pool to this new pool
  • Old Pool: MegaVol
  • New Pool: GigaVol
Encryption settings on GigaVol:
Code:
$ zfs get all GigaVol
...
GigaVol  encryption            aes-256-gcm            -
GigaVol  keylocation           prompt                 local
GigaVol  keyformat             hex                    -
GigaVol  pbkdf2iters           0                      default
GigaVol  encryptionroot        GigaVol                -
GigaVol  keystatus             available              -
GigaVol  special_small_blocks  0                      default


As I can't migrate the pool due to some encrypted root error, I am migrating as follows:

Code:
# Shut down all plugins, VMs, jails, services, shares, etc. first
$ zfs snapshot -r MegaVol@migrate
$ zfs list -t snapshot -r -o name MegaVol | grep migrate
# Generate a list of zfs send/receive commands to run, one per dataset.
# Example:
$ zfs send -p MegaVol/iocage@migrate | pv | zfs receive -F -o encryption=on GigaVol/iocage


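For anyone following along, the list of commands can be generated with a quick loop. This is just a sketch (it mirrors the example command above and assumes the recursive @migrate snapshot already exists), so review the output before running anything:

Code:
# Print one send/receive pair per child dataset of MegaVol
zfs list -H -r -o name MegaVol | grep '/' | while read -r ds; do
    echo "zfs send -p ${ds}@migrate | pv | zfs receive -F -o encryption=on GigaVol/${ds#MegaVol/}"
done
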
This is working great, except that every dataset in GigaVol is larger than its counterpart in MegaVol. Even blank datasets are ~1% bigger, and most datasets (so far) are 10-13% larger in size:

Code:
$ zfs get logicalused,logicalreferenced,used,referenced,compressratio,recordsize,compression,encryption,encryptionroot,usedbydataset,usedbychildren MegaVol/iocage
NAME            PROPERTY           VALUE           SOURCE
MegaVol/iocage  logicalused        8.13G           -
MegaVol/iocage  logicalreferenced  7.92M           -
MegaVol/iocage  used               5.64G           -
MegaVol/iocage  referenced         14.7M           -
MegaVol/iocage  compressratio      1.78x           -
MegaVol/iocage  recordsize         128K            default
MegaVol/iocage  compression        lz4             local
MegaVol/iocage  encryption         off             default
MegaVol/iocage  encryptionroot     -               -
MegaVol/iocage  usedbydataset      14.7M           -
MegaVol/iocage  usedbychildren     5.62G           -

Code:
$ zfs get logicalused,logicalreferenced,used,referenced,compressratio,recordsize,compression,encryption,encryptionroot,usedbydataset,usedbychildren GigaVol/iocage
NAME            PROPERTY           VALUE           SOURCE
GigaVol/iocage  logicalused        7.59G           -
GigaVol/iocage  logicalreferenced  8.64M           -
GigaVol/iocage  used               6.16G           -
GigaVol/iocage  referenced         21.0M           -
GigaVol/iocage  compressratio      1.74x           -
GigaVol/iocage  recordsize         128K            default
GigaVol/iocage  compression        lz4             received
GigaVol/iocage  encryption         aes-256-gcm     -
GigaVol/iocage  encryptionroot     GigaVol         -
GigaVol/iocage  usedbydataset      21.0M           -
GigaVol/iocage  usedbychildren     6.14G           -


I can't figure out why there's such a large difference (5.64GB used on the old pool vs 6.16GB used on the new pool, about 9% more space used!). The same compression algorithm is in use on both (even though the reported ratios differ slightly), and the exact same settings and data should be in each dataset. Does encryption (or the pool being RaidZ2) really add this much space overhead?
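(For reference, the comparison can be reproduced with exact byte counts; the awk part is just the percentage arithmetic:)

Code:
# Exact byte counts (-p) in scripted mode (-H), then the percentage difference
old=$(zfs get -Hp -o value used MegaVol/iocage)
new=$(zfs get -Hp -o value used GigaVol/iocage)
echo "$old $new" | awk '{ printf "%.1f%% more space used on the new pool\n", ($2/$1 - 1) * 100 }'
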

Any advice/guidance would be greatly appreciated!
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
170
The encryption overhead with regard to disk space is negligible. I think most likely you are looking at the RAIDZ1 vs RAIDZ2 difference.
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
The encryption overhead with regard to disk space is negligible. I think most likely you are looking at the RAIDZ1 vs RAIDZ2 difference.

Thanks for the quick reply! I guess I'm curious as to why RaidZ2 takes up so much more space. I'm already sacrificing 2 disks instead of 1, and on top of that the same data uses ~10% more of the remaining space?

Do you have any details as to why or can you link to articles explaining the difference? I haven't been able to turn up much in my (admittedly short) research.
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
A quick test where I omitted the -o encryption=on option from the transfer:

Original Dataset:
Code:
$ zfs get logicalused,logicalreferenced,used,referenced,compressratio,recordsize,compression,encryption,encryptionroot,usedbydataset,usedbychildren MegaVol/VirtualDisks
NAME                  PROPERTY           VALUE           SOURCE
MegaVol/VirtualDisks  logicalused        24.8G           -
MegaVol/VirtualDisks  logicalreferenced  38.5K           -
MegaVol/VirtualDisks  used               21.2G           -
MegaVol/VirtualDisks  referenced         141K            -
MegaVol/VirtualDisks  compressratio      1.49x           -
MegaVol/VirtualDisks  recordsize         128K            default
MegaVol/VirtualDisks  compression        lz4             inherited from MegaVol
MegaVol/VirtualDisks  encryption         off             default
MegaVol/VirtualDisks  encryptionroot     -               -
MegaVol/VirtualDisks  usedbydataset      141K            -
MegaVol/VirtualDisks  usedbychildren     21.2G           -


New Dataset With Encryption:
Code:
$ zfs get logicalused,logicalreferenced,used,referenced,compressratio,recordsize,compression,encryption,encryptionroot,usedbydataset,usedbychildren GigaVol/VirtualDisks
NAME                  PROPERTY           VALUE           SOURCE
GigaVol/VirtualDisks  logicalused        24.8G           -
GigaVol/VirtualDisks  logicalreferenced  54K             -
GigaVol/VirtualDisks  used               23.5G           -
GigaVol/VirtualDisks  referenced         304K            -
GigaVol/VirtualDisks  compressratio      1.49x           -
GigaVol/VirtualDisks  recordsize         128K            default
GigaVol/VirtualDisks  compression        lz4             inherited from GigaVol
GigaVol/VirtualDisks  encryption         aes-256-gcm     -
GigaVol/VirtualDisks  encryptionroot     GigaVol         -
GigaVol/VirtualDisks  usedbydataset      304K            -
GigaVol/VirtualDisks  usedbychildren     23.5G           -


New Dataset Without Encryption:
Code:
$ zfs get logicalused,logicalreferenced,used,referenced,compressratio,recordsize,compression,encryption,encryptionroot,usedbydataset,usedbychildren GigaVol/VirtualDisks2
NAME                   PROPERTY           VALUE           SOURCE
GigaVol/VirtualDisks2  logicalused        24.8G           -
GigaVol/VirtualDisks2  logicalreferenced  38.5K           -
GigaVol/VirtualDisks2  used               23.4G           -
GigaVol/VirtualDisks2  referenced         176K            -
GigaVol/VirtualDisks2  compressratio      1.49x           -
GigaVol/VirtualDisks2  recordsize         128K            default
GigaVol/VirtualDisks2  compression        lz4             inherited from GigaVol
GigaVol/VirtualDisks2  encryption         off             default
GigaVol/VirtualDisks2  encryptionroot     -               -
GigaVol/VirtualDisks2  usedbydataset      176K            -
GigaVol/VirtualDisks2  usedbychildren     23.4G           -


Without encryption shows about a 10.4% increase in disk space usage over the original (I still don't understand why). With encryption it is 10.8% over the original, but only 0.4% over the same dataset without encryption, so I guess the encryption overhead itself is arguably negligible.
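(The percentages are just the ratios of the used values above, e.g.:)

Code:
# Percentage math behind the numbers above (used values in GiB)
awk 'BEGIN {
    printf "no encryption vs original:   %.1f%%\n", (23.4/21.2 - 1) * 100
    printf "encryption vs original:      %.1f%%\n", (23.5/21.2 - 1) * 100
    printf "encryption vs no encryption: %.1f%%\n", (23.5/23.4 - 1) * 100
}'
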
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
170
I guess I'm curious as to why RaidZ2 takes up so much more space. I'm already sacrificing 2 disks instead of 1, and on top of that the same data uses ~10% more of the remaining space?

You are not sacrificing 2 disks, because RAIDZn does not really work as you expect.

In RAID5/RAID6 the overhead is fixed, you lose one or two disks worth of usable space regardless of how much data is written and what kind of data.

In RAIDZn the overhead depends on what kind of data is being written. This is because in RAIDZn it is not the disks that are fault-tolerant, but the data records. If a record is large (and incompressible), it will span all disks: one or two disks' worth of sectors go to parity and the rest to data. However, if the record is 512 bytes (or compresses down to 512 bytes), it will not span all disks; two or three copies of the record are effectively written, producing a 2x or 3x increase in size (sometimes more with padding, but that's bloody complicated).

The average overhead therefore depends on the data mix stored on the dataset. Small and/or highly compressible files have more overhead, and the effect increases as you add more redundancy, which is what you are looking at.
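To put rough numbers on it, here is a sketch of the allocation rule (data sectors, plus parity for each row of data, padded up to a multiple of parity+1). It assumes 4K sectors (ashift=12) and ignores compression, metadata and so on, so treat the results as approximate:

Code:
# Approximate RAIDZ allocation for one record:
#   data sectors + parity per row of (width - parity) data sectors,
#   padded up to a multiple of (parity + 1)
allocated_sectors() {  # args: data_sectors vdev_width parity
    awk -v d="$1" -v w="$2" -v p="$3" 'BEGIN {
        rows = int((d + (w - p) - 1) / (w - p))   # rows of data
        a = d + p * rows                          # p parity sectors per row
        pad = (p + 1) - (a % (p + 1)); if (pad == p + 1) pad = 0
        print a + pad
    }'
}

# One 4K block (1 sector):
echo "5-wide RAIDZ1: $(allocated_sectors 1 5 1) sectors"    # 2  -> 100% overhead
echo "6-wide RAIDZ2: $(allocated_sectors 1 6 2) sectors"    # 3  -> 200% overhead

# One full 128K record (32 sectors):
echo "5-wide RAIDZ1: $(allocated_sectors 32 5 1) sectors"   # 40 -> 25% overhead
echo "6-wide RAIDZ2: $(allocated_sectors 32 6 2) sectors"   # 48 -> 50% overhead

The gap between the two layouts widens as the records get smaller, which is the effect described above.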
 