Dedup working or not?

ThisTruenasUser

Dabbler
Joined
Apr 19, 2023
Messages
44
Hi
I am testing on an old machine.
It is an i7-3770 with 24GB RAM, a 1TB drive, a boot drive, and a 16GB Optane.
I created a 1TB pool with the Optane as a dedup vdev, then created two identical sparse 600GB zvols.
Each has dedup on, no compression, and sync disabled.
There is also an iSCSI share for each zvol, both configured the same, with the 4k block option for modern OSes.
Then, in a Windows 11 VM, I connected to both drives and formatted them.

The purpose is to test this out, as I want multiple iSCSI drives to store games (on a much more powerful machine).
I copied 'DOOM' to each of the drives twice. It is about 70GB in size.
So 4 copies in all.

I tried installing TrueNAS Core initially, but it would not boot, so I am using TrueNAS Scale.

From the storage dashboard for the pool the figures are:

Usable Capacity: 1.07 TiB
  • Used: 280.94 GiB
  • Available: 814.05 GiB
It seems that dedup is not working, but the dashboard is not that helpful.
How exactly do I find the dedup ratio in TrueNAS Scale?
If it worked, it should be between 3 and 4.

I am trying to find out if it works & if not, how is it (mis)configured?

Useful information is appreciated.

thanks
 

ThisTruenasUser

Dabbler
Joined
Apr 19, 2023
Messages
44
Oh, and I need to add: after I copied everything over, this was the memory usage:

Free: 9.9 GiB

ZFS Cache: 11.7 GiB

Services: 1.9 GiB

It was not exactly taxing on memory
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Why not ask ZFS?

I know, a bit snarky. You probably don't know the commands and what they provide, yet. So, use this. My root pool on my Linux desktop does not use De-Dup, so it's 1.00x;

> zpool list
NAME   SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
rpool  41.8G  17.3G  24.5G  -        -         59%   41%  1.00x  ONLINE  -

Also, here is some helpful information about De-Dup. We want you prepared for ZFS De-Dup, because most users think it is a magic bullet to get more space. It's not. But, you can do what you want.
 

ThisTruenasUser

Dabbler
Joined
Apr 19, 2023
Messages
44
It is the first thing I did.
In the gui, navigated to the shell and:
-----------------------------------------------------------------------------------------------------------

Linux truenas 5.15.79+truenas #1 SMP Mon Apr 10 14:00:27 UTC 2023 x86_64

TrueNAS (c) 2009-2023, iXsystems, Inc.
All rights reserved.
TrueNAS code is released under the modified BSD license with some
files copyrighted by (c) iXsystems, Inc.

For more information, documentation, help or support, go here:
http://truenas.com

Welcome to TrueNAS
Last login: Mon Apr 24 03:24:30 PDT 2023 on pts/2

Warning: the supported mechanisms for making configuration changes
are the TrueNAS WebUI, CLI, and API exclusively. ALL OTHERS ARE
NOT SUPPORTED AND WILL RESULT IN UNDEFINED BEHAVIOR AND MAY
RESULT IN SYSTEM FAILURE.

admin@truenas[~]$ zpool list
zsh: command not found: zpool
admin@truenas[~]$
admin@truenas[~]$
-----------------------------------------------------------------------------------------------------------

I am used to using TrueNAS Core. I assume it is another command, or buried in the GUI somewhere?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
try sudo zpool list
 

ThisTruenasUser

Dabbler
Joined
Apr 19, 2023
Messages
44
Thank you

Here are the results.

NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
OneTB       941G  85.2G   856G        -         -    0%   9%  3.47x  ONLINE  /mnt
boot-pool    31G  2.65G  28.4G        -         -    0%   8%  1.00x  ONLINE  -

So a dedup ratio of 3.47 is quite good, it seems.
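For anyone who wants to check this from a script rather than by eye, here is a minimal Python sketch that pulls the DEDUP column out of `zpool list` output. It runs on the text pasted above rather than calling ZFS itself. The function name `parse_dedup_ratios` is made up for illustration, and it assumes the default column layout with a header row containing NAME and DEDUP.

```python
def parse_dedup_ratios(zpool_list_output: str) -> dict[str, float]:
    """Map pool name -> dedup ratio, given the text printed by `zpool list`."""
    rows = [ln.split() for ln in zpool_list_output.splitlines() if ln.strip()]
    header, pools = rows[0], rows[1:]
    name_col = header.index("NAME")
    dedup_col = header.index("DEDUP")
    # The ratio is printed like "3.47x"; drop the trailing "x" before converting.
    return {row[name_col]: float(row[dedup_col].rstrip("x")) for row in pools}


# Output copied from the post above.
SAMPLE = """\
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
OneTB 941G 85.2G 856G - - 0% 9% 3.47x ONLINE /mnt
boot-pool 31G 2.65G 28.4G - - 0% 8% 1.00x ONLINE -
"""

ratios = parse_dedup_ratios(SAMPLE)
print(ratios)  # {'OneTB': 3.47, 'boot-pool': 1.0}
```

With four identical copies the ceiling would be 4.00x, so 3.47x fits the "between 3 and 4" expectation from the first post.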

The pool is for testing, with a 1TB drive and a 16GB Intel Optane as a dedup device.

How do I tell how 'full' the dedup device (the 16GB Intel Optane) is, as a percentage used?

It is very important for me to find out!

I did go through this video


It claims the requirement is about 1GB for deduplication per 1TB of storage.
I need to figure out if this is true.

The plan is to have about 2 x (8-12TB) iSCSI devices on a raidz1 pool (with other non-dedup shares, Samba & NFS), with 2 x 16GB Intel Optane as dedup devices.
It is for games.

It may mean buying more storage, but I am not doing so until I am confident the deduplication setup will be suitable.

Thanks
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Note that ZFS Special vDevs should automatically spill over to data vDevs for De-Dup metadata. (That is my understanding of the Special vDevs: they are the priority for the selected metadata, but if full, ZFS continues writing metadata to the pool's main data vDevs.)

Note that zVols can have MUCH, MUCH, higher memory and De-Dup table requirements;
  • And from the ZFS manual page zfsconcepts;
    It is generally recommended that you have at least 1.25 GiB of RAM per 1 TiB of storage when you enable deduplication. Calculating the exact requirement depends heavily on the type of data stored in the pool.
  • More often, we see the "5GB of RAM per 1TB" suggestion and that notably also applies to a dataset with an average record size of 64KB. If de-duplication is applied to an iSCSI ZVOL (with a default volblocksize of 16K) this will result in the potential for 4x the memory usage, or "20GB per 1TB"
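To make the scaling in that last quote concrete, here is a rough Python sketch. It only re-does the arithmetic of the quoted "5GB per 1TB at 64K" rule of thumb, assuming the requirement is inversely proportional to block size. The function name and its defaults are illustrative, not a ZFS formula, and real usage depends on the data.

```python
def ddt_ram_gb(storage_tb: float, block_kib: float,
               base_gb_per_tb: float = 5.0, base_block_kib: float = 64.0) -> float:
    """Scale the quoted '5GB of RAM per 1TB at 64K' rule to another block size.

    Assumption: halving the block size doubles the number of blocks,
    and therefore doubles the dedup table size.
    """
    return storage_tb * base_gb_per_tb * (base_block_kib / block_kib)


for block_kib in (128, 64, 16):
    print(f"{block_kib:>3}K blocks: {ddt_ram_gb(1, block_kib):.1f} GB per 1 TB")
# 128K blocks: 2.5 GB per 1 TB
#  64K blocks: 5.0 GB per 1 TB
#  16K blocks: 20.0 GB per 1 TB
```

The 16K line reproduces the "20GB per 1TB" figure for a default zvol, which is why block size matters so much for dedup on iSCSI.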
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
ThisTruenasUser said: "The plan is to have about 2 x (8-12TB) iSCSI devices on a raidz1 pool (with other non-dedup shares, Samba & NFS), with 2 x 16GB Intel Optane as dedup devices."

Raidz1 for iSCSI is a bad plan.
And you'd need much bigger dedup vdevs (redundant!) to go with ca. 10 TB of storage.

Bad plan overall. Buy more drives, use mirrors, enable compression, don't use dedup.
 
Joined
Jun 15, 2022
Messages
674
Etorix said: "Bad plan overall. Buy more drives, use mirrors, enable compression, don't use dedup."

De-duplication is actually a good plan if we understand the need.

Gamers will install multiple versions of the same game because some updates will break some modifications from the Steam Workshop and they want to continue playing with a set of modifications that enhances their character, or characters as the case may be. They may also want to play the game with different sets of mods and to do so have one install saved per set of mods. Therefore it's not uncommon for a player who loves a game to have many installs of the same game, each "slightly" different (given the code changes, the resources not so much). The end result is they're not trying to de-duplicate data from a game, they want to de-duplicate data from a game install re-saved many times.

RAID-Z1 isn't necessarily bad. I agree it's probably not a great idea, but gamers tend to run consumer-grade drives on SATA, without ECC RAM (or at minimum without realizing that AMD only recently introduced boards that won't crash if ECC RAM is stuck in them, and note that those boards probably don't do actual ECC), Realtek network cards, crappy imported cables, ... so it's a data disaster waiting to happen anyway, but only every six months to a year or so. Think about it: their gaming computer is tuned for speed, not reliability beyond "it plays games fast", and stability is measured in system lockups per year, where under 10 is good.

In my opinion (which granted, is not at all humble), they should be running ext4 under LVM so they can re-use old drives, and without RAID for pure speed (though with good backups). AND they should be running an OS that supports those crappy Chinese 2.5GbE cards that again are good enough for gaming, not running TrueNAS--a system that's really fast for truckloads of data but rots it out for single-user gaming.

As far as compression goes it would depend on the resources which are probably all compressed as they sit because they load faster that way.
 

ThisTruenasUser

Dabbler
Joined
Apr 19, 2023
Messages
44
My situation is similar, with other things going on.
As for using raidz1, there are no more SATA ports or space for 3.5 inch drives in the machine.
It has/will have entertainment data:
Games that can be downloaded again, so just 1 external HD backup.
Media to stream, backed up on multiple drives.

I have managed to get it working and am able to use a 16GB Intel Optane as a dedup device. This is on the old testing machine using TrueNAS.
It can be very resource intensive if not configured properly.
So I had to create the zvol(s) on the command line, with 1MB block sizes. The option is 128k max in the GUI, even though ZFS allows 1MB.
I use Windows for gaming, so when connecting to the iSCSI drive and formatting it, I choose a 1MB block size.
It would be wasteful of space if using just many small files, but games are generally not like that.
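A back-of-the-envelope Python sketch of that trade-off, using the 600GB test zvol size from the first post. It assumes roughly one dedup table entry per unique block and ignores filesystem details such as very small files being stored inside metadata. The helper names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def block_count(total_bytes: int, block_bytes: int) -> int:
    """Number of whole blocks needed to hold total_bytes (rounded up)."""
    return (total_bytes + block_bytes - 1) // block_bytes


def slack_bytes(file_bytes: int, block_bytes: int) -> int:
    """Space lost by rounding a single file up to whole blocks."""
    return block_count(file_bytes, block_bytes) * block_bytes - file_bytes


zvol = 600 * GIB          # size of each test zvol
small_file = 10 * KIB     # hypothetical small file, for the waste estimate

for label, size in (("16K", 16 * KIB), ("128K", 128 * KIB), ("1M", MIB)):
    print(f"{label:>4}: {block_count(zvol, size):>10,} blocks, "
          f"{slack_bytes(small_file, size) // KIB:>4} KiB wasted on a 10 KiB file")
#  16K: 39,321,600 blocks,    6 KiB wasted on a 10 KiB file
# 128K:  4,915,200 blocks,  118 KiB wasted on a 10 KiB file
#   1M:    614,400 blocks, 1014 KiB wasted on a 10 KiB file
```

Going from 16K to 1M blocks cuts the number of blocks to track by a factor of 64, at the cost of nearly a full megabyte lost per small file.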

My main server has a windows 10 low spec virtual machine. Its only job is to update games. That will be connected to one of the ISCSI shares.
The server also has a lancache server installed, described in this site: https://lancache.net/. It is on an ubuntu server virtual machine.
Games on the low spec virtual machine update overnight typically.

As for the Windows gaming machine, I have been using PrimoCache for years. It is software that does block storage caching: https://www.romexsoftware.com/en-us/primo-cache/. I had to pay the $30 fee about 6 years ago, but it is useful. Typically it is used to speed up a hard drive with an SSD. In my case, I use NVMe storage to speed up the second iSCSI drive.

So when I turn on my gaming machine:
Steam starts and updates games as expected. Game download speed is typically far faster than internet speed due to the lancache server.
Game updates & frequently played games will likely be in the NVMe cache.
Also, writes to the iSCSI drive will be fairly fast, as few actual writes to disk are needed on the server due to deduplication.

The 2.5Gb network may be a bottleneck, but I will wait and see. I may look into getting a couple of 10Gb cards for a direct connection at some point.
 