Disk activity falls to zero during file deletions

Joined
Apr 8, 2018
Messages
44
Hello, everyone.

I have a situation where disk activity sporadically drops to zero. Investigating a bit, I noticed that this happens during file deletions, but not always.

I have the following topology:
  • Data VDEVs: 6 x RAIDZ2 | 11 wide | 20.01 TiB
  • Metadata VDEVs: not assigned
  • Log VDEVs: not assigned
  • Cache VDEVs: not assigned
  • Spare VDEVs: not assigned
  • Dedup VDEVs: 2 x MIRROR | 2 wide | 3.49 TiB
No problem is reported in any disk.

Can you tell me what causes this problem and, if possible, how to avoid these freezes?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Dedup VDEVs: 2 x MIRROR | 2 wide | 3.49 TiB
As mentioned by @morganL, deletions under dedup are very metadata/IO-intensive operations. What make and model of drives are you using for this, and did you add them before enabling deduplication on your pool or dataset(s)?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Quoting this documentation:
Special vdev SSDs receive continuous, heavy I/O. HDDs and many common SSDs are inadequate. As of 2021, some recommended SSDs for deduplicated ZFS include Intel Optane 900p, 905p, P48xx, and better devices. Lower cost solutions are high quality consumer SSDs such as the Samsung EVO and PRO models. PCIe NVMe SSDs (NVMe, M.2 “M” key, or U.2) are recommended over SATA SSDs (SATA or M.2 “B” key).

When special vdevs cannot contain all the pool metadata, then metadata is silently stored on other disks in the pool. When special vdevs become too full (about 85%-90% usage), ZFS cannot run optimally and the disks operate slower. Try to keep special vdev usage under 65%-70% capacity whenever possible. Try to plan how much future data will be added to the pool, as this increases the amount of metadata in the pool. More special vdevs can be added to a pool when more metadata storage is needed.

So, as @HoneyBadger asked, please tell us your dedup drive model... I would also ask for your RAM size and CPU model; also, how big is your dedup table? If it's too big, it will spill onto the spinners, and that would annihilate performance.
 
Joined
Apr 8, 2018
Messages
44
I enabled deduplication when creating the pool.

I use two Intel D3-S4620 (SSDSC2KG038TZ).

CPU: 2 x Intel(R) Xeon(R) Silver 4310 CPU
RAM: 512GB ECC (8 x 64 GB)

How can I see the size of my dedup table?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
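You can see a summary of your deduplication table (DDT) with zpool status -D, for example:

Code:
# Prints pool status plus the DDT entry count and a refcount histogram
zpool status -D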
Joined
Apr 8, 2018
Messages
44
@Davvo Thank you!

Code:
root@archivio[~]# zpool status -D
  pool: Archivio
 state: ONLINE
  scan: scrub repaired 0B in 13 days 20:52:06 with 0 errors on Sat Oct 21 20:52:14 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        Archivio                                  ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            b097df6a-2f27-4394-9730-76261871cec3  ONLINE       0     0     0
            f0cd341a-31cb-4d1c-af7e-41135bde3067  ONLINE       0     0     0
            9ad1c3d8-ecf6-41f1-8a02-2a27441a08e8  ONLINE       0     0     0
            9e13868e-9418-4a26-b858-d480ce639fd5  ONLINE       0     0     0
            ac7d4716-ba83-4209-af42-ad7a1f7e8799  ONLINE       0     0     0
            9ad14a36-d5ad-40d3-bf11-40590853b7a8  ONLINE       0     0     0
            3062e26b-01c1-4233-8735-f235958c9d5e  ONLINE       0     0     0
            5015360e-990c-428a-a3c0-5ad57ea530f1  ONLINE       0     0     0
            cdf836cf-e017-4df8-bf4d-2b0e7b4bbf0a  ONLINE       0     0     0
            660d5ea7-049b-44be-819b-fc5866bad3ad  ONLINE       0     0     0
            361f3028-4da4-490c-a674-704918283cf4  ONLINE       0     0     0
          raidz2-1                                ONLINE       0     0     0
            3e93718b-720e-4f5f-a7f4-7b3ea3750ae2  ONLINE       0     0     0
            17a10b38-da89-4090-bad9-e7b9b58d82a1  ONLINE       0     0     0
            260476a7-a757-4293-be9e-7f94d3ad1127  ONLINE       0     0     0
            e6800b36-8d0e-41f6-8eef-cd8bda0f423b  ONLINE       0     0     0
            f0bb6ca1-3a14-479a-a3a2-3f9db1126070  ONLINE       0     0     0
            d69e75a9-76a8-419f-8fb4-964a95633db4  ONLINE       0     0     0
            873cb9f9-4b20-4cd0-a464-71b5c91669a6  ONLINE       0     0     0
            9fe2f86b-3681-4772-bcd6-dac7961a6172  ONLINE       0     0     0
            fe92d53d-a151-450c-85b4-685ea1d3d82a  ONLINE       0     0     0
            fd8a246d-5346-4b84-baf6-c1169e9238f6  ONLINE       0     0     0
            716bcb7f-d18b-4e82-866b-7afabfa68029  ONLINE       0     0     0
          raidz2-3                                ONLINE       0     0     0
            68a7a9b3-12b9-4738-9836-327b7e872bae  ONLINE       0     0     0
            851f1e90-d49a-4f44-93df-baf5cf368ad0  ONLINE       0     0     0
            8a8976b5-227a-4d2c-a7c5-b56e171e2186  ONLINE       0     0     0
            e4646155-a067-492d-b5f3-c17c8e400008  ONLINE       0     0     0
            ccfd693d-ccb2-4e87-b2fb-0458731d1c10  ONLINE       0     0     0
            a6c8d71a-26a4-40b0-aef1-7cd8a5ec31df  ONLINE       0     0     0
            27c1e6d6-274c-461f-b458-b9ecea0ab187  ONLINE       0     0     0
            ddaba7e8-5c4f-49db-86a8-3944f04d229c  ONLINE       0     0     0
            3980fb00-af74-47d0-bd6b-dc88de0e0be7  ONLINE       0     0     0
            63fb1f47-0d2a-4b11-a534-e75091418966  ONLINE       0     0     0
            ba1fa4e5-1729-4249-803b-725a5be6026d  ONLINE       0     0     0
          raidz2-4                                ONLINE       0     0     0
            ad2112a5-834f-487f-8e04-647a503bf678  ONLINE       0     0     0
            709505c7-40bc-4489-bc77-2883871dc245  ONLINE       0     0     0
            0765d45e-aeed-4300-9d28-7d8b464c5b09  ONLINE       0     0     0
            eca420ec-f3f8-4c0a-b6b2-08341a752184  ONLINE       0     0     0
            131d1bdc-be96-4365-b9bb-71cb54b7e022  ONLINE       0     0     0
            f19a7a96-2496-43af-91b0-3c12d0d534da  ONLINE       0     0     0
            5b79e55d-6d4d-43d8-863a-c741809ef504  ONLINE       0     0     0
            11f95813-4191-4b17-876e-6fb03dbf313c  ONLINE       0     0     0
            15a8a606-f117-4d35-81d2-93fcec38c451  ONLINE       0     0     0
            83aa4cf6-fa4e-4600-9d88-74d6a99a6933  ONLINE       0     0     0
            daa9f691-7ca7-4888-afea-2952f6274496  ONLINE       0     0     0
          raidz2-5                                ONLINE       0     0     0
            04056fc8-84f0-412f-8276-56fa5747f0b9  ONLINE       0     0     0
            f27444a7-f73b-45dc-91d3-07436e3d666b  ONLINE       0     0     0
            06cfc1b9-b7ab-4091-bedd-27d644bf9688  ONLINE       0     0     0
            297aca17-ffec-49e1-b70d-a9ff01289518  ONLINE       0     0     0
            91826ebe-cad1-4ed0-97f3-2b9fa4732ab6  ONLINE       0     0     0
            62b145da-f906-4096-ace3-7f5ddfe387ad  ONLINE       0     0     0
            3cc6e221-e537-4906-956f-3138ab840e62  ONLINE       0     0     0
            0ede001d-32c5-4c72-abee-493a88d495d8  ONLINE       0     0     0
            5ba8c56f-bec4-4337-9bc4-42a04025edc3  ONLINE       0     0     0
            df2be133-15da-4801-880d-e8272dd9f66d  ONLINE       0     0     0
            2e2576c6-f6a5-46a4-a81b-14e4e35c2551  ONLINE       0     0     0
          raidz2-6                                ONLINE       0     0     0
            87ca9464-1e71-4bed-93d3-f07ccef00d65  ONLINE       0     0     0
            c6091290-9990-4618-84e0-a92a83648e9e  ONLINE       0     0     0
            8e961f3b-470e-4d49-98da-60d8186afab6  ONLINE       0     0     0
            a7b9947b-1a66-4aac-9c59-25cc673e73d7  ONLINE       0     0     0
            efd33267-21bf-4647-80e4-62b3218241d5  ONLINE       0     0     0
            ecd56be3-be62-45df-9c20-f60337a4cd16  ONLINE       0     0     0
            927aed67-cb5d-4afe-8a16-f51492336bcc  ONLINE       0     0     0
            9582d73e-e2ec-4e09-899b-adee554868f5  ONLINE       0     0     0
            1d633f20-44eb-42d6-ae86-4cddbc155776  ONLINE       0     0     0
            46ddf912-20ec-470b-9ab2-66fdbd64132e  ONLINE       0     0     0
            b263089b-37e2-43b9-aee5-8cb6ec7a2126  ONLINE       0     0     0
        dedup
          mirror-2                                ONLINE       0     0     0
            d8c51d4c-1dd3-4648-a990-fdf7d6972333  ONLINE       0     0     0
            abf1aa23-632f-4042-9490-103aa8529599  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 5605103970, size 605B on disk, 195B in core

bucket              allocated                       referenced         
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    4.66G    594T    538T    539T    4.66G    594T    538T    539T
     2     528M   65.9T   59.7T   59.7T    1.10G    140T    127T    127T
     4    41.6M   4.20T   3.99T   4.06T     201M   18.1T   17.2T   17.6T
     8    1.00M    111G   47.5G   51.1G    11.3M   1.17T    412G    459G
    16    1.59M    203G   13.2G   21.1G    34.7M   4.32T    297G    469G
    32     475K   59.0G   14.1G   15.9G    19.1M   2.38T    612G    685G
    64    47.3K   5.90G   1.94G   2.09G    3.57M    456G    149G    161G
   128    8.39K   1.04G    285M    315M    1.34M    171G   41.6G   46.7G
   256      552   66.0M   8.70M   11.1M     166K   19.6G   2.65G   3.38G
   512      752   92.0M   5.34M   9.01M     599K   73.5G   4.12G   7.05G
    1K       60   4.22M    706K   1.13M    80.0K   5.64G    912M   1.49G
    2K       40   3.16M    226K    484K     116K   9.33G    723M   1.43G
    4K       24   1.90M     91K    256K     143K   11.7G    528M   1.44G
    8K       20   1.76M    146K    265K     223K   18.6G   1.60G   2.91G
   16K       35    300K    212K    484K     803K   6.70G   4.74G   10.9G
   32K        5    132K   92.5K    128K     237K   6.06G   4.23G   5.88G
   64K        1    512B    512B   9.14K     111K   55.6M   55.6M   1017M
  256K       47   5.88M    188K    430K    13.7M   1.71T   54.7G    125G
    1M        1   1.50K   1.50K   9.14K    1.62M   2.43G   2.43G   14.8G
 Total    5.22G    665T    602T    603T    6.04G    763T    684T    686T


  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:18 with 0 errors on Fri Oct 20 03:46:19 2023
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb3    ONLINE       0     0     0
            sda3    ONLINE       0     0     0

errors: No known data errors

 dedup: no DDT entries
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Joined
Apr 8, 2018
Messages
44
I don't know what calculations to do, so I can't help you! :smile:

If your calculations are right, that should not be the problem; the disks are 3.49 TiB.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I don't know what calculations to do, so I can't help you! :smile:

If your calculations are right, that should not be the problem; the disks are 3.49 TiB.

It's this line specifically here:

Code:
dedup: DDT entries 5605103970, size 605B on disk, 195B in core


You have about 5.6 billion entries in your deduplication table (DDT).

They're taking up 195 bytes each in memory, which is the (roughly) 1T number that @Davvo came up with previously. This is over the 512G of RAM in your machine, but you're likely bound by the deletion speeds on disk, so we go down to that level. (I'm glossing over the metadata ratios in ARC for now, because the problem likely lies elsewhere.)

The "605B on disk" presents a different challenge, as that's ~3T of on-disk DDT. The default tunables in ZFS reserve 25% of the special devices for "regular metadata" that's necessary for ZFS, so the 3.49TiB drives likely started shuffling DDT to your spinning disks once it crossed 2.62T used. You can check the amount of space used with zpool iostat -v - if your special drives are at or above that value, this is probably what's happened.

You can add additional special vdevs if you have bays (or swap them out for larger drives if you don't, although SATA/SAS SSDs in the 7.6T size are rather pricey) but this seems on first blush to be the cause of the problems.
 
Joined
Apr 8, 2018
Messages
44
Code:
root@archivio[~]# zpool iostat -v
                                            capacity     operations     bandwidth
pool                                      alloc   free   read  write   read  write
----------------------------------------  -----  -----  -----  -----  -----  -----
Archivio                                   811T   513T  6.16K  20.3K   178M   307M
  raidz2-0                                 208T  11.7T    485  1.57K  48.3M  47.9M
    b097df6a-2f27-4394-9730-76261871cec3      -      -     44    145  4.39M  4.35M
    f0cd341a-31cb-4d1c-af7e-41135bde3067      -      -     44    145  4.39M  4.35M
    9ad1c3d8-ecf6-41f1-8a02-2a27441a08e8      -      -     43    144  4.39M  4.35M
    9e13868e-9418-4a26-b858-d480ce639fd5      -      -     44    146  4.39M  4.35M
    ac7d4716-ba83-4209-af42-ad7a1f7e8799      -      -     43    146  4.39M  4.35M
    9ad14a36-d5ad-40d3-bf11-40590853b7a8      -      -     43    146  4.39M  4.35M
    3062e26b-01c1-4233-8735-f235958c9d5e      -      -     44    144  4.39M  4.35M
    5015360e-990c-428a-a3c0-5ad57ea530f1      -      -     44    146  4.39M  4.35M
    cdf836cf-e017-4df8-bf4d-2b0e7b4bbf0a      -      -     43    144  4.40M  4.35M
    660d5ea7-049b-44be-819b-fc5866bad3ad      -      -     43    145  4.40M  4.35M
    361f3028-4da4-490c-a674-704918283cf4      -      -     43    146  4.39M  4.35M
  raidz2-1                                 208T  12.1T    484  1.55K  48.3M  47.8M
    3e93718b-720e-4f5f-a7f4-7b3ea3750ae2      -      -     43    144  4.39M  4.34M
    17a10b38-da89-4090-bad9-e7b9b58d82a1      -      -     43    144  4.39M  4.34M
    260476a7-a757-4293-be9e-7f94d3ad1127      -      -     44    144  4.39M  4.34M
    e6800b36-8d0e-41f6-8eef-cd8bda0f423b      -      -     44    143  4.39M  4.34M
    f0bb6ca1-3a14-479a-a3a2-3f9db1126070      -      -     44    144  4.39M  4.34M
    d69e75a9-76a8-419f-8fb4-964a95633db4      -      -     44    144  4.39M  4.34M
    873cb9f9-4b20-4cd0-a464-71b5c91669a6      -      -     44    144  4.39M  4.34M
    9fe2f86b-3681-4772-bcd6-dac7961a6172      -      -     44    144  4.39M  4.34M
    fe92d53d-a151-450c-85b4-685ea1d3d82a      -      -     43    144  4.39M  4.34M
    fd8a246d-5346-4b84-baf6-c1169e9238f6      -      -     44    144  4.39M  4.34M
    716bcb7f-d18b-4e82-866b-7afabfa68029      -      -     43    144  4.39M  4.34M
  raidz2-3                                 203T  17.4T    474  1.51K  47.2M  46.7M
    68a7a9b3-12b9-4738-9836-327b7e872bae      -      -     42    141  4.29M  4.24M
    851f1e90-d49a-4f44-93df-baf5cf368ad0      -      -     42    140  4.29M  4.24M
    8a8976b5-227a-4d2c-a7c5-b56e171e2186      -      -     43    140  4.29M  4.24M
    e4646155-a067-492d-b5f3-c17c8e400008      -      -     42    140  4.29M  4.24M
    ccfd693d-ccb2-4e87-b2fb-0458731d1c10      -      -     43    141  4.29M  4.24M
    a6c8d71a-26a4-40b0-aef1-7cd8a5ec31df      -      -     43    141  4.29M  4.24M
    27c1e6d6-274c-461f-b458-b9ecea0ab187      -      -     43    141  4.29M  4.24M
    ddaba7e8-5c4f-49db-86a8-3944f04d229c      -      -     43    140  4.29M  4.24M
    3980fb00-af74-47d0-bd6b-dc88de0e0be7      -      -     43    141  4.29M  4.24M
    63fb1f47-0d2a-4b11-a534-e75091418966      -      -     43    141  4.29M  4.24M
    ba1fa4e5-1729-4249-803b-725a5be6026d      -      -     43    140  4.29M  4.24M
  raidz2-4                                83.9T   136T    167    875  23.3M  45.2M
    ad2112a5-834f-487f-8e04-647a503bf678      -      -     15     79  2.12M  4.11M
    709505c7-40bc-4489-bc77-2883871dc245      -      -     15     79  2.12M  4.11M
    0765d45e-aeed-4300-9d28-7d8b464c5b09      -      -     15     79  2.12M  4.11M
    eca420ec-f3f8-4c0a-b6b2-08341a752184      -      -     15     79  2.12M  4.11M
    131d1bdc-be96-4365-b9bb-71cb54b7e022      -      -     15     79  2.12M  4.11M
    f19a7a96-2496-43af-91b0-3c12d0d534da      -      -     15     79  2.12M  4.11M
    5b79e55d-6d4d-43d8-863a-c741809ef504      -      -     15     79  2.12M  4.11M
    11f95813-4191-4b17-876e-6fb03dbf313c      -      -     15     79  2.12M  4.11M
    15a8a606-f117-4d35-81d2-93fcec38c451      -      -     15     79  2.12M  4.11M
    83aa4cf6-fa4e-4600-9d88-74d6a99a6933      -      -     15     79  2.12M  4.11M
    daa9f691-7ca7-4888-afea-2952f6274496      -      -     14     79  2.12M  4.11M
  raidz2-5                                96.1T   124T    137  2.02K  13.7M  61.8M
    04056fc8-84f0-412f-8276-56fa5747f0b9      -      -     12    187  1.25M  5.62M
    f27444a7-f73b-45dc-91d3-07436e3d666b      -      -     12    189  1.25M  5.62M
    06cfc1b9-b7ab-4091-bedd-27d644bf9688      -      -     12    188  1.25M  5.62M
    297aca17-ffec-49e1-b70d-a9ff01289518      -      -     11    188  1.25M  5.62M
    91826ebe-cad1-4ed0-97f3-2b9fa4732ab6      -      -     11    188  1.25M  5.62M
    62b145da-f906-4096-ace3-7f5ddfe387ad      -      -     13    188  1.25M  5.62M
    3cc6e221-e537-4906-956f-3138ab840e62      -      -     12    188  1.25M  5.62M
    0ede001d-32c5-4c72-abee-493a88d495d8      -      -     12    189  1.25M  5.61M
    5ba8c56f-bec4-4337-9bc4-42a04025edc3      -      -     12    187  1.25M  5.62M
    df2be133-15da-4801-880d-e8272dd9f66d      -      -     12    188  1.25M  5.62M
    2e2576c6-f6a5-46a4-a81b-14e4e35c2551      -      -     12    187  1.25M  5.62M
  raidz2-6                                8.75T   211T      0    982  4.18K  51.9M
    87ca9464-1e71-4bed-93d3-f07ccef00d65      -      -      0     89    387  4.72M
    c6091290-9990-4618-84e0-a92a83648e9e      -      -      0     89    388  4.72M
    8e961f3b-470e-4d49-98da-60d8186afab6      -      -      0     89    385  4.72M
    a7b9947b-1a66-4aac-9c59-25cc673e73d7      -      -      0     89    388  4.72M
    efd33267-21bf-4647-80e4-62b3218241d5      -      -      0     89    391  4.72M
    ecd56be3-be62-45df-9c20-f60337a4cd16      -      -      0     89    392  4.72M
    927aed67-cb5d-4afe-8a16-f51492336bcc      -      -      0     89    392  4.72M
    9582d73e-e2ec-4e09-899b-adee554868f5      -      -      0     89    391  4.72M
    1d633f20-44eb-42d6-ae86-4cddbc155776      -      -      0     89    389  4.72M
    46ddf912-20ec-470b-9ab2-66fdbd64132e      -      -      0     89    387  4.72M
    b263089b-37e2-43b9-aee5-8cb6ec7a2126      -      -      0     89    386  4.72M
dedup                                         -      -      -      -      -      -
  mirror-2                                3.13T   358G  4.63K  14.6K  19.4M   123M
    d8c51d4c-1dd3-4648-a990-fdf7d6972333      -      -  2.31K  7.29K  9.70M  61.3M
    abf1aa23-632f-4042-9490-103aa8529599      -      -  2.32K  7.29K  9.70M  61.3M
----------------------------------------  -----  -----  -----  -----  -----  -----
boot-pool                                 6.90G   457G      1     48  59.6K  1.13M
  mirror-0                                6.90G   457G      1     48  59.6K  1.13M
    sdb3                                      -      -      0     24  31.1K   579K
    sda3                                      -      -      0     24  28.4K   579K
----------------------------------------  -----  -----  -----  -----  -----  -----


It seems that more than 3 TB are in use, so is the problem the excessive occupancy of the dedicated dedup disks?
If so, is it sufficient to add more dedicated dedup disks?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
It seems that more than 3 TB are in use, so is the problem the excessive occupancy of the dedicated dedup disks?
If so, is it sufficient to add more dedicated dedup disks?
Yes, you have to add another dedup VDEV. This will fix your crawling performance issues for now.
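From the shell that would look roughly like this (the device names below are placeholders for your new SSDs; the -n flag previews the change without applying it):

Code:
# Preview adding a second mirrored dedup vdev
zpool add -n Archivio dedup mirror /dev/disk/by-id/NEW_SSD_1 /dev/disk/by-id/NEW_SSD_2
# If the proposed layout looks right, run the same command again without -n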
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It seems that more than 3 TB are in use, so is the problem the excessive occupancy of the dedicated dedup disks?
If so, is it sufficient to add more dedicated dedup disks?
Yes, you've overflowed from your dedup tables to spinning-disk vdevs.

Adding more capacity/throughput will help, but not immediately - any of the DDT that has overflowed to the spinning disks is going to stay there until updated, so deletions or any operations that need to access that section of the DDT will result in the same storm of small mostly-random I/O to those spindles, until those tables are updated. New data will have its DDT written to the newly-added SSDs, but you'll have that "sandwich" of metadata that was too large for the initial capacity.

Digging into your DDT histogram a little:

Code:
bucket              allocated                       referenced         
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    4.66G    594T    538T    539T    4.66G    594T    538T    539T
     2     528M   65.9T   59.7T   59.7T    1.10G    140T    127T    127T
     4    41.6M   4.20T   3.99T   4.06T     201M   18.1T   17.2T   17.6T
     8    1.00M    111G   47.5G   51.1G    11.3M   1.17T    412G    459G
    16    1.59M    203G   13.2G   21.1G    34.7M   4.32T    297G    469G
    32     475K   59.0G   14.1G   15.9G    19.1M   2.38T    612G    685G
    64    47.3K   5.90G   1.94G   2.09G    3.57M    456G    149G    161G
   128    8.39K   1.04G    285M    315M    1.34M    171G   41.6G   46.7G
   256      552   66.0M   8.70M   11.1M     166K   19.6G   2.65G   3.38G
   512      752   92.0M   5.34M   9.01M     599K   73.5G   4.12G   7.05G
    1K       60   4.22M    706K   1.13M    80.0K   5.64G    912M   1.49G
    2K       40   3.16M    226K    484K     116K   9.33G    723M   1.43G
    4K       24   1.90M     91K    256K     143K   11.7G    528M   1.44G
    8K       20   1.76M    146K    265K     223K   18.6G   1.60G   2.91G
   16K       35    300K    212K    484K     803K   6.70G   4.74G   10.9G
   32K        5    132K   92.5K    128K     237K   6.06G   4.23G   5.88G
   64K        1    512B    512B   9.14K     111K   55.6M   55.6M   1017M
  256K       47   5.88M    188K    430K    13.7M   1.71T   54.7G    125G
    1M        1   1.50K   1.50K   9.14K    1.62M   2.43G   2.43G   14.8G
 Total    5.22G    665T    602T    603T    6.04G    763T    684T    686T


You're netting 83T of savings from deduplication, the lion's share of that (67T) coming from things that are only stored twice. Is there a possibility to eliminate even just the single-copy duplicates upstream somehow? That could make deduplication significantly less necessary for your use case.
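(For reference, those savings figures come straight from the histogram's DSIZE columns, referenced minus allocated:)

Code:
echo "686 - 603" | bc     # ~83T total savings across the pool
echo "127 - 59.7" | bc    # ~67T of that from the refcnt=2 bucket alone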
 
Joined
Oct 22, 2019
Messages
3,641
Code:
bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    4.66G    594T    538T    539T    4.66G    594T    538T    539T
     2     528M   65.9T   59.7T   59.7T    1.10G    140T    127T    127T
     4    41.6M   4.20T   3.99T   4.06T     201M   18.1T   17.2T   17.6T
[ . . . ]
   64K        1    512B    512B   9.14K     111K   55.6M   55.6M   1017M
  256K       47   5.88M    188K    430K    13.7M   1.71T   54.7G    125G

    1M        1   1.50K   1.50K   9.14K    1.62M   2.43G   2.43G   14.8G
 Total    5.22G    665T    602T    603T    6.04G    763T    684T    686T

Does this mean there are over one million references to a very particular block? (Highlighted in bold.)

If so, that seems crazy. Perhaps a larger recordsize would yield better performance when using deduplication on non-database file storage?
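Something along these lines, I imagine (the dataset name is just a placeholder, and it only affects newly written data):

Code:
# Larger records mean fewer blocks per file, and therefore fewer DDT entries to track
zfs set recordsize=1M Archivio/some_dataset
zfs get recordsize Archivio/some_dataset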

EDIT: I'm speaking from personal preference: I'd rather have an extra 15GB of storage used up on my pool if it means I don't require more RAM, more vdevs, and the potential to slow down the system. It doesn't seem worth it to me if something like a common file header (or whatever can be represented with a very small recordsize) is represented by millions of metadata blocks. I'd rather just have a simpler pool that consumes more on-disk storage.
 
Last edited:
Joined
Apr 8, 2018
Messages
44
You're netting 83T of savings from deduplication, the lion's share of that (67T) coming from things that are only stored twice. Is there a possibility to eliminate even just the single-copy duplicates upstream somehow? That could make deduplication significantly less necessary for your use case.
The people who fill this archive are very undisciplined; deduplication was adopted precisely to keep their duplicate files from having an impact.

Adding more capacity/throughput will help, but not immediately - any of the DDT that has overflowed to the spinning disks is going to stay there until updated, so deletions or any operations that need to access that section of the DDT will result in the same storm of small mostly-random I/O to those spindles, until those tables are updated. New data will have its DDT written to the newly-added SSDs, but you'll have that "sandwich" of metadata that was too large for the initial capacity.
This relates to another operation I would like to do, rebalancing the various vdevs, but that is off topic here.
I think a better place is this topic:
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Does this mean there are over one million references to a very particular block? (Highlighted in bold.)

This is likely a thick-lazy-zero kind of situation, as records get compressed and then deduplicated, so a block of all-zeroes will be squashed to nothing and then logged as an N+1 kind of situation on the DDT. :smile:
Those who go to fill this archive are very undisciplined, the adoption of deduplication was done precisely to make sure that duplicate files are not impactful.
A very fair statement (and a situation I sympathize with!) - I just wanted to illustrate the actual observed savings you're experiencing - it's up to you to make the value judgment as to whether deduplication is worthwhile in your scenario.

Rebalancing the vdevs as mentioned in the linked thread will cause the DDTs to be rewritten, although it will not change the necessity to read/update/delete the HDD-based DDT records, so you will pass through the metaphorical "eye of the needle" in terms of a throughput bottleneck at least once.

Cheers!
 
Joined
Oct 22, 2019
Messages
3,641
and then logged as an N+1 kind of situation on the DDT.
But there's more to it than just that, correct? The concern is the sheer number of metadata entries that must exist. Even if metadata doesn't consume much space per se, that's a lot of entries that need to be available ASAP (whether in RAM or on a super-fast vdev).

Let's say there is a (compressed) block that holds the content: ABCDEF

Let's say that ABCDEF (as a block) occurs very often.

If you're holding one million entries in RAM (in freaking RAM), that means you have less RAM available for other things, be it services, userdata in the ARC, etc. If you do not have sufficient RAM, then as was explained earlier, special vdevs serve as a secondary (and slower) fallback to store and read these millions of metadata entries. (And then once that dries up, you're right back at square one, relying on your slow spinner HDDs.)


(See below. My "evil delayed edit" just struck again.)



But it gets crazier if we're talking about "a long sequence of zeros."

So then that brings me to this point...

This is likely a thick-lazy-zero kind of situation, as records get compressed and then deduplicated, so a block of all-zeroes will be squashed
If inline compression is squashing a sequence of "zeros" down to essentially nothing, then why deduplicate these to "over one million" references... of essentially nothing?

That seems ridiculously inefficient and redundant. The purpose of saving space (without sacrificing performance) has already been achieved. Why slap deduplication on top of that, saturating your DDT with millions of extraneous entries?

Is there nothing configurable with dedup to say "Ignore blocks smaller than ___" or "Ignore blocks that are a sequence of zeros"?



EDIT: My infamous "delayed edit" strikes again! YOU WILL NEVER DEFEAT IT!

I need to do some more reading. I might have misunderstood how much overhead is added by blocks that point to DDT entries. (My hunch tells me they surely take up more storage on disk/RAM than a "compressed to nothing" series of zeros, which would defeat the purpose of deduplicating such "squished to nothing" blocks in the first place...)
 
Last edited:
