[SOLVED] Pool import fails with panic: VERIFY message

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
Hello gentlemen,

I think I got my TrueNAS in a pretty bad shape and I need your help troubleshooting it.

  1. I experience unexpected reboots, ranging from one per month to a couple per day.
  2. The pool becomes unhealthy from time to time.
  3. I see CKSUM errors (fewer than 10) on almost all drives under load.
  4. Eventually, the VM that TrueNAS runs in freezes completely (transfers stop, the CLI hangs, and monitoring stops).

So you could say I'm in a pretty bad situation right now! And given all these issues, I'm a bit lost as to where to start looking for the root cause.

Unexpected reboots:
I've looked at the logs from the time of the reboot and there is absolutely nothing, neither on TrueNAS itself nor on Proxmox.
Also, the uptime of the VM in Proxmox is 216 days (which is normal), but TrueNAS rebooted 14 hours ago at the time of writing. So it looks like a really fast crash.

Unhealthy pool and CKSUM errors:
These two are linked and appear while the server is under load.
Six out of the eight HDDs show the CKSUM issue. They are connected through a SAS3008 IT-mode card in PCI passthrough from Proxmox.
I would rule out a cabling issue, as the drives hang off a pair of breakout cables (4× SATA -> 1 mini-SAS HD); a fault on both breakout cables at once seems unlikely.
I cannot, however, rule out the controller, which is running firmware 16.00.01.00.

Freezes of the VM:
These last 5-10 seconds and occur roughly every 10 minutes. I only discovered this issue today while troubleshooting the others.
In the TrueNAS GUI I can see gaps in the monitoring graphs matching these freezes.

Troubleshooting so far:
  • Reading too many posts about how recabling solves the issue, and about why the SAS3008 is a good card, or a bad one, or a good one, I don't remember
  • Checked the cabling and added a dedicated fan to cool down the SAS card
  • Looking for firmware 16.00.12.00 after reading the post about the bug affecting previous versions (no luck though, as it is not publicly available)
  • Looking at dmesg on TrueNAS itself: nothing
  • Looking at the logs on TrueNAS itself: nothing unusual
  • Looking at dmesg on Proxmox: flooded with "PCIe Bus Error" for the internal SATA controller. Nothing mentioning the PCIe SAS card though, so I don't know if this is related.
Setup:
  • Proxmox 7.2-11
  • TrueNAS-12.0-U8.1
  • Motherboard: ASRock X399 (I've read about some issues with this chipset so...), latest BIOS
  • HBA: Supermicro LSI SAS3008 in IT mode, firmware 16.00.01.00
  • 8× Seagate Exos 16TB SATA drives
Outputs:
I'm not attaching any outputs right now; this post is long enough.

Best regards and thank you in advance for your help.
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
I was able to catch the unexpected reboot on Proxmox, if it helps:

Code:
panic: VERIFY(ddt_object_update(ddt, ntype, class, dde, tx) == 0) failed


No luck on Google except this link
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
Hello guys,

Allow me to update this post: the crashes are still happening, and they are triggered by write activity on the disks. A read load does not crash the system, but a long enough (5 minutes) write load reboots the VM.
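For reference, the kind of write load I mean (the path is just an example; any sustained write does it):

Code:
# roughly 5 minutes of sustained sequential writes is enough to trigger the reboot
# (bs=1m on FreeBSD dd, bs=1M on Linux dd)
dd if=/dev/zero of=/mnt/Storage/testfile bs=1m count=20000
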
If you have any idea...

Cheers,
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
PCIe passthrough in Proxmox is, by their own admission, "experimental" - it may be related to IOMMU grouping on your motherboard as well. I'm not intimately familiar with the Threadripper family of hardware.
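A quick way to check the grouping on the Proxmox host, if you want to rule that out (standard sysfs path, assuming IOMMU is enabled):

Code:
# list every PCI device by IOMMU group on the Proxmox host
find /sys/kernel/iommu_groups/ -type l | sort -V
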

Code:
panic: VERIFY(ddt_object_update(ddt, ntype, class, dde, tx) == 0) failed

Are you using deduplication on any datasets?
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
I know I'm not checking all the boxes for a reliable system, unfortunately...
Yes, I have dedup on one dataset.
 
Last edited:

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Yes, I have dedup on one dataset.

That's most likely your problem. Deduplication is extremely memory-hungry, as well as putting a big "4K/random" I/O load on your disks. Question time!

How much RAM is in your host system, and how much have you assigned to the TrueNAS VM? If possible, double it to begin with - especially if you have a small amount like 16GB.

Please run the following commands and post the full output inside of [code][/code] tags:

zpool list

zpool status -D poolname

This will give us an idea of how much of your RAM is being used for deduplication tables, and how much value you're getting from the dedup.
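As a very rough rule of thumb while we wait for that output (the numbers below are illustrative, not measured from your system):

Code:
# each DDT entry costs roughly 150-320 bytes of RAM once loaded in core, so
#   RAM for DDT ~= (total entries from "zpool status -D") x (in-core bytes per entry)
# e.g. 4,000,000 entries x ~300 bytes ~= ~1.2 GB of RAM just for the table
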
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
Things got worse, and TrueNAS does not even boot anymore, unfortunately. I had allocated 64GB of RAM (non-ECC though).
This morning I was able to copy everything from the dedup-enabled dataset to a dedup-disabled dataset. But it looks like the checksum errors were copied too, because I was unable to replicate the dataset to my other system.
After this, the TrueNAS VM shut down and is now looping in panic mode; see the pics attached...

This is before it goes into panic mode and reboots
TrueNas-b4crash.png


And this is the panic
TrueNas-panic.png
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
That's why it is suggested to run TrueNAS on bare metal.
It's probably a combination of a few things, on top of it being a VM (which does not simplify anything).
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
Hello,

I've installed TrueNAS on a spare computer and moved all the disks over.
At first I was able to get to the GUI and click on "import existing pool" (despite not having exported it before). The import produced exactly the same result, and now the bare-metal PC shows the exact same error.
So I've ruled out the virtualization as well as the PCI card (all disks are now directly SATA-connected to the motherboard), and I'm in the most basic setup possible.
I think it's somehow too late, and the damage is done to the data, or at least to the metadata.

Looking into this issue, I've found some threads where people were using dedup as well and whose systems crashed during a zfs send (as happened to me):
https://www.truenas.com/community/threads/pool-import-or-system-boot-causes-kernel-panic.83370/
https://www.reddit.com/r/zfs/comments/fpnxkr/zfs_pool_wont_import/
https://github.com/openzfs/zfs/issues/11480
https://github.com/openzfs/zfs/issues/1681

Unfortunately, from what I've read, I could not find a solution in these threads.

Do you guys have any idea how I could solve this issue?
 
Last edited:

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
And, looking at the last function call in the crash, fork_trampoline(), I've found this bug report from someone experiencing the same issue on a Dell server with the same PCI card (LSI 3008)!
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243867

It turned out to be a hardware issue, and after replacing the card his server ran fine.

So the root cause seems to be the PCI card; now I just need to get the pool back online :rolleyes:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Looking for the firmware 16.00.12.00 after reading the post about the bug affecting previous version (no luck though as it is not publicly available)

It certainly is publicly available. Broadcom has it fairly buried away, but a copy is available here too.


panic: VERIFY(ddt_object_update(ddt, ntype, class, dde, tx) == 0) failed

That's a dedup panic. Get rid of dedup.
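If you're not sure where it's enabled, something like this will show you (the dataset name in the second command is just an example):

Code:
# show where dedup is (or was) enabled across the pool
zfs get -r -t filesystem,volume dedup Storage
# turning it off only affects newly written blocks - existing DDT entries remain
zfs set dedup=off Storage/somedataset
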
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
It certainly is publicly available. Broadcom has it fairly buried away, but a copy is available here too.

That's a dedup panic. Get rid of dedup.
I already did.

I copied everything from the dedup dataset to a new non-dedup dataset and got rid of the dedup dataset. But as I said, I was later unable to replicate the non-dedup dataset to my backup system, as if the checksum errors had been copied as well.
"This morning I was able to copy everything from the dedup-enabled dataset to a dedup-disabled dataset"

Regarding the LSI firmware, I did upgrade to 16.00.12.00, but there was no improvement.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's unfortunate. rsync is your friend, then, I suppose.
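Something along these lines once the pool is readable (host and paths are examples):

Code:
# pull everything off to another box, resumable and with progress
rsync -avP /mnt/Storage/ backuphost:/mnt/Backup/Storage/
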
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
Well, it can be, if I can at least mount the pool! :tongue:

That's my issue right now: how to mount this pool...
I'll play with it on Wednesday, trying to at least get it mounted read-only so I can back everything up again.
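Something like this is what I plan to try, if the pool-level read-only flag cooperates:

Code:
# import the pool read-only so nothing ever tries to update the DDT
zpool import -o readonly=on -R /mnt Storage
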
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
That's unfortunate. rsync is your friend, then, I suppose.
Ok, some news on the situation:

I've tried the following without success:

Code:
zfs mount Storage       --> kernel panic
zfs mount -f Storage    --> kernel panic
zfs mount -Fn Storage   --> ignored because the pool is marked as mountable


The same happens on TrueNAS SCALE. I've read that some situations are handled differently between FreeBSD and Linux, so I tried SCALE instead of CORE. No luck.

Also, I understand now how you figured out that this issue was related to the dedup feature (thanks to the DDT in the logs). However, despite removing the dedup dataset, I still seem to have a DDT referencing around 500GB of data.
I had already removed the dedup dataset after copying everything to a non-dedup dataset, but the table is still there.

So either the copied files kept the dedup property, or the deletion of the dedup dataset failed, and now we are in a state where TrueNAS tries to reference dedup'ed data that does not exist anymore...

Code:
root@truenas[~]# zdb -eDD Storage
DDT-sha256-zap-duplicate: 418336 entries, size 1119 on disk, 158 in core
DDT-sha256-zap-unique: 3973803 entries, size 1095 on disk, 155 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    3.79M    484G    477G    476G    3.79M    484G    477G    476G
     2     357K   44.6G   44.0G   43.9G     731K   91.2G   90.0G   89.8G
     4    50.5K   6.30G   6.20G   6.19G     262K   32.7G   32.2G   32.2G
     8      483   55.3M   49.2M   49.4M    4.71K    549M    478M    481M
    16      110   12.0M   10.5M   10.7M    2.12K    234M    206M    209M
    32       17    196K   71.5K    219K      658   8.27M   2.70M   8.25M
   128        1    128K      4K   9.12K      139   17.4M    556K   1.24M
 Total    4.19M    535G    527G    526G    4.77M    608G    600G    598G

dedup = 1.14, compress = 1.01, copies = 1.00, dedup * compress / copies = 1.16


I've looked around on the internet, and it seems the table is supposed to shrink by itself once dedup is disabled, but this only happens when the pool is mounted, which is not my case, since it desperately ends in a kernel panic each time.

So, is there a way to purge this table manually?
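(As a sanity check on my "500GB" above: if I read the zdb header right, the table itself is much smaller, and the ~526G is the data it references. My rough math:)

Code:
# rough size of the DDT itself, from the zdb -eDD header above
# (418336 + 3973803) entries ~= 4.39M entries
# in core:  4.39M x ~156 bytes ~= ~685 MB of RAM
# on disk:  4.39M x ~1100 bytes ~= ~4.8 GB
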

Some other outputs that may help...

Code:
root@truenas[~]# zdb -ed Storage
Dataset mos [META], ID 0, cr_txg 4, 4.84G, 1456 objects
Dataset Storage/Bordel@manual-2022-12-12_19-03 [ZPL], ID 26110, cr_txg 11489956, 115G, 56 objects
Dataset Storage/Bordel [ZPL], ID 80419, cr_txg 907303, 115G, 56 objects
Dataset Storage/Emby-DS@manual-2022-12-12_19-03 [ZPL], ID 26490, cr_txg 11489958, 34.8T, 179079 objects
Dataset Storage/Emby-DS [ZPL], ID 122, cr_txg 201, 34.8T, 179086 objects
Dataset Storage/Nextcloud-No-dedup [ZPL], ID 9560, cr_txg 11533040, 801G, 128750 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@auto-2021-06-20_00-00 [ZPL], ID 25290, cr_txg 2279221, 41.3M, 662 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-18_00-00 [ZPL], ID 92082, cr_txg 2742709, 41.3M, 662 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@auto-2020-09-28_20-24 [ZPL], ID 147, cr_txg 860, 19.2M, 525 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@auto-2021-08-01_00-00 [ZPL], ID 406, cr_txg 2980103, 41.9M, 662 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-11_00-00 [ZPL], ID 78257, cr_txg 2628357, 40.7M, 662 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@auto-2021-06-13_00-00 [ZPL], ID 51242, cr_txg 2161105, 42.5M, 662 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@manual-2021-01-23_14-48 [ZPL], ID 172, cr_txg 722387, 21.0M, 548 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@auto-2021-06-27_00-00 [ZPL], ID 125149, cr_txg 2389524, 40.9M, 662 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@auto-2021-08-08_00-00 [ZPL], ID 110638, cr_txg 3099535, 42.0M, 662 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-04_00-00 [ZPL], ID 101202, cr_txg 2508918, 40.5M, 662 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-25_00-00 [ZPL], ID 52963, cr_txg 2860858, 43.9M, 662 objects
Dataset Storage/.system/rrd-6c934beb66de482e8faef2d3b30acc82 [ZPL], ID 83, cr_txg 15, 47.4M, 690 objects
Dataset Storage/.system/webui@auto-2021-06-20_00-00 [ZPL], ID 24955, cr_txg 2279221, 201K, 7 objects
Dataset Storage/.system/webui@auto-2020-09-28_20-24 [ZPL], ID 149, cr_txg 860, 201K, 7 objects
Dataset Storage/.system/webui@auto-2021-07-18_00-00 [ZPL], ID 92084, cr_txg 2742709, 201K, 7 objects
Dataset Storage/.system/webui@auto-2021-08-01_00-00 [ZPL], ID 163775, cr_txg 2980103, 201K, 7 objects
Dataset Storage/.system/webui@manual-2021-01-23_14-48 [ZPL], ID 174, cr_txg 722387, 201K, 7 objects
Dataset Storage/.system/webui@auto-2021-06-13_00-00 [ZPL], ID 51244, cr_txg 2161105, 201K, 7 objects
Dataset Storage/.system/webui@auto-2021-07-11_00-00 [ZPL], ID 78259, cr_txg 2628357, 201K, 7 objects
Dataset Storage/.system/webui@auto-2021-07-04_00-00 [ZPL], ID 101204, cr_txg 2508918, 201K, 7 objects
Dataset Storage/.system/webui@auto-2021-07-25_00-00 [ZPL], ID 52965, cr_txg 2860858, 201K, 7 objects
Dataset Storage/.system/webui@auto-2021-08-08_00-00 [ZPL], ID 110640, cr_txg 3099535, 201K, 7 objects
Dataset Storage/.system/webui@auto-2021-06-27_00-00 [ZPL], ID 125151, cr_txg 2389524, 201K, 7 objects
Dataset Storage/.system/webui [ZPL], ID 95, cr_txg 19, 201K, 7 objects
Dataset Storage/.system/cores@auto-2021-07-04_00-00 [ZPL], ID 101206, cr_txg 2508918, 201K, 7 objects
Dataset Storage/.system/cores@auto-2021-07-25_00-00 [ZPL], ID 52967, cr_txg 2860858, 201K, 7 objects
Dataset Storage/.system/cores@auto-2021-08-08_00-00 [ZPL], ID 110642, cr_txg 3099535, 201K, 7 objects
Dataset Storage/.system/cores@auto-2021-06-27_00-00 [ZPL], ID 125153, cr_txg 2389524, 201K, 7 objects
Dataset Storage/.system/cores@auto-2021-06-13_00-00 [ZPL], ID 51246, cr_txg 2161105, 201K, 7 objects
Dataset Storage/.system/cores@auto-2021-07-11_00-00 [ZPL], ID 78261, cr_txg 2628357, 201K, 7 objects
Dataset Storage/.system/cores@auto-2021-08-01_00-00 [ZPL], ID 408, cr_txg 2980103, 201K, 7 objects
Dataset Storage/.system/cores@manual-2021-01-23_14-48 [ZPL], ID 181, cr_txg 722387, 1.01M, 8 objects
Dataset Storage/.system/cores@auto-2021-06-20_00-00 [ZPL], ID 24957, cr_txg 2279221, 201K, 7 objects
Dataset Storage/.system/cores@auto-2020-09-28_20-24 [ZPL], ID 151, cr_txg 860, 903K, 8 objects
Dataset Storage/.system/cores@auto-2021-07-18_00-00 [ZPL], ID 92086, cr_txg 2742709, 201K, 7 objects
Dataset Storage/.system/cores [ZPL], ID 65, cr_txg 9, 201K, 7 objects
Dataset Storage/.system/services@manual-2021-01-23_14-48 [ZPL], ID 183, cr_txg 722387, 219K, 7 objects
Dataset Storage/.system/services@auto-2021-06-13_00-00 [ZPL], ID 51248, cr_txg 2161105, 219K, 7 objects
Dataset Storage/.system/services@auto-2021-07-11_00-00 [ZPL], ID 78263, cr_txg 2628357, 219K, 7 objects
Dataset Storage/.system/services@auto-2021-07-25_00-00 [ZPL], ID 52969, cr_txg 2860858, 219K, 7 objects
Dataset Storage/.system/services@auto-2021-07-04_00-00 [ZPL], ID 101208, cr_txg 2508918, 219K, 7 objects
Dataset Storage/.system/services@auto-2021-08-08_00-00 [ZPL], ID 110644, cr_txg 3099535, 219K, 7 objects
Dataset Storage/.system/services@auto-2021-06-27_00-00 [ZPL], ID 125155, cr_txg 2389524, 219K, 7 objects
Dataset Storage/.system/services@auto-2021-07-18_00-00 [ZPL], ID 92569, cr_txg 2742709, 219K, 7 objects
Dataset Storage/.system/services@auto-2021-06-20_00-00 [ZPL], ID 24959, cr_txg 2279221, 219K, 7 objects
Dataset Storage/.system/services@auto-2021-08-01_00-00 [ZPL], ID 410, cr_txg 2980103, 219K, 7 objects
Dataset Storage/.system/services [ZPL], ID 192, cr_txg 52255, 219K, 7 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@auto-2021-06-13_00-00 [ZPL], ID 51250, cr_txg 2161105, 19.7M, 138 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-11_00-00 [ZPL], ID 78404, cr_txg 2628357, 24.0M, 166 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@manual-2021-01-23_14-48 [ZPL], ID 185, cr_txg 722387, 6.96M, 52 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@auto-2021-08-08_00-00 [ZPL], ID 110646, cr_txg 3099535, 28.2M, 194 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-04_00-00 [ZPL], ID 101210, cr_txg 2508918, 22.9M, 159 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-25_00-00 [ZPL], ID 52971, cr_txg 2860858, 26.1M, 180 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@auto-2021-06-27_00-00 [ZPL], ID 124993, cr_txg 2389524, 21.8M, 152 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@auto-2021-06-20_00-00 [ZPL], ID 25541, cr_txg 2279221, 20.8M, 145 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@auto-2020-09-28_20-24 [ZPL], ID 153, cr_txg 860, 201K, 7 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-18_00-00 [ZPL], ID 92571, cr_txg 2742709, 25.0M, 173 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82@auto-2021-08-01_00-00 [ZPL], ID 412, cr_txg 2980103, 27.1M, 187 objects
Dataset Storage/.system/configs-6c934beb66de482e8faef2d3b30acc82 [ZPL], ID 89, cr_txg 17, 88.6M, 693 objects
Dataset Storage/.system/samba4@manual-2021-01-23_14-48 [ZPL], ID 187, cr_txg 722387, 730K, 102 objects
Dataset Storage/.system/samba4@wbc-1668615601 [ZPL], ID 276, cr_txg 11044699, 840K, 107 objects
Dataset Storage/.system/samba4@wbc-1671017651 [ZPL], ID 38, cr_txg 11518999, 894K, 124 objects
Dataset Storage/.system/samba4@update--2021-03-25-18-12--12.0-U1.1 [ZPL], ID 654, cr_txg 812456, 730K, 124 objects
Dataset Storage/.system/samba4@wbc-1649009692 [ZPL], ID 390, cr_txg 7181732, 785K, 98 objects
Dataset Storage/.system/samba4@auto-2021-07-11_00-00 [ZPL], ID 78406, cr_txg 2628357, 821K, 115 objects
Dataset Storage/.system/samba4@wbc-1636990342 [ZPL], ID 512, cr_txg 4804793, 821K, 115 objects
Dataset Storage/.system/samba4@wbc-1671029166 [ZPL], ID 163, cr_txg 11521085, 894K, 135 objects
Dataset Storage/.system/samba4@auto-2021-06-13_00-00 [ZPL], ID 51252, cr_txg 2161105, 840K, 122 objects
Dataset Storage/.system/samba4@wbc-1648661010 [ZPL], ID 391, cr_txg 7112808, 785K, 95 objects
Dataset Storage/.system/samba4@wbc-1664014700 [ZPL], ID 283, cr_txg 10138249, 803K, 81 objects
Dataset Storage/.system/samba4@wbc-1671029020 [ZPL], ID 569, cr_txg 11521067, 894K, 127 objects
Dataset Storage/.system/samba4@auto-2021-06-27_00-00 [ZPL], ID 125320, cr_txg 2389524, 821K, 118 objects
Dataset Storage/.system/samba4@auto-2021-08-08_00-00 [ZPL], ID 110648, cr_txg 3099535, 821K, 125 objects
Dataset Storage/.system/samba4@auto-2021-07-25_00-00 [ZPL], ID 52973, cr_txg 2860858, 821K, 126 objects
Dataset Storage/.system/samba4@update--2021-08-11-08-54--12.0-U2.1 [ZPL], ID 33868, cr_txg 3158472, 821K, 125 objects
Dataset Storage/.system/samba4@auto-2021-07-04_00-00 [ZPL], ID 101212, cr_txg 2508918, 821K, 117 objects
Dataset Storage/.system/samba4@update--2021-11-15-15-30--12.0-U5 [ZPL], ID 39565, cr_txg 4804742, 858K, 118 objects
Dataset Storage/.system/samba4@wbc-1669862820 [ZPL], ID 518, cr_txg 11291568, 858K, 124 objects
Dataset Storage/.system/samba4@wbc-1671027496 [ZPL], ID 517, cr_txg 11520777, 894K, 128 objects
Dataset Storage/.system/samba4@auto-2021-07-18_00-00 [ZPL], ID 92573, cr_txg 2742709, 840K, 122 objects
Dataset Storage/.system/samba4@wbc-1668616441 [ZPL], ID 552, cr_txg 11044822, 840K, 122 objects
Dataset Storage/.system/samba4@auto-2020-09-28_20-24 [ZPL], ID 155, cr_txg 860, 319K, 33 objects
Dataset Storage/.system/samba4@wbc-1668616742 [ZPL], ID 557, cr_txg 11044852, 840K, 126 objects
Dataset Storage/.system/samba4@update--2022-03-30-17-21--12.0-U6.1 [ZPL], ID 116662, cr_txg 7112758, 785K, 97 objects
Dataset Storage/.system/samba4@auto-2021-06-20_00-00 [ZPL], ID 25543, cr_txg 2279221, 840K, 121 objects
Dataset Storage/.system/samba4@wbc-1663828789 [ZPL], ID 280, cr_txg 10102181, 785K, 77 objects
Dataset Storage/.system/samba4@wbc-1668616988 [ZPL], ID 561, cr_txg 11044875, 840K, 126 objects
Dataset Storage/.system/samba4@wbc-1667754146 [ZPL], ID 399, cr_txg 10877450, 803K, 90 objects
Dataset Storage/.system/samba4@wbc-1671098827 [ZPL], ID 289, cr_txg 11534954, 894K, 129 objects
Dataset Storage/.system/samba4@wbc-1670677500 [ZPL], ID 527, cr_txg 11451985, 894K, 124 objects
Dataset Storage/.system/samba4@update--2022-09-24-10-16--12.0-U8 [ZPL], ID 14202, cr_txg 10138212, 803K, 83 objects
Dataset Storage/.system/samba4@wbc-1664849588 [ZPL], ID 513, cr_txg 10303543, 821K, 92 objects
Dataset Storage/.system/samba4@wbc-1663920750 [ZPL], ID 284, cr_txg 10119831, 803K, 82 objects
Dataset Storage/.system/samba4@wbc-1668546783 [ZPL], ID 414, cr_txg 11034034, 840K, 96 objects
Dataset Storage/.system/samba4@auto-2021-08-01_00-00 [ZPL], ID 163419, cr_txg 2980103, 821K, 125 objects
Dataset Storage/.system/samba4 [ZPL], ID 71, cr_txg 11, 894K, 128 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-18_00-00 [ZPL], ID 92575, cr_txg 2742709, 7.15M, 79 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@auto-2020-09-28_20-24 [ZPL], ID 157, cr_txg 860, 383K, 35 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@auto-2021-06-20_00-00 [ZPL], ID 25545, cr_txg 2279221, 5.23M, 76 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@manual-2021-01-23_14-48 [ZPL], ID 387, cr_txg 722387, 1.21M, 67 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@auto-2021-08-01_00-00 [ZPL], ID 163546, cr_txg 2980103, 6.18M, 79 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-11_00-00 [ZPL], ID 78408, cr_txg 2628357, 6.99M, 79 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@auto-2021-06-13_00-00 [ZPL], ID 51019, cr_txg 2161105, 6.67M, 77 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@auto-2021-06-27_00-00 [ZPL], ID 125322, cr_txg 2389524, 6.68M, 78 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@auto-2021-08-08_00-00 [ZPL], ID 110650, cr_txg 3099535, 7.14M, 79 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-04_00-00 [ZPL], ID 101214, cr_txg 2508918, 5.99M, 77 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82@auto-2021-07-25_00-00 [ZPL], ID 52975, cr_txg 2860858, 5.77M, 78 objects
Dataset Storage/.system/syslog-6c934beb66de482e8faef2d3b30acc82 [ZPL], ID 77, cr_txg 13, 8.78M, 85 objects
Dataset Storage/.system@auto-2021-08-01_00-00 [ZPL], ID 404, cr_txg 2980103, 773M, 25 objects
Dataset Storage/.system@manual-2021-01-23_14-48 [ZPL], ID 170, cr_txg 722387, 219K, 17 objects
Dataset Storage/.system@auto-2021-06-20_00-00 [ZPL], ID 25747, cr_txg 2279221, 773M, 25 objects
Dataset Storage/.system@auto-2021-07-18_00-00 [ZPL], ID 92080, cr_txg 2742709, 773M, 25 objects
Dataset Storage/.system@auto-2020-09-28_20-24 [ZPL], ID 145, cr_txg 860, 219K, 15 objects
Dataset Storage/.system@auto-2021-06-27_00-00 [ZPL], ID 125147, cr_txg 2389524, 773M, 25 objects
Dataset Storage/.system@auto-2021-07-25_00-00 [ZPL], ID 53538, cr_txg 2860858, 773M, 25 objects
Dataset Storage/.system@auto-2021-07-04_00-00 [ZPL], ID 101200, cr_txg 2508918, 773M, 25 objects
Dataset Storage/.system@auto-2021-08-08_00-00 [ZPL], ID 110636, cr_txg 3099535, 772M, 25 objects
Dataset Storage/.system@auto-2021-07-11_00-00 [ZPL], ID 78721, cr_txg 2628357, 773M, 25 objects
Dataset Storage/.system@auto-2021-06-13_00-00 [ZPL], ID 51139, cr_txg 2161105, 773M, 25 objects
Dataset Storage/.system [ZPL], ID 58, cr_txg 7, 846M, 25 objects
Dataset Storage/Sylvain-DS@manual-2022-12-12_19-03 [ZPL], ID 26494, cr_txg 11489962, 303G, 66152 objects
Dataset Storage/Sylvain-DS [ZPL], ID 782, cr_txg 103618, 303G, 66152 objects
Dataset Storage [ZPL], ID 21, cr_txg 1, 219K, 11 objects
Verified large_blocks feature refcount of 0 is correct
Verified large_dnode feature refcount of 0 is correct
Verified sha512 feature refcount of 0 is correct
Verified skein feature refcount of 0 is correct
Verified userobj_accounting feature refcount of 121 is correct
Verified encryption feature refcount of 0 is correct
Verified project_quota feature refcount of 121 is correct
Verified redaction_bookmarks feature refcount of 0 is correct
Verified redacted_datasets feature refcount of 0 is correct
Verified bookmark_written feature refcount of 0 is correct
Verified livelist feature refcount of 0 is correct
Verified zstd_compress feature refcount of 0 is correct
Verified device_removal feature refcount of 0 is correct
Verified indirect_refcount feature refcount of 0 is correct

Code:
root@truenas[~]# zdb -eC Storage

MOS Configuration:
        version: 5000
        name: 'Storage'
        state: 0
        txg: 11537028
        pool_guid: 7403891478952984555
        errata: 0
        hostid: 570136887
        hostname: ''
        com.delphix:has_per_vdev_zaps
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 7403891478952984555
            create_txg: 4
            children[0]:
                type: 'raidz'
                id: 0
                guid: 8486786016643024091
                nparity: 3
                metaslab_array: 45
                metaslab_shift: 38
                ashift: 12
                asize: 127989985574912
                is_log: 0
                create_txg: 4
                com.delphix:vdev_zap_top: 36
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 17013278204322199397
                    path: '/dev/gptid/650c6126-01ad-11eb-883c-21daca7daacf'
                    DTL: 22160
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 37
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 14228971779810250931
                    path: '/dev/gptid/d120cdee-3667-11ed-ac19-7d1fb4d87bfd'
                    DTL: 120519
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 120146
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 8351602675821245731
                    path: '/dev/gptid/6845548d-01ad-11eb-883c-21daca7daacf'
                    DTL: 22158
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 39
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 4518851381625534046
                    path: '/dev/gptid/7e3e9129-c9fa-11ec-ac19-7d1fb4d87bfd'
                    DTL: 44485
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 44827
                children[4]:
                    type: 'disk'
                    id: 4
                    guid: 3563226997516867113
                    path: '/dev/gptid/6a4b706d-01ad-11eb-883c-21daca7daacf'
                    DTL: 22156
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 41
                children[5]:
                    type: 'disk'
                    id: 5
                    guid: 17446234952857385234
                    path: '/dev/gptid/d1602ec6-ea4b-11eb-9f3a-a5e7552b2c10'
                    DTL: 113
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 669
                children[6]:
                    type: 'disk'
                    id: 6
                    guid: 6073306176661101619
                    path: '/dev/gptid/6d163391-01ad-11eb-883c-21daca7daacf'
                    not_present: 1
                    DTL: 22154
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 43
                children[7]:
                    type: 'disk'
                    id: 7
                    guid: 10746640750719053050
                    path: '/dev/gptid/6d24a6e6-01ad-11eb-883c-21daca7daacf'
                    DTL: 22153
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 44
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
zfs mount -Fn

I have no idea what "-Fn" is. Lowercase "-f" is force.

So either the copied files conserve the dedup parameter,

That can't happen, at least not the way you outline. When UNIX copies a file, it does an open() syscall on a new file and proceeds to read from the old file, writing into the new file. That's what you want to happen in this case.

or the deletion of the dedup dataset failed and now we are in this state where TrueNas tries to reference some data with dedup enabled, but this data does not exist anymore...

So one of the problems with ZFS is that there is no "fsck" or "chkdsk". As I have said MANY times over the years, once some form of corruption is successfully introduced into the pool, it becomes persistent. This goes for stuff like non-ECC memory corruptions resulting in cosmic-ray-splorf'ed data being written out to disk with a fresh valid checksum, but it ALSO applies to stuff like metadata blocks. So now play along with what I suspect may have happened.

Metadata is system-critical data, although I believe some of it is exempted from running through the DDT mechanism (I'd have to go digging, as it's been a while; most metadata is not a candidate for effective dedup). However, even if you delete all the files, directories, snapshots, and other directly user-affectable instances of dedup, there may be some blocks that are still legitimately active and therefore part of the DDT. If so, only the act of freeing all copies of a block will result in its DDT entry being freed.

However, if there was some sort of major I/O operation going on (massive deletion, snapshot reclamation, etc.) and the system crashed, I suspect it is possible that things got into a bad state. Once you lose a referencing link to the data, there's no longer anything that will ever cause that data to be purged from the pool, and you have ghost data floating around. Remember what I said about no fsck or chkdsk?

So, is there a way to purge this table manually ?

No, not really. It is easy to know WHAT data has been subjected to the DDT: you just dereference the pointer and look at the block. However, that pointer could have come from ANYWHERE. Is it part of a snapshot? Part of a file? You would have to traverse all the metadata on the system looking for references to that DDT entry.

Those of us who have been doing this for some time treat the DDT as a table best purged by destroying and rebuilding the pool.
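In practice that looks something like this (device names below are examples, obviously):

Code:
# only after every last byte has been copied to a separate pool:
zpool destroy Storage
zpool create Storage raidz3 da0 da1 da2 da3 da4 da5 da6 da7
# ...then restore from the backup (rsync, or zfs send/receive of the clean datasets)
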
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
I have no idea what "-Fn" is. Lowercase "-f" is force.
From the man page: Recovery mode for a non-importable pool. Attempt to return the pool to an importable state by discarding the last few transactions. Not all damaged pools can be recovered by using this option. If successful, the data from the discarded transactions is irretrievably lost. This option is ignored if the pool is importable or already imported.

But as the last sentence says, this is ignored if the pool is importable, which seems to be my case.
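(For clarity, those options belong to zpool import rather than zfs mount; the forms would be:)

Code:
zpool import -F Storage     # recovery mode: discard the last few transactions
zpool import -Fn Storage    # dry run: report whether -F could recover, without importing
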
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
Try mounting it read-only, as you're getting the panic from a DDT update:

zfs mount -o ro Storage

If you can do this, you'll want to copy the files to an entirely separate pool (not just a dataset)

After some tries, this works. I was trying everything I could to avoid this because I have 38TB to transfer, but it looks like it's my last resort.
And of course, during the transfer, I'm now hitting a bug on my NIC which prevents me from transferring :rolleyes: I love IT sometimes...
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
After some tries, this works. I was trying everything I could to avoid this because I have 38TB to transfer, but it looks like it's my last resort.
And of course, during the transfer, I'm now hitting a bug on my NIC which prevents me from transferring :rolleyes: I love IT sometimes...

I'm glad we've got the pool mounted - what's the NIC bug that's choking out the transfers? The ASRock X399 series page says "Intel NIC" but doesn't specify in any more detail. The only issues I've ever had with Intel cards have been with the i225-V.
 