TrueNAS keeps reporting data corruption in iX-applications/docker/... but there are no ZFS checksum errors

SnoppyFloppy

Explorer
Joined
Jun 17, 2021
Messages
77
Hi

Over the last month, TrueNAS has warned me a few times that it has found data corruption in iX-applications/docker/..., which resides on my SSD pool.

Besides the iX-applications dataset, the SSD pool also stores a lot of personal data, which takes up far more space than the iX-applications dataset, but only the iX-applications dataset has been affected by corruption. All apps, however, continue to work.

Here is the output of zpool status -v. I was surprised to see that there were no checksum errors.

Code:
admin@truenas:~$ sudo zpool status -v fast1
  pool: fast1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:39:43 with 0 errors on Sun Sep 25 00:39:52 2022
config:

NAME                                          STATE    READ  WRITE  CKSUM
fast1                                         ONLINE      0      0      0
   mirror-0                                   ONLINE      0      0      0
      87c4023d-3a60-4754-a898-e3dfe191f714    ONLINE      0      0      0
      16ada901-ebec-4318-9b87-c44ffb1d1578    ONLINE      0      0      0

errors: Permanent errors have been detected in the following files:

        fast1/ix-applications/docker/d7b1207cf6dfc4539216bb3b5b091a6979eab14a9b0bc5f6aa1b9a0cb475ae24-init:<0x0>
        <0xd433>:<0x0>
        <0x18d4e>:<0x0>
        <0x25469>:<0x0>
        fast1/ix-applications/docker/26fbd001165268ca4ed3a56bdd08d5560b901c351e4fcb4279fcee07cd7c96e0-init:<0x0>
        fast1/ix-applications/docker/cba084459dd22ee7291c61b3daa5204f99bcea03add21b69a15902ecff44d7ec-init:<0x0>
        fast1/ix-applications/docker/06fe6a99fbc5f83bf960e820351cc1e5c5cd362fbb4180734c3422ef133b605d-init:<0x0>
        fast1/ix-applications/docker/caf2d0a081c08750a355c8494f15eb86ee1b5e55aa1818ab1b3767ad5dcb1a6a-init:<0x0>
        fast1/ix-applications/docker/5cb33588e5493b9cf2bc9574bf09e126da1aba8580f8c26e83f79442a53c64c2-init:<0x0>
        fast1/ix-applications/docker/98afae9a53d418aaaf8307c34b5cab1b0bc05a2c325f310019f8dd292ac8d727-init:<0x0>
        <0x12cb>:<0x0>
        <0xe6>:<0x0>
        <0x12e9>:<0x0>
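
For readers unfamiliar with this listing, the entries come in two shapes: `dataset:<0x0>` names a dataset that still exists, while a bare `<0x...>:<0x0>` pair is printed when ZFS cannot resolve the dataset ID to a name (commonly because the dataset or snapshot has since been destroyed). That reading is an assumption based on general ZFS behavior, not something confirmed in this thread. A minimal shell sketch that counts each kind from `zpool status -v` output follows; the function name `classify_zpool_errors` is hypothetical and it only parses text.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status -v` output on stdin and counts
# the error entries by shape. It only parses text; it never touches the pool.
classify_zpool_errors() {
    awk '
        # Entries start after the "Permanent errors" header line.
        /Permanent errors have been detected/ { in_list = 1; next }
        # Ignore everything before the header, and blank lines.
        !in_list || NF == 0 { next }
        # Bare hex pair: the dataset ID could not be resolved to a name.
        $1 ~ /^<0x[0-9a-f]+>:<0x[0-9a-f]+>$/ { unresolved++; next }
        # Anything else is a dataset name or file path.
        { named++ }
        END { printf "named=%d unresolved=%d\n", named, unresolved }
    '
}
```

Usage would be `sudo zpool status -v fast1 | classify_zpool_errors`. Run against the listing above, it should report 7 named and 6 unresolved entries.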


I have TrueTool making backups every night, and these are replicated to my other pool, so I do have backups. But as the errors have accumulated over time, I wouldn't like to roll back unless it's absolutely necessary.

I'm on 22.02.4 now, but all errors occurred while on 22.02.3.

I'm no ZFS expert, so I really don't know what to do from here.

Hardware:
Code:
Supermicro X9DRH-7F
2x Xeon E5-2650v2
4x 32GB 1600 MHz ECC DDR3 ram (SK Hynix)
boot pool: 2x Sandisk Plus 240GB (mirrored)
fast1 pool: 2x Crucial MX500 1TB (mirrored)
rust1 pool: 2x Ironwolf Pro 8TB (mirrored)
Nvidia GTX 1660 Ti
Be Quiet! Straight Power 750
 

SnoppyFloppy

By the way, the SSDs in the fast1 pool are connected to the only two SATA3 ports on the motherboard. The ports are in AHCI mode with Aggressive Link Power Management disabled.
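
The BIOS option and the policy the Linux kernel applies are separate settings. On Linux, the kernel-side policy is exposed per SATA host in sysfs, where `max_performance` means link power management is off. Whether TrueNAS SCALE changes this policy is not established in this thread, so the following is only a sketch for inspecting it; the function name `show_alpm_policy` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the link power management policy of every
# SATA host. The optional argument overrides the sysfs base directory.
show_alpm_policy() {
    base=${1:-/sys/class/scsi_host}
    for f in "$base"/host*/link_power_management_policy; do
        # Skip an unexpanded glob and hosts that do not expose the policy.
        [ -r "$f" ] || continue
        host=$(basename "$(dirname "$f")")
        printf '%s: %s\n' "$host" "$(cat "$f")"
    done
}
```

Calling `show_alpm_policy` with no argument reads the live system; hosts without the attribute (for example, HBA ports) are skipped silently.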
 

SnoppyFloppy

Does anyone have any idea what to do about this problem?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
That smells a bit like the issue with Silicon Motion controllers, like those in the WD Green SSDs, related to TRIM not working right, although I thought it would be limited to CORE, since it was due to that OS and its handling of TRIM, not TRIM itself.

You have both metadata corruption and file corruption there, and it seems to be corrupted on both sides of the mirror. Hence no checksum errors: just bad data and no good copies to correct it from.

You'll need to rebuild your pool and consider trying different media in that process.
 

SnoppyFloppy

Okay. That doesn't sound too good.

A while back, when I looked into how well they would be supported behind an LSI HBA controller, I realized that the Crucial MX500 doesn't support TRIM. So I clearly didn't do my research well enough.

I do have Auto TRIM turned off on the pool though, so shouldn't that fix the problem?
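
The setting can be confirmed from the shell. The sketch below assumes the pool name `fast1` from the output earlier in the thread and uses `zpool get -H -o value`, which prints only the property value; the function name `check_autotrim` is hypothetical. Note that `autotrim=off` only stops automatic TRIM as blocks are freed; a TRIM can still be issued manually with `zpool trim`.

```shell
#!/bin/sh
# Hypothetical helper: report the autotrim property of a pool.
# `zpool get -H -o value` prints just the value ("on" or "off").
check_autotrim() {
    pool=$1
    value=$(zpool get -H -o value autotrim "$pool") || return 1
    echo "$pool: autotrim=$value"
}
```

For example, `check_autotrim fast1` (with `sudo` if needed) prints a single line with the current value.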
 