Bizarre CHKSUM errors

bellmr

Cadet
Joined
Sep 20, 2017
Messages
5
I am a relative newbie, but have a bit of a grasp on TrueNAS. My problem is that one of my pools is showing as unhealthy, and all the drives in that pool show the same number of checksum (CKSUM) errors. If the server is rebooted, the error count drops to 12 (or maybe less), but one can see it climbing steadily, and each of the four drives shows the same number of errors. I have run scrubs, but nothing seems to help. smartctl shows no errors for any drive, but zpool status -v shows the output below. (This is after the server has been up for 2 hrs 47 mins, so it shows the steady climb of the CKSUM errors on NAS_Volume.)

Code:
zpool status -v
  pool: Jails_Vol
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub canceled on Thu Aug 31 13:27:57 2023
remove: Removal of vdev 1 copied 5.35M in 0h0m, completed on Thu Feb 16 12:13:48 2023
    6.12K memory used for removed device mappings
config:

    NAME                                          STATE     READ WRITE CKSUM
    Jails_Vol                                     ONLINE       0     0     0
      gptid/58c3104d-b43b-11ed-a2b1-38d547270c9d  ONLINE       0     0     0

errors: No known data errors

  pool: NAS_Volume
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Mon Sep  4 13:46:09 2023
    1.72T scanned at 520M/s, 1.26T issued at 383M/s, 4.00T total
    0B repaired, 31.62% done, 02:04:45 to go
config:

    NAME                                            STATE     READ WRITE CKSUM
    NAS_Volume                                      ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/80be4e45-a941-11e6-8723-38d547270c9d  ONLINE       0     0 7.50K
        gptid/8176984f-a941-11e6-8723-38d547270c9d  ONLINE       0     0 7.50K
        gptid/8222f585-a941-11e6-8723-38d547270c9d  ONLINE       0     0 7.50K
        gptid/82d5dcfc-a941-11e6-8723-38d547270c9d  ONLINE       0     0 7.50K

errors: Permanent errors have been detected in the following files:

        <0xffffffffffffffff>:<0x1052>

  pool: boot-pool
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    boot-pool   ONLINE       0     0     0
      da0p2     ONLINE       0     0     0

errors: No known data errors

I hope someone has an idea why this is occurring, or what else I can do to identify or rectify the fault. All of the drives are SATA, connected to the mobo's onboard ports. I have changed all of the SATA cables and the PSU, and now I am wondering if the mobo is damaged.
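
The pattern described above (every disk in the raidz1 vdev reporting the same nonzero CKSUM count) can be checked mechanically. The following is a minimal Python sketch, not a TrueNAS tool: the function names, the disk-name pattern (`gptid/`, `daN`, `adaN`), and the trimmed sample text are assumptions made for illustration. One common reading of identical counts is that something shared by the disks may be involved, but the script only flags the pattern and does not diagnose the cause.

```python
import re

# A config row looks like: NAME STATE READ WRITE CKSUM (counts may end in K/M).
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)
# Assumed naming for leaf devices; adjust for your own system.
DISK = re.compile(r"^(gptid/|a?da\d)")


def cksum_by_pool(status_text):
    """Return {pool_name: {device: cksum_string}} parsed from zpool status text."""
    pools = {}
    current = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            current = stripped.split(":", 1)[1].strip()
            pools[current] = {}
            continue
        match = ROW.match(line)
        if match and current is not None and DISK.match(match.group(1)):
            pools[current][match.group(1)] = match.group(5)
    return pools


def shared_fault_suspected(counts):
    """True when two or more devices report the same nonzero CKSUM count."""
    values = list(counts.values())
    return len(values) > 1 and len(set(values)) == 1 and values[0] != "0"


# Trimmed sample taken from the output posted in this thread.
SAMPLE = """\
  pool: NAS_Volume
 state: ONLINE
config:
    NAME                                            STATE     READ WRITE CKSUM
    NAS_Volume                                      ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/80be4e45-a941-11e6-8723-38d547270c9d  ONLINE       0     0 7.50K
        gptid/8176984f-a941-11e6-8723-38d547270c9d  ONLINE       0     0 7.50K
"""

for pool, counts in cksum_by_pool(SAMPLE).items():
    print(pool, shared_fault_suspected(counts))
```

To use it on a live system, the text of `zpool status -v` would be passed to `cksum_by_pool` in place of `SAMPLE`.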
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Wait for the scrub to finish.
Please provide your hardware specs, and after the scrub is completed the result of SMART long tests.
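
For reference, a SMART long test is started per disk with `smartctl -t long` and its result read back later with `smartctl -a`. Below is a minimal Python sketch that only builds and prints those commands. The device names (`/dev/ada0` to `/dev/ada3`) are assumptions and will differ per system, and the function name is hypothetical.

```python
import shlex


def smart_long_test_commands(devices):
    """Return (start_commands, report_commands) for the given device paths."""
    start = [["smartctl", "-t", "long", dev] for dev in devices]
    report = [["smartctl", "-a", dev] for dev in devices]
    return start, report


# Hypothetical device names; substitute the disks that back the pool.
DEVICES = ["/dev/ada0", "/dev/ada1", "/dev/ada2", "/dev/ada3"]

start_cmds, report_cmds = smart_long_test_commands(DEVICES)
for cmd in start_cmds + report_cmds:
    # Printed rather than executed, so each command can be reviewed first.
    print(shlex.join(cmd))
```

The report commands are only useful once the long tests have completed, which can take several hours on large disks.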
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
This looks like a cable or controller issue.
 

bellmr

Cadet
Joined
Sep 20, 2017
Messages
5
Etorix said:
This looks like a cable or controller issue.
Hi Etorix,

That's what I am thinking. I have changed the SATA cables, the power cables, and the power supply, so my thoughts were with the mobo now. But the fact that the 5th drive doesn't show anything, and that it's only the NAS_Volume pool that is affected, is tripping me up. When the scrub has finished and smartctl has finished, I should have further info.
 