A lot of checksum errors on first scrub on new disks

ArGGu

Cadet
Joined
Jun 3, 2021
Messages
4
Hello,

I have built my first TrueNAS systems. I built two of them: Main and Backup.
For the two systems I bought 13 new Seagate IronWolf 4TB disks. Main has a single pool of 8 disks (2 x RAIDZ2 vdevs) and Backup has a pool of 5 disks (RAIDZ1).

I ran the first scrub on both systems today. Main had no problems, but Backup had a lot of checksum errors. The Main system has ECC RAM and Backup has non-ECC RAM.
So I'm wondering whether the non-ECC RAM is the culprit for the checksum errors, and whether I need to worry about them.

Here is the zpool status from my Backup system:
Code:
  pool: Backup
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 7.25M in 01:31:45 with 0 errors on Mon Jul 12 11:01:45 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        Backup                                          ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/4ec4a4db-d515-11eb-874b-408d5c1fdf05  ONLINE       0     0     9
            gptid/4edf1150-d515-11eb-874b-408d5c1fdf05  ONLINE       0     0     8
            gptid/4ef4118c-d515-11eb-874b-408d5c1fdf05  ONLINE       0     0     6
            gptid/4f1febec-d515-11eb-874b-408d5c1fdf05  ONLINE       0     0     4
            gptid/4f0a47f9-d515-11eb-874b-408d5c1fdf05  ONLINE       0     0     3

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:03 with 0 errors on Fri Jul  9 03:45:03 2021
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors


And here is the zpool status from my Main system:
Code:
  pool: Main
 state: ONLINE
  scan: scrub repaired 0B in 01:32:28 with 0 errors on Mon Jul 12 11:02:29 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        Main                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/cc5c3751-ce0e-11eb-9b29-a8a1594c9b7f  ONLINE       0     0     0
            gptid/cd23f431-ce0e-11eb-9b29-a8a1594c9b7f  ONLINE       0     0     0
            gptid/cd6dcd14-ce0e-11eb-9b29-a8a1594c9b7f  ONLINE       0     0     0
            gptid/cd933cff-ce0e-11eb-9b29-a8a1594c9b7f  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/cc4d5c36-ce0e-11eb-9b29-a8a1594c9b7f  ONLINE       0     0     0
            gptid/cd3e1b8f-ce0e-11eb-9b29-a8a1594c9b7f  ONLINE       0     0     0
            gptid/cd807d7c-ce0e-11eb-9b29-a8a1594c9b7f  ONLINE       0     0     0
            gptid/cd586cb2-ce0e-11eb-9b29-a8a1594c9b7f  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:02 with 0 errors on Fri Jul  9 03:45:02 2021
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            nvd0p2  ONLINE       0     0     0
            nvd1p2  ONLINE       0     0     0

errors: No known data errors
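
To compare the two systems at a glance, the CKSUM column of the zpool status output above can be pulled out with a few lines of Python. This is only a rough sketch: the function name is made up for illustration, and it assumes the counters are plain integers in the fifth column (large counts that zpool abbreviates, such as values with a K suffix, would be skipped).

```python
def cksum_errors(status_text):
    """Map device name -> CKSUM count for every non-zero entry."""
    errors = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True           # header row of a config table
        elif fields[:1] == ["errors:"]:
            in_config = False          # the "errors:" line ends the table
        elif in_config and len(fields) >= 5 and fields[4].isdigit():
            if int(fields[4]) > 0:
                errors[fields[0]] = int(fields[4])
    return errors


# Header plus a few rows copied from the Backup output above.
sample = """\
        NAME                                            STATE     READ WRITE CKSUM
        Backup                                          ONLINE       0     0     0
            gptid/4ec4a4db-d515-11eb-874b-408d5c1fdf05  ONLINE       0     0     9
            gptid/4edf1150-d515-11eb-874b-408d5c1fdf05  ONLINE       0     0     8
errors: No known data errors
"""
print(cksum_errors(sample))
```

Run against the full Backup output this lists all five data disks, while the Main output gives an empty result. Errors spread across every disk in the pool are part of why a shared component was suspected here.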


My Backup system hardware is:
Gigabyte GA-Z170N-WIFI
Intel i7-6700
16GB NON-ECC
Data pool: RAIDZ1 (5 x 4TB Seagate IronWolf)
Boot pool: Samsung 850 EVO 120GB

Also, should I run zpool clear and then scrub again?
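
For reference, the sequence being asked about would look roughly like the sketch below, assuming the pool name Backup from the output above. The helper names are hypothetical. Note that zpool clear only resets the error counters; it is the new scrub that re-reads and re-verifies the data. The script defaults to a dry run that only prints the commands.

```python
import subprocess


def rescrub_commands(pool):
    """Commands to reset the error counters, start a scrub, and show status."""
    return [
        ["zpool", "clear", pool],         # reset READ/WRITE/CKSUM counters
        ["zpool", "scrub", pool],         # start a new scrub (runs in background)
        ["zpool", "status", "-v", pool],  # check progress and any new errors
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run(rescrub_commands("Backup"))  # dry run: prints the three commands
```

Because the scrub runs in the background, the status command has to be repeated later to see the final result.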
 

ArGGu

Hello,

I would like to apologize for not investigating the problem further before posting this. I found that the integrated GPU on the CPU has problems with RAM faster than 2133.

I have used this motherboard + CPU + RAM combination since 2015/01 and have had no RAM problems before, but back then I had a discrete GPU.
So I have only ever run memtest with the discrete GPU installed, and it did not produce any errors. I'm going to run it today with the integrated GPU to see whether it reports errors.

But as I'm a newbie with TrueNAS and ZFS, I would like to ask: is the data on my Backup system fine, given that the scrub seems to have repaired it and reports no known data errors? Or should I remove all snapshots on my Backup and replicate again from my Main system?
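
The two facts this question rests on both come from the scan line of the Backup output: how much the scrub repaired and how many errors it could not correct. A small sketch for extracting them follows. The function name and dictionary keys are made up for illustration, and the pattern only matches a finished scrub in the wording shown in this thread.

```python
import re

# Matches e.g. "scrub repaired 7.25M in 01:31:45 with 0 errors on ..."
SCAN_RE = re.compile(r"scrub repaired (\S+) in (\S+) with (\d+) errors")


def scrub_summary(status_text):
    """Return repaired amount, duration and error count, or None if absent."""
    match = SCAN_RE.search(status_text)
    if match is None:
        return None
    return {
        "repaired": match.group(1),
        "duration": match.group(2),
        "errors": int(match.group(3)),
    }


line = "  scan: scrub repaired 7.25M in 01:31:45 with 0 errors on Mon Jul 12 11:01:45 2021"
print(scrub_summary(line))
```

As far as I understand the ZFS status output, "repaired 7.25M ... with 0 errors" means the scrub rewrote 7.25M from redundancy and found nothing it could not correct, and "No known data errors" means no files are flagged as permanently damaged.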
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
ArGGu said: "Also should I run zpool clear and the scrub again?"
Check the cables first; checksum errors are often related to bad cabling.

Also look at dmesg for CAM STATUS messages about the disks.
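
A quick way to do that dmesg check is to filter the output for lines mentioning CAM status. The sketch below is a minimal version with hypothetical function names; it assumes the messages contain the text "CAM status" (matched case-insensitively) and does no further parsing.

```python
import subprocess


def cam_status_lines(dmesg_text):
    """Return the dmesg lines that mention a CAM status."""
    return [
        line for line in dmesg_text.splitlines()
        if "cam status" in line.lower()
    ]


def read_dmesg():
    """Capture the current kernel message buffer as text."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


# On the NAS itself:
#     for line in cam_status_lines(read_dmesg()):
#         print(line)
```

Each matching line starts with the device name, so repeated hits for the same disk or controller port point at a particular cable or port.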
 

ArGGu

Thanks for the help, I will check this.

I brought the Backup system home, so it is easier to troubleshoot.
I have already started memtest on it, so I will let that finish first and then check the cables and dmesg.
 

ArGGu

I could not figure out what the problem was, but I fixed it by moving all my disks to another old PC I had.
It has only two SATA 6Gb/s and four SATA 3Gb/s ports, but that does not matter as it is just for backups.
On that machine the scrub runs without any errors.

Thanks for the help :)
 