Disk Degrades in same slot

mijohnst

Cadet
Joined
Mar 5, 2022
Messages
7
I'm having an issue where one of the 8 slots in my Dell R520 keeps reporting checksum errors on any disk I put in that particular slot. I've tried at least 4 different disks, and they all eventually come back with a warning. I've tried reseating (and moving around) all the memory in the system, and I've blown out the connector in the slot with canned air, but after a while the pool degrades again.

When I first swapped the memory around, the issue seemed to go away for about 2 months, but now it's back. To me, that suggests a memory issue, but I would expect moving the memory around to make the same errors show up on other slots, which hasn't happened.

I'm running TrueNAS-SCALE-22.02.2, but I was having this issue before I migrated to SCALE. Any suggestions for a deeper dive into what might be causing it?


Code:
        
        NAME                                      STATE     READ WRITE CKSUM
        zion-pool                               DEGRADED     0     0     0
          mirror-0                                ONLINE       0     0     0
            cd686b6e-7fe1-11ec-a9e9-90b11c581c20  ONLINE       0     0     0
            80e7be66-80d5-11ec-a573-000e1e394dc8  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            cd49e613-7fe1-11ec-a9e9-90b11c581c20  ONLINE       0     0     0
            cd711483-7fe1-11ec-a9e9-90b11c581c20  ONLINE       0     0     0
          mirror-2                                ONLINE       0     0     0
            cd5d53f6-7fe1-11ec-a9e9-90b11c581c20  ONLINE       0     0     0
            cd53aa82-7fe1-11ec-a9e9-90b11c581c20  ONLINE       0     0     0
          mirror-3                                DEGRADED     0     0     0
            1810d17a-6618-4ddb-a0f7-4602e0bedac3  ONLINE       0     0     0
            29b79b03-ffe7-4e29-b10d-84cf19812fd7  DEGRADED     0     0    14  too many errors
        logs
          mirror-4                                ONLINE       0     0     0
            89cdf28a-7fe2-11ec-a9e9-90b11c581c20  ONLINE       0     0     0
            89d11d81-7fe2-11ec-a9e9-90b11c581c20  ONLINE       0     0     0
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Have you checked the cables going to the backplane internally?

You might unfortunately just have a bad slot, or one of the contacts has corrosion and makes intermittent contact; all of your errors are CKSUM, which is common with cabling/communication problems.
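If it helps to track which device is collecting errors over time, here is a rough Python sketch that pulls the rows with nonzero READ/WRITE/CKSUM counters out of saved `zpool status` text. The function name and the regex are my own for illustration, not part of any ZFS tooling, and the sample rows are copied from the output posted above. It assumes plain integer counters, so rows where a counter is shown with a suffix (such as `1.5K`) would be skipped.

```python
import re

# One device row of `zpool status` output:
#   <name> <state> <read> <write> <cksum> [optional note]
# Assumption: counters are plain integers (no K/M suffixes).
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every row with a nonzero counter."""
    hits = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, "logs", or any other non-device line
        name = m.group(1)
        read, write, cksum = (int(m.group(i)) for i in (3, 4, 5))
        if read or write or cksum:
            hits.append((name, read, write, cksum))
    return hits


# Rows copied from the mirror-3 section of the posted output.
SAMPLE = """\
  mirror-3                                DEGRADED     0     0     0
    1810d17a-6618-4ddb-a0f7-4602e0bedac3  ONLINE       0     0     0
    29b79b03-ffe7-4e29-b10d-84cf19812fd7  DEGRADED     0     0    14  too many errors
"""

if __name__ == "__main__":
    for name, read, write, cksum in devices_with_errors(SAMPLE):
        print(f"{name}: read={read} write={write} cksum={cksum}")
```

Running it against a capture taken after each disk or cable change would show whether the CKSUM count follows the slot rather than the disk, with READ and WRITE staying at zero.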
 

mijohnst

Cadet
Joined
Mar 5, 2022
Messages
7
Thanks, HoneyBadger. I really hope it's not a bad backplane, but stranger things have happened. I didn't check the cable into the backplane the last time I had it open, but I will bring it down within the next few days and open it back up to check. I might also pull out half the memory (from 64 GB down to 32 GB) just to see if it's a bad DIMM.
 