Troubleshooting Unhealthy ZPool Status

FreeNASUser5129

Dabbler
Joined
Feb 18, 2012
Messages
15
Hi Friends,

I am running TrueNAS Core 12.0-U8.

I recently needed to replace one of my hard disks within my zpool. After replacing and resilvering, my zpool status says Unhealthy.



I ran
Code:
zpool status -x

I saw that it is showing ZFS-8000-8A as the error, and referenced this documentation page.

Based on that page I ran
Code:
zpool status -xv

with this result:

Code:
root@freenas:~ # zpool status -xv
  pool: Volume1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 06:57:34 with 15 errors on Sat Apr  9 14:07:52 2022
config:

        NAME                                                STATE     READ WRITE CKSUM
        Volume1                                             ONLINE       0     0     0
          raidz1-0                                          ONLINE       0     0     0
            gptid/7a341d28-0344-11e5-8c94-002590dbbf1d.eli  ONLINE       0     0    47
            gptid/84901eb6-067c-11e3-8226-d43d7e93e546.eli  ONLINE       0     0    47
            gptid/855049fe-067c-11e3-8226-d43d7e93e546.eli  ONLINE       0     0    47
            gptid/6d5b0978-aefb-11ec-a75a-002590dbbf1d.eli  ONLINE       0     0    47
            gptid/b52424a6-a670-11e5-a798-002590dbbf1d.eli  ONLINE       0     0    47
            gptid/e5c9026a-56f7-11e5-8bb4-002590dbbf1d.eli  ONLINE       0     0    47

errors: Permanent errors have been detected in the following files:

        /var/db/system/syslog-92e6ac8b6342418e99afb49a8441c54c/log/debug.log
        /var/db/system/syslog-92e6ac8b6342418e99afb49a8441c54c/log/console.log
        /var/db/system/syslog-92e6ac8b6342418e99afb49a8441c54c/log/messages
        /var/db/system/syslog-92e6ac8b6342418e99afb49a8441c54c/log/daemon.log
        /var/db/system/syslog-92e6ac8b6342418e99afb49a8441c54c/log/middlewared.log
        /var/db/system/rrd-92e6ac8b6342418e99afb49a8441c54c/localhost/disktemp-ada1/temperature.rrd
        /var/db/system/rrd-92e6ac8b6342418e99afb49a8441c54c/localhost/disktemp-ada3/temperature.rrd


I have done several pool scrubs, and I have run each of the 6 disks through Manual LONG S.M.A.R.T. tests. Everything completes successfully.

The action of 'destroying the pool and re-creating from a backup' seems risky and like something I may not achieve successfully. I'm hoping someone can provide some guidance for what I should do to resolve the unhealthy status?

Thanks very much.
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
You should put your command output in code tags to make it easier to read.
Putting that aside, all of your disks are reporting checksum errors, which are usually caused by a cabling issue.
 

FreeNASUser5129

Thanks for the tip. I've edited the post to put the command output into code tags.

Does it make sense to shut down the server and physically re-seat the drives themselves? I have a hot swap cage for the HDDs, so I didn't open anything up to replace the drive.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
"Does it make sense to shut down the server and physically re-seat the drives themselves?"
If the checksum error counts are still going up, yes. Otherwise (and maybe as you found while I was typing), no.

Running zpool clear Volume1 will reset the counters.
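
To tell whether the counts are still going up, one option is to total the CKSUM column now and again later, then compare the two numbers. A minimal sketch, assuming the column layout in the output posted above; cksum_total is a made-up helper name, not a ZFS command, and it does not handle abbreviated counts such as 1.2K:

```shell
# Sum the CKSUM column (5th field) over every row of the config table
# in `zpool status` output. Reads the status text on stdin, so it also
# works on saved output.
cksum_total() {
    awk '
        $1 == "NAME" && $5 == "CKSUM" { in_config = 1; next }  # table header row
        in_config && NF == 0          { in_config = 0 }        # blank line ends the table
        in_config && NF >= 5          { total += $5 }          # pool, vdev and disk rows
        END { print total + 0 }
    '
}

# Usage on a live system (pool name taken from this thread):
#   zpool status Volume1 | cksum_total
```

If the number is the same after a few hours of normal use or after the next scrub, the errors are probably historical rather than ongoing.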

The simplest solution may be to just move the system dataset off that pool, make sure all those files are gone, then put it back.
 

FreeNASUser5129

I have reset the counters.

Any chance you could help me with moving the system dataset and getting rid of the files? This begins to push the boundaries of my knowledge on how to work on this (I'm a home user).
 

sretalla

In the web UI, go to System | System Dataset and select a pool other than Volume1 (it can be the boot pool temporarily).

Then, at the Shell, cd to the location(s) of the damaged files:

cd /var/db/system/syslog-92e6ac8b6342418e99afb49a8441c54c/log/

Run ls -l to see what's there, then remove the file(s), repeating for each file in the error list:

rm debug.log

Or skip the cd and give the full path if you prefer:
rm /var/db/system/syslog-92e6ac8b6342418e99afb49a8441c54c/log/debug.log
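
If there are many damaged files, the list can be pulled out of the status output rather than typed by hand. A rough sketch, assuming the "Permanent errors" list is the last section of the zpool status -v output, as in the first post; damaged_files is a hypothetical helper name:

```shell
# Print the entries listed under "Permanent errors" in `zpool status -v`
# output (read from stdin), one per line, with leading whitespace removed.
damaged_files() {
    awk '
        /^errors: Permanent errors/ { in_list = 1; next }   # start of the list
        in_list && NF > 0 { sub(/^[ \t]+/, ""); print }     # skip blank lines
    '
}

# Review the list before deleting anything:
#   zpool status -v Volume1 | damaged_files
```

It only prints the entries, so you can check them before removing anything. Entries that are not plain paths (for example metadata object references) are printed unchanged and cannot be removed with rm.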
 

FreeNASUser5129

Hey guys - I'm finally back to troubleshooting this - thanks again for all of the replies so far. I've done as suggested above - moved the System Dataset to my Boot pool and removed the debug logs. There were 2 files there, debug.0 and debug.1 - so I removed both.

Now my pool state says Degraded, and TrueNAS tells me that I can't move the System Dataset back to Volume1 because it's encrypted.

What's my best next course of action, please, @sretalla?
 