SOLVED: zpool degraded on TrueNAS

polakkenak

Cadet
Joined
Jun 10, 2022
Messages
3
Hi folks. I'm fairly new to TrueNAS and ZFS, so please bear with me.

I recently bought a Minisforum HM90 to serve my need for running VMs at home.
I put in 2x8GB of new memory and two older SATA SSDs I had lying around.

After running for a short while, SCALE warned me that one of the pools is in a degraded state. This is on a fresh install of SCALE 22.02.1.

Code:
root@truenas[~]# zpool status boot-pool
  pool: boot-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:00:13 with 0 errors on Fri Jun 10 07:59:52 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   DEGRADED     0     0     0
          sdb3      DEGRADED    23     0    34  too many errors

errors: No known data errors


After clearing the pool status and running a new scrub, the errors reappear almost immediately.

Code:
root@truenas[~]# zpool clear boot-pool
root@truenas[~]# zpool scrub boot-pool
root@truenas[~]# zpool status boot-pool
  pool: boot-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:00:13 with 0 errors on Fri Jun 10 08:42:57 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   DEGRADED     0     0     0
          sdb3      DEGRADED    16     0    79  too many errors

errors: No known data errors


Doing some testing, I found that the problem would appear on whichever disk was plugged into the "JFPC1" port, while the drive attached to the other port had no problem.

I suspected a hardware fault (e.g. a faulty SATA port/controller) and contacted the reseller to enquire about repairs. The reseller said this could be a compatibility problem and asked me to try to replicate the problem on Windows before sending the unit in for repairs.

I tried this, and I wasn't able to get Windows to complain about the drive during my testing. Windows doesn't really offer ZFS support, though, so take that with a grain of salt.

I also tried replicating the problem with live USBs for both Proxmox and Ubuntu Server, and neither detected any problem. Running TrueNAS in recovery mode (off the boot drive, not a USB) doesn't show any problems with the drive either.
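I haven't run SMART self-tests on the drives yet; if that would be useful, I'd do something along these lines with smartmontools (device names are just what they happen to be on my box):

Code:
# quick health summary for both SSDs
smartctl -H /dev/sda
smartctl -H /dev/sdb

# start a long self-test on the suspect drive, then read back the results later
smartctl -t long /dev/sdb
smartctl -a /dev/sdb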

So, to summarize:
  • zpool status says a pool is degraded
  • Swapping SATA ports moves the problem to the other drive
  • Testing while running off a live USB doesn't exhibit the same problem
  • Testing while in truenas recovery mode doesn't exhibit the same problem

I'm stumped as to what is causing this. Could a setting be the culprit?
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
reseller said this could be a compatibility problem
I think the reseller is on the right track.
I'd guesstimate, based on your excellent attempts, that the SATA controller (or part of its setup) might not be well supported by the underlying OS.

Have a look at the recommended hardware guide - it might appear slightly dated - but the gist of it holds true to this day.
If you can't find a system similar to yours that's of a similar age to the guide (spoiler - you will not), it's safe to say that's exactly why there is a handy hardware recommendations guide.

Kudos to your efforts.
 

polakkenak

Cadet
Joined
Jun 10, 2022
Messages
3
Thank you for the link to the hardware guide; I'll keep it handy for the next time I'm putting something together.

I managed to solve this by modifying the power settings as described in this post over at Minisforum.

My guess is that recovery mode (and the live USBs) wasn't mounting the other SSD, so it might be related to the power supplied to the two drives, since they appear to share the same SATA controller according to the lshw output.
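In case anyone wants to check the same thing, this is roughly the lshw invocation I mean (output omitted; controller and device names will differ from system to system):

Code:
# list storage controllers and the disks hanging off them
lshw -class storage -class disk -short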
 

polakkenak

Cadet
Joined
Jun 10, 2022
Messages
3
For what it's worth, TrueNAS SCALE uses med_power_with_dipm as the link power management policy for the SCSI hosts.
Testing showed that the problem appeared with all of the DIPM-enabled variants, but medium_power or max_performance works great.
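For anyone hitting the same thing, the policy lives in sysfs, so checking and changing it at runtime looks roughly like the below (host numbers vary per system, and the change doesn't persist across reboots, so it would need to be reapplied, e.g. via an init script):

Code:
# show the current link power management policy for every SATA/SCSI host
grep . /sys/class/scsi_host/host*/link_power_management_policy

# switch every host to max_performance (medium_power also worked for me)
for p in /sys/class/scsi_host/host*/link_power_management_policy; do
    echo max_performance > "$p"
done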
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Thank you for your contributions!
 