Pool unhealthy; can I find why from GUI?

alecz · May 6, 2021

I have a 4-disk pool that TrueNAS 12 reports as unhealthy:

If I go to Pool Status, I can't tell what the problem is. No errors are reported for any of the disks.

There is an alert that says:
CRITICAL
Pool data state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
2021-04-26 06:13:05 (America/Los_Angeles)

All four disks are SEAGATE ST3300657SS SAS drives (which don't appear to support SMART in the GUI; SMART Test Results gives a "MatchNotFound" error).

From the GUI, is it possible to find why the pool is unhealthy, and if a device is problematic, which one is it?

From the shell, I ran smartctl -x /dev/daX on every drive. The output for the one suspicious drive was:

Code:

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST3300657SS
Revision:             ES64
Compliance:           SPC-3
User Capacity:        300,000,000,000 bytes [300 GB]
Logical block size:   512 bytes
Rotation Rate:        15000 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c50039582537
Serial number:        6SJ0ZRW2
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Thu May  6 06:42:25 2021 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Elements in grown defect list: 2047

Error Counter logging not supported

Device does not support Self Test logging
Device does not support Background scan results logging

All other drives also have non-zero "Elements in grown defect list", but the "SMART Health Status" is OK.

So I suspect the drive above is what is causing the pool to be unhealthy, but there is no way that I can see to confirm that in the GUI.

Did I miss how to find this information from the GUI, or would I have to go to the CLI and query all drives manually?
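For anyone checking several drives from the CLI, here is a minimal Python sketch that pulls the key fields out of one drive's smartctl -x text. The function name summarize_smart and the SAMPLE excerpt are illustrative only (the excerpt is copied from the output above), and the regular expressions assume the SAS-style layout shown in this post; other drives may print different labels.

```python
import re

# Excerpt of the smartctl -x output quoted in this post.
SAMPLE = """\
Vendor:               SEAGATE
Product:              ST3300657SS
Serial number:        6SJ0ZRW2
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]
Elements in grown defect list: 2047
"""


def summarize_smart(text):
    """Extract a few fields from one drive's smartctl text output.

    Hypothetical helper: it only understands the labels seen above.
    """
    def field(pattern):
        match = re.search(pattern, text, re.MULTILINE)
        return match.group(1).strip() if match else None

    health = field(r"^SMART Health Status:\s*(.+)")
    defects = field(r"^Elements in grown defect list:\s*(\d+)")
    return {
        "serial": field(r"^Serial number:\s*(\S+)"),
        "health": health,
        "grown_defects": int(defects) if defects is not None else None,
        # Anything other than a status starting with "OK" is treated as suspect.
        "suspect": health is not None and not health.startswith("OK"),
    }


info = summarize_smart(SAMPLE)
print(info["serial"], info["suspect"], info["grown_defects"])
```

On a live system, the text could come from running smartctl -x against each /dev/daX device and passing the captured output to the function; that part is left out here so the sketch runs anywhere.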

Patrick M. Hausen · May 6, 2021

1. /dev/da3p2 shows 5 read errors
2. The device for that disk is not shown correctly. Did you ever do a manual replace and use /dev/da3p2 instead of /dev/gptid/<id>?
3. You can get more info on the CLI with zpool status -v
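As a rough illustration of point 1, the READ/WRITE/CKSUM columns of zpool status can be scanned for non-zero values with a short Python sketch. The function name devices_with_errors is made up for this example, and the SAMPLE text is a hypothetical reconstruction (the real output from the unhealthy state was never posted), with shortened placeholder gptid names.

```python
# Hypothetical zpool status text: counters and gptid names are invented
# to mirror the "5 read errors on da3p2" situation described above.
SAMPLE = """\
  pool: data
 state: ONLINE
config:

    NAME                STATE     READ WRITE CKSUM
    data                ONLINE       0     0     0
      mirror-0          ONLINE       0     0     0
        gptid/aaaa      ONLINE       0     0     0
        gptid/bbbb      ONLINE       0     0     0
      mirror-1          ONLINE       0     0     0
        da3p2           ONLINE       5     0     0
        gptid/cccc      ONLINE       0     0     0

errors: No known data errors
"""


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for rows with a non-zero counter.

    Counters are kept as strings, so any value other than "0" is flagged.
    """
    flagged = []
    in_config = False
    for line in status_text.splitlines():
        parts = line.split()
        if parts[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True  # device rows start after this header
        elif in_config and line.strip().startswith("errors:"):
            break  # end of the config table
        elif in_config and len(parts) >= 5:
            name, read, write, cksum = parts[0], parts[2], parts[3], parts[4]
            if (read, write, cksum) != ("0", "0", "0"):
                flagged.append((name, read, write, cksum))
    return flagged


print(devices_with_errors(SAMPLE))
```

The same function can be fed the real text of zpool status -v; rows with fewer than five columns are simply skipped.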

alecz · May 6, 2021

Thanks for pointing out the obvious. I only saw zeros everywhere and didn't even notice the 5 read errors. It is indeed /dev/da3 whose SMART output says it's problematic.

I did not do a manual replace. I might have shuffled the drives between their bays, but the system was off at the time.

I restarted the system and the pool errors are all gone. I wanted to check zpool status -v, but that is now pointless: it shows no errors or problems.
So I think that read error put the pool in an unhealthy state, and the restart cleared it.

Patrick M. Hausen · May 6, 2021

Still puzzled about that inconsistent device display in the UI. Would you mind posting the output of zpool status?

alecz · May 6, 2021

Sure, no problem:

Code:

  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 00:00:09 with 0 errors on Thu May  6 05:52:59 2021
config:

    NAME                                            STATE     READ WRITE CKSUM
    data                                            ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/182e80cf-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0
        gptid/18a2eef3-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0
      mirror-1                                      ONLINE       0     0     0
        da3p2                                       ONLINE       0     0     0
        gptid/18bf6bfc-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0

errors: No known data errors

I see that the da3p2 disk looks odd (it does not use a gptid). I created the pool with the GUI and didn't make any CLI changes to it, so I have no idea how it ended up that way. Maybe the disk went offline, was "gone", and was then re-added on a reboot or something like that. I might replace it soon anyway, but there's no rush, as this is a test pool (no permanent data on it).
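To spot this kind of oddity without reading the listing by eye, a small Python sketch can list pool members that are not referenced by gptid. The function name members_without_gptid and the grouping patterns are assumptions made for this example; it is a heuristic that skips the pool row, vdev group rows such as mirror-0, and section labels such as logs or cache. SAMPLE is the zpool status output posted above.

```python
import re

# The zpool status output from this post.
SAMPLE = """\
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 00:00:09 with 0 errors on Thu May  6 05:52:59 2021
config:

    NAME                                            STATE     READ WRITE CKSUM
    data                                            ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/182e80cf-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0
        gptid/18a2eef3-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0
      mirror-1                                      ONLINE       0     0     0
        da3p2                                       ONLINE       0     0     0
        gptid/18bf6bfc-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0

errors: No known data errors
"""

# Heuristic patterns for rows that are not leaf devices (an assumption).
GROUP = re.compile(r"^(mirror|raidz\d?|draid\S*|replacing|spare)-\d+$")
SECTIONS = {"logs", "cache", "spares", "special", "dedup"}


def members_without_gptid(status_text):
    """Return leaf device names that do not start with "gptid/"."""
    found = []
    in_config = False
    pool_row_seen = False
    for line in status_text.splitlines():
        parts = line.split()
        if parts[:2] == ["NAME", "STATE"]:
            in_config = True
        elif in_config and line.strip().startswith("errors:"):
            break
        elif in_config and parts:
            name = parts[0]
            if not pool_row_seen:
                pool_row_seen = True  # first row after the header is the pool
            elif name in SECTIONS or GROUP.match(name):
                continue  # grouping row, not a disk
            elif not name.startswith("gptid/"):
                found.append(name)
    return found


print(members_without_gptid(SAMPLE))
```

For the output above this reports only da3p2, matching what is visible in the listing.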
