Pool unhealthy; can I find why from GUI?

alecz

Dabbler
Joined
Apr 2, 2021
Messages
18
I have a 4-disk pool that TrueNAS 12 reports as unhealthy:
[Attached screenshot: 1620308211511.png]

If I go to Pool Status, I can't tell what the problem is. As far as I can see, no errors are reported for any of the disks:
[Attached screenshot: 1620308253723.png]

There is an alert that says:
CRITICAL
Pool data state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
2021-04-26 06:13:05 (America/Los_Angeles)

All four disks are SEAGATE ST3300657SS SAS drives (which don't appear to support SMART in the GUI; SMART Test Results gives a "MatchNotFound" error).

From the GUI, is it possible to find why the pool is unhealthy, and if a device is problematic, which one is it?


From the shell, I ran smartctl -x /dev/daX for all drives. The output for the one suspicious drive was:
Code:
=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST3300657SS
Revision:             ES64
Compliance:           SPC-3
User Capacity:        300,000,000,000 bytes [300 GB]
Logical block size:   512 bytes
Rotation Rate:        15000 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c50039582537
Serial number:        6SJ0ZRW2
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Thu May  6 06:42:25 2021 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Elements in grown defect list: 2047

Error Counter logging not supported

Device does not support Self Test logging
Device does not support Background scan results logging


All other drives also have non-zero "Elements in grown defect list", but the "SMART Health Status" is OK.

So I suspect the drive above is what is causing the pool to be unhealthy, but there is no way that I can see to confirm that in the GUI.

Did I miss how to find this information from the GUI, or would I have to go to the CLI and query all drives manually?
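For anyone who ends up doing the per-drive check from the CLI, here is a minimal Python sketch that pulls the two fields compared above ("SMART Health Status" and "Elements in grown defect list") out of smartctl's text output. The function names and the device list in the usage note are hypothetical. The sample text is copied from the report above, and the parsing assumes the SAS/SCSI output layout shown there (ATA drives print a different health line).

```python
import re
import subprocess
from typing import Optional

# Excerpt of the smartctl -x report quoted above (SAS/SCSI layout).
SAMPLE = """\
=== START OF READ SMART DATA SECTION ===
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Elements in grown defect list: 2047
"""


def smart_health(report: str) -> Optional[str]:
    """Return the text after 'SMART Health Status:', or None if the line is absent."""
    match = re.search(r"^SMART Health Status:\s*(.+)$", report, re.MULTILINE)
    return match.group(1).strip() if match else None


def grown_defects(report: str) -> Optional[int]:
    """Return the grown defect count, or None if the drive does not report it."""
    match = re.search(r"^Elements in grown defect list:\s*(\d+)", report, re.MULTILINE)
    return int(match.group(1)) if match else None


def is_suspect(report: str) -> bool:
    """True when a health line is present and says anything other than OK."""
    health = smart_health(report)
    return health is not None and health != "OK"


def smartctl_report(device: str) -> str:
    """Run smartctl -x on one device and return its stdout.

    smartctl can exit non-zero for a failing disk, so the exit
    status is deliberately not treated as an error here.
    """
    result = subprocess.run(
        ["smartctl", "-x", device], capture_output=True, text=True, check=False
    )
    return result.stdout


print(smart_health(SAMPLE), grown_defects(SAMPLE), is_suspect(SAMPLE))
```

On the real system, something like `for dev in ["/dev/da0", "/dev/da1", "/dev/da2", "/dev/da3"]: print(dev, is_suspect(smartctl_report(dev)))` would cover all four disks. Those device names are an assumption based on the /dev/daX naming in this post.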
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
1. /dev/da3p2 shows 5 read errors
2. The device for that disk is not correctly shown. Did you ever do a manual replace and use /dev/da3p2 instead of /dev/gptid/<id>?
3. You can get more info on the CLI with zpool status -v
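As a rough illustration of point 1, the READ/WRITE/CKSUM columns of zpool status can be scanned mechanically instead of by eye. The sketch below is not part of TrueNAS or ZFS; the function name is made up, and the sample text is a hypothetical reconstruction (a healthy-looking status layout with 5 read errors placed on da3p2), not real output from this pool.

```python
from typing import Dict, Tuple

# Hypothetical example input: zpool status layout with 5 read errors on da3p2.
SAMPLE_STATUS = """\
  pool: data
 state: ONLINE
config:

    NAME                                            STATE     READ WRITE CKSUM
    data                                            ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/182e80cf-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0
        gptid/18a2eef3-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0
      mirror-1                                      ONLINE       0     0     0
        da3p2                                       ONLINE       5     0     0
        gptid/18bf6bfc-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0

errors: No known data errors
"""


def devices_with_errors(status_text: str) -> Dict[str, Tuple[str, str, str]]:
    """Map row name -> (read, write, cksum) for config rows with a non-zero counter.

    Counters are kept as strings because zpool may abbreviate large counts.
    """
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # device rows start after this header
            continue
        if fields[0] == "errors:":
            break  # end of the config table
        if in_config and len(fields) >= 5:
            name, _state, read, write, cksum = fields[:5]
            if (read, write, cksum) != ("0", "0", "0"):
                flagged[name] = (read, write, cksum)
    return flagged


print(devices_with_errors(SAMPLE_STATUS))
```

In practice the text would come from running zpool status -v on the box. The parser only looks at the column layout shown in this thread, so other sections or layouts may need extra handling.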
 

alecz

Thanks for pointing out the obvious. I only saw zeros everywhere and didn't even notice the 5 read errors. It is indeed /dev/da3 whose SMART output says it's problematic.

I did not do a manual replace. I think I might have shuffled the drives in their bays, but the system was off at the time.

I restarted the system and the pool errors are all gone. I wanted to check zpool status -v, but that is now pointless because it shows no errors or problems.
So I think that read error is what put the pool in the unhealthy state, and the restart cleared it.
 

Patrick M. Hausen

Still puzzled about that inconsistent device display in the UI. Would you mind posting the output of zpool status?
 

alecz

Sure, no problem:

Code:
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 00:00:09 with 0 errors on Thu May  6 05:52:59 2021
config:

    NAME                                            STATE     READ WRITE CKSUM
    data                                            ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/182e80cf-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0
        gptid/18a2eef3-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0
      mirror-1                                      ONLINE       0     0     0
        da3p2                                       ONLINE       0     0     0
        gptid/18bf6bfc-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0

errors: No known data errors



I see that the da3p2 disk looks odd (it does not use the gptid). I created the pool with the GUI and didn't make any CLI changes to it, so I have no idea how it ended up that way. Maybe the disk went offline, was "gone" for a while, and was then added back by a reboot or something like that. I might replace it soon anyway, but there's no rush, as this is a test pool (no permanent data on it).
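To spot this kind of oddity without reading the table by eye, a small Python sketch like the one below can list pool members that are not referenced by gptid. The function name and the set of row names it skips (pool name, mirror-N, raidzN-N, and section headers such as logs or cache) are my own assumptions about the zpool status layout, not an official parser. The sample is the output posted above.

```python
import re
from typing import List

# zpool status output as posted above.
SAMPLE_STATUS = """\
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 00:00:09 with 0 errors on Thu May  6 05:52:59 2021
config:

    NAME                                            STATE     READ WRITE CKSUM
    data                                            ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/182e80cf-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0
        gptid/18a2eef3-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0
      mirror-1                                      ONLINE       0     0     0
        da3p2                                       ONLINE       0     0     0
        gptid/18bf6bfc-93e4-11eb-8db3-782bcb28effa  ONLINE       0     0     0

errors: No known data errors
"""

# Assumed naming of grouping rows and section headers (heuristic, may be incomplete).
GROUP_ROW = re.compile(r"^(mirror|raidz[123]?|draid\S*|spare|replacing)-\d+$")
SECTION_ROWS = {"logs", "cache", "spares", "special", "dedup"}


def non_gptid_members(status_text: str) -> List[str]:
    """Return leaf device names in the config table that do not start with 'gptid/'."""
    pool = None
    in_config = False
    found = []
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "pool:" and len(fields) > 1:
            pool = fields[1]  # remember the pool name so its row can be skipped
            in_config = False
            continue
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True
            continue
        if fields[0] == "errors:":
            in_config = False
            continue
        if not in_config:
            continue
        name = fields[0]
        if name == pool or name in SECTION_ROWS or GROUP_ROW.match(name):
            continue  # pool, vdev group, or section header row, not a disk
        if not name.startswith("gptid/"):
            found.append(name)
    return found


print(non_gptid_members(SAMPLE_STATUS))  # prints ['da3p2']
```

This only reports the naming mismatch. It says nothing about why the member lost its gptid label or whether the disk itself is healthy.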
 