SOLVED Help Locating Failed Disk

ethanf

Dabbler
Joined
Jul 1, 2017
Messages
26
I need help locating a failed disk. I've followed the steps and advice that are out there, but I'm still not sure I have it right.

Let me preface this by saying that I received an alert:
Code:
New alerts:
* Device: /dev/da28 [SAT], 952 Offline uncorrectable sectors.

Current alerts:
* Pool pool1 state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
Disk 18034248878859316976 is FAULTED

* smartd is not running.
* Device: /dev/da28 [SAT], 952 Currently unreadable (pending) sectors.
* Device: /dev/da28 [SAT], 952 Offline uncorrectable sectors.


Side note: why does it say "smartd is not running"? I looked at the Services section, and SMART is enabled and set to start automatically.

zpool status
Code:
  pool: pool1
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0B in 1 days 18:36:00 with 0 errors on Mon Nov 22 18:36:07 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool1                                           DEGRADED     0     0 0
          raidz2-0                                      ONLINE       0     0 0
            gptid/c0e00731-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/c3666b6b-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/c260d2bb-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/bf3c0de4-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/c5313130-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/c2ffbc76-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/c1c16aeb-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/c0f0e8fb-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/c6e88011-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/c73e4db4-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/c8018b37-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
            gptid/c8756dc8-d7b3-11eb-8445-0cc47a6b40e6  ONLINE       0     0 0
          raidz2-1                                      ONLINE       0     0 0
            gptid/89094cb5-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
            gptid/89a46857-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
            gptid/8a46cb15-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
            gptid/8c65b0b7-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
            gptid/8d2cb002-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
            gptid/8db912ad-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
            gptid/97993e79-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
            gptid/988c9ff9-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
            gptid/9c4d7b2a-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
            gptid/8d05a049-1755-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/9e6bc2ed-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
            gptid/9eb2e15b-f911-11eb-a4f2-0cc47a6b40e6  ONLINE       0     0 0
          raidz2-2                                      ONLINE       0     0 0
            gptid/c9d1a950-197d-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/d86e7bbe-197d-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/e171fa7a-197d-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/e99c5fa1-197d-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/fbae1bc4-197d-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/0ccbf503-197e-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/20d6aeb5-197e-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/2e950b93-197e-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/30e661d1-197e-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/356f4099-197e-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/3a710e23-197e-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/3d4cf72f-197e-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
          raidz2-3                                      DEGRADED     0     0 0
            gptid/0b20cd6f-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/0cb76b8c-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/1e9071b1-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/1f5834f3-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/2195e9c2-197f-11ec-9bf5-90e2bad0eab0  FAULTED     53   256 0  too many errors
            gptid/333670a7-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/3be8cd72-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/46c5deb7-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/4a18e016-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/50f3af90-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/523dd6c5-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/55fba847-197f-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
          raidz2-4                                      ONLINE       0     0 0
            gptid/ffb7e0f8-1ff5-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/02eacba6-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/0444d3c2-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/0338de8c-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/03dd7bb0-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/060908b3-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/05dada19-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/05d94f76-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/0724c3ed-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/07d61381-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/0795a5cb-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0
            gptid/0837d921-1ff6-11ec-9bf5-90e2bad0eab0  ONLINE       0     0 0


glabel status
Code:
                                      Name  Status  Components
gptid/0cac60b9-cf9f-11eb-906b-0cc47a6b40e6     N/A  ada0p1
gptid/0444d3c2-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da0p2
gptid/8c65b0b7-f911-11eb-a4f2-0cc47a6b40e6     N/A  da1p2
gptid/9c4d7b2a-f911-11eb-a4f2-0cc47a6b40e6     N/A  da2p2
gptid/8db912ad-f911-11eb-a4f2-0cc47a6b40e6     N/A  da3p2
gptid/bf3c0de4-d7b3-11eb-8445-0cc47a6b40e6     N/A  da4p2
gptid/c2ffbc76-d7b3-11eb-8445-0cc47a6b40e6     N/A  da5p2
gptid/c0f0e8fb-d7b3-11eb-8445-0cc47a6b40e6     N/A  da6p2
gptid/c3666b6b-d7b3-11eb-8445-0cc47a6b40e6     N/A  da7p2
gptid/97993e79-f911-11eb-a4f2-0cc47a6b40e6     N/A  da8p2
gptid/c260d2bb-d7b3-11eb-8445-0cc47a6b40e6     N/A  da9p2
gptid/060908b3-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da10p2
gptid/988c9ff9-f911-11eb-a4f2-0cc47a6b40e6     N/A  da11p2
gptid/c6e88011-d7b3-11eb-8445-0cc47a6b40e6     N/A  da12p2
gptid/c8018b37-d7b3-11eb-8445-0cc47a6b40e6     N/A  da13p2
gptid/9eb2e15b-f911-11eb-a4f2-0cc47a6b40e6     N/A  da14p2
gptid/89a46857-f911-11eb-a4f2-0cc47a6b40e6     N/A  da15p2
gptid/89094cb5-f911-11eb-a4f2-0cc47a6b40e6     N/A  da16p2
gptid/c0e00731-d7b3-11eb-8445-0cc47a6b40e6     N/A  da17p2
gptid/c8756dc8-d7b3-11eb-8445-0cc47a6b40e6     N/A  da18p2
gptid/8a46cb15-f911-11eb-a4f2-0cc47a6b40e6     N/A  da19p2
gptid/9e6bc2ed-f911-11eb-a4f2-0cc47a6b40e6     N/A  da20p2
gptid/8ba4ad47-6ceb-11e7-a075-984be1659364     N/A  da21p2
gptid/8d2cb002-f911-11eb-a4f2-0cc47a6b40e6     N/A  da22p2
gptid/c5313130-d7b3-11eb-8445-0cc47a6b40e6     N/A  da23p2
gptid/95fb36e4-6ceb-11e7-a075-984be1659364     N/A  da24p2
gptid/0ccbf503-197e-11ec-9bf5-90e2bad0eab0     N/A  da25p2
gptid/20d6aeb5-197e-11ec-9bf5-90e2bad0eab0     N/A  da26p2
gptid/c9d1a950-197d-11ec-9bf5-90e2bad0eab0     N/A  da27p2
gptid/a16f84b2-3a34-11e9-b222-984be1659364     N/A  da28p2
gptid/fbae1bc4-197d-11ec-9bf5-90e2bad0eab0     N/A  da29p2
gptid/e171fa7a-197d-11ec-9bf5-90e2bad0eab0     N/A  da30p2
gptid/e99c5fa1-197d-11ec-9bf5-90e2bad0eab0     N/A  da31p2
gptid/94255900-6339-11ea-8636-90e2bad0eab0     N/A  da32p2
gptid/d86e7bbe-197d-11ec-9bf5-90e2bad0eab0     N/A  da33p2
gptid/c73e4db4-d7b3-11eb-8445-0cc47a6b40e6     N/A  da34p2
gptid/c1c16aeb-d7b3-11eb-8445-0cc47a6b40e6     N/A  da35p2
gptid/8d05a049-1755-11ec-9bf5-90e2bad0eab0     N/A  da36p2
gptid/1f5834f3-197f-11ec-9bf5-90e2bad0eab0     N/A  da37p2
gptid/1e9071b1-197f-11ec-9bf5-90e2bad0eab0     N/A  da38p2
gptid/50f3af90-197f-11ec-9bf5-90e2bad0eab0     N/A  da39p2
gptid/333670a7-197f-11ec-9bf5-90e2bad0eab0     N/A  da40p2
gptid/0cb76b8c-197f-11ec-9bf5-90e2bad0eab0     N/A  da42p2
gptid/0b20cd6f-197f-11ec-9bf5-90e2bad0eab0     N/A  da43p2
gptid/3be8cd72-197f-11ec-9bf5-90e2bad0eab0     N/A  da44p2
gptid/4a18e016-197f-11ec-9bf5-90e2bad0eab0     N/A  da45p2
gptid/02eacba6-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da46p2
gptid/03dd7bb0-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da47p2
gptid/ffb7e0f8-1ff5-11ec-9bf5-90e2bad0eab0     N/A  da48p2
gptid/05dada19-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da49p2
gptid/05d94f76-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da50p2
gptid/3a710e23-197e-11ec-9bf5-90e2bad0eab0     N/A  da51p2
gptid/d77743e4-6a2f-11e7-929a-984be1659364     N/A  da52p2
gptid/2e950b93-197e-11ec-9bf5-90e2bad0eab0     N/A  da53p2
gptid/d7e569ad-6a2f-11e7-929a-984be1659364     N/A  da54p2
gptid/d858ad5e-6a2f-11e7-929a-984be1659364     N/A  da55p2
gptid/356f4099-197e-11ec-9bf5-90e2bad0eab0     N/A  da56p2
gptid/3d4cf72f-197e-11ec-9bf5-90e2bad0eab0     N/A  da57p2
gptid/d891793a-6a2f-11e7-929a-984be1659364     N/A  da58p2
gptid/30e661d1-197e-11ec-9bf5-90e2bad0eab0     N/A  da59p2
gptid/d7abacc2-6a2f-11e7-929a-984be1659364     N/A  da60p2
gptid/523dd6c5-197f-11ec-9bf5-90e2bad0eab0     N/A  da61p2
gptid/46c5deb7-197f-11ec-9bf5-90e2bad0eab0     N/A  da62p2
gptid/55fba847-197f-11ec-9bf5-90e2bad0eab0     N/A  da63p2
gptid/0338de8c-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da64p2
gptid/07d61381-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da65p2
gptid/0795a5cb-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da66p2
gptid/0837d921-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da67p2
gptid/0724c3ed-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da68p2
              gpt/Basic%20data%20partition     N/A  da69p1
gptid/30a35b04-2d15-45be-a374-840e807c40a2     N/A  da69p1
gptid/79b84db4-edb6-4582-aadf-7ab92d0968ae     N/A  da70p1
gptid/d7a51acd-6a2f-11e7-929a-984be1659364     N/A  da60p1
gptid/d88a224c-6a2f-11e7-929a-984be1659364     N/A  da58p1
gptid/d84e63cf-6a2f-11e7-929a-984be1659364     N/A  da55p1
gptid/d7dbdbf9-6a2f-11e7-929a-984be1659364     N/A  da54p1
gptid/d76e4dd0-6a2f-11e7-929a-984be1659364     N/A  da52p1
gptid/fee9b553-1ff5-11ec-9bf5-90e2bad0eab0     N/A  da48p1
gptid/026b8fa3-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da47p1
gptid/021ae7ae-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da46p1
gptid/441bb419-197f-11ec-9bf5-90e2bad0eab0     N/A  da45p1
gptid/3860c096-197f-11ec-9bf5-90e2bad0eab0     N/A  da44p1
gptid/030635de-197f-11ec-9bf5-90e2bad0eab0     N/A  da43p1
gptid/06937489-197f-11ec-9bf5-90e2bad0eab0     N/A  da42p1
gptid/25ac6254-197f-11ec-9bf5-90e2bad0eab0     N/A  da40p1
gptid/4c5f71f1-197f-11ec-9bf5-90e2bad0eab0     N/A  da39p1
gptid/15706e32-197f-11ec-9bf5-90e2bad0eab0     N/A  da38p1
gptid/163ee780-197f-11ec-9bf5-90e2bad0eab0     N/A  da37p1
gptid/8c5129c1-1755-11ec-9bf5-90e2bad0eab0     N/A  da36p1
gptid/badbcb72-d7b3-11eb-8445-0cc47a6b40e6     N/A  da35p1
gptid/c493ad2c-d7b3-11eb-8445-0cc47a6b40e6     N/A  da34p1
gptid/ccc67ad6-197d-11ec-9bf5-90e2bad0eab0     N/A  da33p1
gptid/9400ebdb-6339-11ea-8636-90e2bad0eab0     N/A  da32p1
gptid/d4aa5ee7-197d-11ec-9bf5-90e2bad0eab0     N/A  da31p1
gptid/d4577ce2-197d-11ec-9bf5-90e2bad0eab0     N/A  da30p1
gptid/f5295051-197d-11ec-9bf5-90e2bad0eab0     N/A  da29p1
gptid/a14a5534-3a34-11e9-b222-984be1659364     N/A  da28p1
gptid/bb5384d8-197d-11ec-9bf5-90e2bad0eab0     N/A  da27p1
gptid/146c3d00-197e-11ec-9bf5-90e2bad0eab0     N/A  da26p1
gptid/fbbef0a9-197d-11ec-9bf5-90e2bad0eab0     N/A  da25p1
gptid/95ce298b-6ceb-11e7-a075-984be1659364     N/A  da24p1
gptid/c24fe735-d7b3-11eb-8445-0cc47a6b40e6     N/A  da23p1
gptid/8b860f66-f911-11eb-a4f2-0cc47a6b40e6     N/A  da22p1
gptid/8b8b0314-6ceb-11e7-a075-984be1659364     N/A  da21p1
gptid/9ddc1522-f911-11eb-a4f2-0cc47a6b40e6     N/A  da20p1
gptid/87875e7f-f911-11eb-a4f2-0cc47a6b40e6     N/A  da19p1
gptid/c793269e-d7b3-11eb-8445-0cc47a6b40e6     N/A  da18p1
gptid/bec5121b-d7b3-11eb-8445-0cc47a6b40e6     N/A  da17p1
gptid/87993f5d-f911-11eb-a4f2-0cc47a6b40e6     N/A  da16p1
gptid/8712f26c-f911-11eb-a4f2-0cc47a6b40e6     N/A  da15p1
gptid/9e233760-f911-11eb-a4f2-0cc47a6b40e6     N/A  da14p1
gptid/c6028b1e-d7b3-11eb-8445-0cc47a6b40e6     N/A  da13p1
gptid/c4073a3f-d7b3-11eb-8445-0cc47a6b40e6     N/A  da12p1
gptid/969f7fc0-f911-11eb-a4f2-0cc47a6b40e6     N/A  da11p1
gptid/04d0c235-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da10p1
gptid/bd671f2f-d7b3-11eb-8445-0cc47a6b40e6     N/A  da9p1
gptid/9558c186-f911-11eb-a4f2-0cc47a6b40e6     N/A  da8p1
gptid/bfe8c6fb-d7b3-11eb-8445-0cc47a6b40e6     N/A  da7p1
gptid/bd77ff67-d7b3-11eb-8445-0cc47a6b40e6     N/A  da6p1
gptid/bfd74ac6-d7b3-11eb-8445-0cc47a6b40e6     N/A  da5p1
gptid/bbe34c32-d7b3-11eb-8445-0cc47a6b40e6     N/A  da4p1
gptid/8c5693fb-f911-11eb-a4f2-0cc47a6b40e6     N/A  da3p1
gptid/9b70468f-f911-11eb-a4f2-0cc47a6b40e6     N/A  da2p1
gptid/8aec17ac-f911-11eb-a4f2-0cc47a6b40e6     N/A  da1p1
gptid/032be68a-1ff6-11ec-9bf5-90e2bad0eab0     N/A  da0p1
gptid/0caf5f9c-cf9f-11eb-906b-0cc47a6b40e6     N/A  ada0p3


The confusing part is that zpool status shows gptid/2195e9c2-197f-11ec-9bf5-90e2bad0eab0 as the faulted drive, but that ID isn't listed in glabel status.

The output on the SMART results of da28 seem fine. It's an older drive but doesn't show any failures:
Code:
root@truenas[~]# smartctl -a /dev/da28
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     HGST Ultrastar He6
Device Model:     HGST HUS726060ALA640
Serial Number:    AR11001EV14WWB
LU WWN Device Id: 5 000cca 231c086c3
Firmware Version: AHGNT1E2
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Dec 14 20:36:36 2021 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   57) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 882) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       85
  3 Spin_Up_Time            0x0007   192   192   024    Pre-fail  Always       -       519 (Average 518)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       99
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       3
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   130   130   020    Pre-fail  Offline      -       12
  9 Power_On_Hours          0x0012   091   091   000    Old_age   Always       -       66313
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       99
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       6579300
192 Power-Off_Retract_Count 0x0032   098   098   000    Old_age   Always       -       2890
193 Load_Cycle_Count        0x0012   098   098   000    Old_age   Always       -       2890
194 Temperature_Celsius     0x0002   139   139   000    Old_age   Always       -       43 (Min/Max 5/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       952
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       952
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     64090         -
# 2  Short offline       Completed without error       00%     64089         -
# 3  Short offline       Completed without error       00%     64088         -
# 4  Short offline       Completed without error       00%     64087         -
# 5  Short offline       Completed without error       00%     64086         -
# 6  Short offline       Completed without error       00%     64085         -
# 7  Short offline       Completed without error       00%     64084         -
# 8  Short offline       Completed without error       00%     64083         -
# 9  Short offline       Completed without error       00%     64082         -
#10  Short offline       Completed without error       00%     64081         -
#11  Short offline       Completed without error       00%     64080         -
#12  Short offline       Completed without error       00%     64079         -
#13  Short offline       Completed without error       00%     64078         -
#14  Short offline       Completed without error       00%     64077         -
#15  Short offline       Completed without error       00%     64076         -
#16  Short offline       Completed without error       00%     64075         -
#17  Short offline       Completed without error       00%     64074         -
#18  Short offline       Completed without error       00%     64073         -
#19  Short offline       Completed without error       00%     64072         -
#20  Short offline       Completed without error       00%     64071         -
#21  Short offline       Completed without error       00%     64070         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Any help is greatly appreciated. I hope I was detailed enough.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
ethanf said:
"The output on the SMART results of da28 seem fine."
Umm, no it doesn't. If I had a drive with 952 current pending sectors and 952 offline uncorrectable sectors I'd be replacing it in a hot minute.
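
For anyone checking their own drives, here is a minimal sketch of a filter that flags the two attributes in question. It assumes the attribute table layout shown in the smartctl output above (ID in the first column, raw value in the tenth); the function name `flag_bad_sectors` is made up for this example.

```shell
# Print attributes 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable)
# when their raw value is non-zero. Reads `smartctl -A` or `smartctl -a`
# output on stdin. Assumes the 10-column attribute table shown above.
flag_bad_sectors() {
    awk '($1 == 197 || $1 == 198) && $10 + 0 > 0 { print $2 "=" $10 }'
}

# Example (device name taken from the alert in the first post):
#   smartctl -A /dev/da28 | flag_bad_sectors
```

An empty result only means these two counters are zero; it is not a substitute for a long self-test.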
 
Joined
Jan 7, 2015
Messages
1,155
First, how many disks are in the system, and how many does it report right now? I bet the count is off by one: the bad one. In my experience, once a disk shows up as that long number, you have to go by process of elimination, looking at which disks are present and ruling them out one at a time; the one that's not in the list is the bad one. Unless by some chance it shows up in dmesg as detached with a serial number, which is really what you need. The disk that is da28 now still passes the overall SMART health check, but it has never had a long test run on it, and it also has 952 in both 197 and 198, which has always been considered a sign of failure. I'd replace it ASAP.

If you're up into the da28s, this could be a tedious process if the disks aren't labeled.
 

ethanf

Dabbler
Joined
Jul 1, 2017
Messages
26
I see what you mean about the 197 and 198 fields; I see the 952 in there now. The weird thing is that da28, which was the source of the alert, was never assigned to the pool that is now degraded. It was intended to be a (manual) hot spare. I have 2x 6TB drives that don't really fit into any of my vdevs, so I was going to keep them as spares in case one of my 5TB drives failed. da28 is a 6TB HGST. So while I did get an alert on this drive, it doesn't seem to be the drive that is degrading my pool, according to the gptid listing.

The problematic drive that’s affecting my pool is gptid/2195e9c2-197f-11ec-9bf5-90e2bad0eab0 according to zpool status. That gptid does NOT show up on glabel status. Is it because that drive is totally offline? Does a drive status of "FAULTED" mean that the drive is offline as well?

I see what you mean by having to go through and see which one is missing from the glabel status output…boy that’s going to be a pain. Having to grab each serial from every gptid and mark it off my list. I do have a list of all drive serials and their physical location. Was hoping there was a simpler way. If gptids aren’t supposed to change I guess I can add the gptid to my list for the future.
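
A minimal sketch of that elimination step, assuming the two command outputs have been saved to text files (the function name `missing_gptids` and the file names are placeholders). It prints every gptid that `zpool status` lists but `glabel status` does not. In the glabel listing above, the component names also skip from da40 to da42, which may hint at where the absent disk used to sit, although da numbers are not guaranteed to stay stable across reboots.

```shell
# List pool members (gptid/...) that appear in `zpool status` output but
# have no matching label in `glabel status` output.
# $1 = file holding `zpool status` output
# $2 = file holding `glabel status` output
missing_gptids() {
    awk '
        # First file on the command line is the glabel output: remember its labels.
        FILENAME == ARGV[1] { if ($1 ~ /^gptid\//) present[$1] = 1; next }
        # Second file is the zpool output: report members with no label.
        $1 ~ /^gptid\// && !($1 in present) { print $1 }
    ' "$2" "$1"
}

# Example (file names are placeholders):
#   zpool status pool1 > /tmp/zpool.txt
#   glabel status      > /tmp/glabel.txt
#   missing_gptids /tmp/zpool.txt /tmp/glabel.txt
```

This only tells you which pool member is absent; matching it to a physical slot still needs a record of serial numbers and locations.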
 
Joined
Jan 7, 2015
Messages
1,155
I'm not 100% sure, but I think if it's not in glabel then that disk is dead-dead: not detected, missing. Now a different disk has taken over da28, but it is also questionable. You should run a long SMART test on every disk in your system. Note serial numbers, gptids, da#, model, etc., and check for errors. I've seen a widely shared script that returns all the pertinent disk info like this. Save the output to a file for the future. I think in the end, if you don't have a record of what's what, you might be playing the elimination game. If you have hot spares available in the system, then replacing the failed, now-missing disk should trigger a resilver. But somewhere in the system there will remain a dead, failed disk.

As I said, an easy way to verify this is to compare the expected number of disks against the number of disks that are detected and readable. If it's in there somewhere, you should be able to at least get its serial number, which ultimately is what you are going to need.

If you don't have records of which disk is which, now is a good time to make a cheat sheet.
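
This is not the script mentioned above, just a minimal sketch of how such a cheat sheet could be built. It assumes the pool data partition is p2 on every disk (as in the glabel output earlier in the thread) and that `smartctl -i` prints a "Serial Number:" line; the function name `cheat_sheet_line` and the file paths are made up for this example.

```shell
# Print "disk serial gptid" for one disk.
# $1 = disk name (e.g. da28)
# $2 = file holding `glabel status` output
# stdin = `smartctl -i` output for that disk
# Assumption: the pool data partition is <disk>p2.
cheat_sheet_line() {
    serial=$(awk -F': *' '/^Serial Number:/ { print $2 }')
    gptid=$(awk -v part="${1}p2" '$3 == part && $1 ~ /^gptid\// { print $1 }' "$2")
    printf '%s %s %s\n' "$1" "${serial:-unknown}" "${gptid:-none}"
}

# Example loop on FreeBSD/TrueNAS CORE (paths are placeholders):
#   glabel status > /tmp/glabel.txt
#   for d in $(sysctl -n kern.disks); do
#       smartctl -i "/dev/$d" | cheat_sheet_line "$d" /tmp/glabel.txt
#   done > /root/disk-cheat-sheet.txt
```

Keeping the resulting file alongside a note of each serial's physical slot means a missing gptid can be traced to a bay without pulling drives.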
 

ethanf

Dabbler
Joined
Jul 1, 2017
Messages
26
Thanks for the help. I think that clarifies it. I’ll spend some time going through matching gptids to my list of serials to locate the paperweight. I checked the other disks in the degraded vdev and it’s a set of 8TB drives so that narrows it down a bit since I also have 5TB and 10TB vdevs.
 
Joined
Jan 7, 2015
Messages
1,155
Yeah, my pleasure. I've dealt with this before myself. I label each disk with the last 4 characters of its serial and keep that alongside the first group of digits in the gptid and the da#. It helps. You have tons of parity. You'll be alright.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
That disk with the 900+ errors has also been spinning for over 7.5 years; you're doing pretty well not to have more of them failing if they are all the same age.
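
The age figure comes from attribute 9 (Power_On_Hours) in the smartctl output. A small sketch of the conversion, assuming 365-day years (the function name `hours_to_years` is made up for this example):

```shell
# Convert a SMART Power_On_Hours raw value to years (365-day years).
hours_to_years() {
    awk -v h="$1" 'BEGIN { printf "%.2f\n", h / (24 * 365) }'
}

# Raw value of attribute 9 from the da28 output earlier in the thread.
hours_to_years 66313
```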
 

blanchet

Guru
Joined
Apr 17, 2018
Messages
516
If you use an LSI 3008 HBA with a Supermicro enclosure, you can use disklist.pl and sas3ircu to activate the location LED.
These are the scripts that I use to locate a disk in my Supermicro enclosure.
You can adapt them for other HBAs by using sas2ircu or sesutil.

disk-blink.sh

Code:
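#!/usr/local/bin/bash
# Turn ON the locate LED of the drive whose disklist.pl entry matches a serial.

if [ $# -eq 0 ]; then
    echo "blink disk in enclosure"
    echo "usage: $0 serial"
    exit 1
fi

PATTERN=$1
DISKLIST="$(dirname "$0")/disklist.pl"
# Rewrite the "SAS3008(a):b#c" field matched by the sed pattern as "a@b:c"
LOCATION=$("${DISKLIST}" -all | grep SAS3008 | grep -i -- "$PATTERN" | sed -e 's/.*SAS3008(\(.*\)):\(.*\)#\(.*\).*/\1@\2:\3/')
echo "LOCATION=$LOCATION"

# Stop here if no disk matched, rather than calling sas3ircu with empty arguments
if [ -z "$LOCATION" ]; then
    echo "no SAS3008 disk matches '$PATTERN'" >&2
    exit 1
fi

# Split on "@" into the controller and the location passed to LOCATE
arrIN=(${LOCATION//@/ })
CTRL=${arrIN[0]}
SLOT=${arrIN[1]}
#echo "CTRL=$CTRL"
#echo "SLOT=$SLOT"
sas3ircu "$CTRL" LOCATE "$SLOT" ON | grep SAS3IRCU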


disk-unblink.sh
Code:
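#!/usr/local/bin/bash
# Turn OFF the locate LED of the drive whose disklist.pl entry matches a serial.

if [ $# -eq 0 ]; then
    echo "unblink disk in enclosure"
    echo "usage: $0 serial"
    exit 1
fi

PATTERN=$1
DISKLIST="$(dirname "$0")/disklist.pl"
# Rewrite the "SAS3008(a):b#c" field matched by the sed pattern as "a@b:c"
LOCATION=$("${DISKLIST}" -all | grep SAS3008 | grep -i -- "$PATTERN" | sed -e 's/.*SAS3008(\(.*\)):\(.*\)#\(.*\).*/\1@\2:\3/')
echo "LOCATION=$LOCATION"

# Stop here if no disk matched, rather than calling sas3ircu with empty arguments
if [ -z "$LOCATION" ]; then
    echo "no SAS3008 disk matches '$PATTERN'" >&2
    exit 1
fi

# Split on "@" into the controller and the location passed to LOCATE
arrIN=(${LOCATION//@/ })
CTRL=${arrIN[0]}
SLOT=${arrIN[1]}
#echo "CTRL=$CTRL"
#echo "SLOT=$SLOT"
sas3ircu "$CTRL" LOCATE "$SLOT" OFF | grep SAS3IRCU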
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If you have activity lights, you can use dd if=/dev/da1 of=/dev/null bs=1024k count=5000

No matter which controller you have.

You can even do it by sound or vibration if you don't have lights.
 

ethanf

Dabbler
Joined
Jul 1, 2017
Messages
26
I’ll try the script on the off chance that it’s in the supermicro box. I have a second 4U box attached as a JBOD to an external port on a separate LSI HBA. Work has been keeping me really busy so I haven’t had a chance yet to dig in.

Thank you for all the suggestions, everyone. It's been very helpful and reassuring.
 

ethanf

Dabbler
Joined
Jul 1, 2017
Messages
26
Process of elimination worked! Took some time to add gptid and da# to my list but I was able to identify the missing 8TB drive and replace it.

Thanks again for your help, everyone, especially @John Digital.
 