Are 2 of My Drives Failed? (See Edit: Moving Data Onto New Vdev, To Remove Old)

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
EDIT: SEE POST #106 FOR UPDATES
(https://www.truenas.com/community/threads/are-2-of-my-drives-failed.111640/post-777996)



I find the way the GUI and CLI list the pool very confusing when it comes to spare drives and failed/failing drives.
From what I can tell, 2 spare drives are in use. However, I don't recall ever seeing a notification about a drive having issues or dying that I haven't already replaced. The pool also reports as healthy, so I am confused.

It shows my 2 spares and says they are currently in use / unavailable.
Under mirror-5 it has a gptid and then it says spare-1 with 2 gptids.
And then under mirror-6 it has a gptid, and then it says spare-1 with 2 more gptids.

I'm extremely confused.

And as a side note, I really think TrueNAS could make failing/failed drives more obvious in the GUI, including which drives they are, as well as which spares are being used where. Status indicators under the "Disks" tab would be a start: green if good, red if there's an issue. For spares, maybe gray if not in use, yellow if in use. Same under Status > Pools.
It feels like a confusing mess the way it's laid out imho.
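In the meantime, the in-use spares can at least be listed from the CLI with a short filter. This is just a sketch: the sample text below is pasted from the status output further down rather than piping `zpool status` live, but the awk filter works the same either way.

```shell
# Sample lines copied from this pool's `zpool status` spares section.
status_sample='        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use'

# Print only the gptid of each spare whose state is INUSE.
printf '%s\n' "$status_sample" | awk '$2 == "INUSE" {print $1}'
```

Against a live system you would pipe `zpool status PrimaryPool` into the same awk filter.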

Code:
# zpool status -v
  pool: PrimaryPool
 state: ONLINE
  scan: scrub repaired 0B in 14:29:29 with 0 errors on Mon Jul 24 17:29:30 2023
config:

        NAME                                              STATE     READ WRITE CKSUM
        PrimaryPool                                       ONLINE       0     0   0
          mirror-0                                        ONLINE       0     0   0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-1                                        ONLINE       0     0   0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-2                                        ONLINE       0     0   0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-3                                        ONLINE       0     0   0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-4                                        ONLINE       0     0   0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2    ONLINE       0     0   0
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    ONLINE       0     0   0
          mirror-5                                        ONLINE       0     0   0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       ONLINE       0     0   0
              gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76  ONLINE       0     0   0
              gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
          mirror-6                                        ONLINE       0     0   0
            gptid/749a1891-1b5c-11ee-941f-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       ONLINE       0     0   0
              gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76  ONLINE       0     0   0
              gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use

errors: No known data errors

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:11 with 0 errors on Sat Jul 29 03:46:12 2023
config:
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Somehow, two drives had issues or were temporarily unavailable, and the spares kicked in. You are responsible for monitoring this, including setting up email alerts. It is also your responsibility, as the ZFS administrator, to assess the situation (are some drives unhealthy? what is in the SMART reports?) and decide which drive to keep in the pool, which drive should possibly be replaced, and which drive may be returned to the spares.
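A minimal sketch of that resolution path, using the mirror-5 entry from the status output above: if the original disk checks out healthy, detaching the spare's gptid from the temporary spare-1 vdev returns it to standby; detaching the original disk instead would promote the spare into the mirror permanently. The echo below is a dry run so the command shape can be reviewed before anything is actually detached.

```shell
POOL=PrimaryPool
# Spare currently attached under mirror-5's spare-1 (from the status above)
SPARE=gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76

# Dry run: print the detach command instead of executing it, so nothing
# is removed from the pool while the drives are still being assessed.
echo "zpool detach $POOL $SPARE"
```

Drop the `echo` only once you are sure which side of the spare-1 vdev you want to keep.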
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
I understand I am responsible for monitoring this; I'm not sure where in my post I indicated that I am not.
E-mail alerts are set up, as are notifications in the web GUI, both of which I check often. On top of this, I mainly use the multi_report script for daily e-mail reports on pool and SMART status. (https://www.truenas.com/community/resources/multi_report-sh-version-for-core-and-scale.179/ https://github.com/JoeSchmuck/Multi-Report/blob/main/multi_report_v2.4.3_2023_06_16.txt)

I am no expert at reading SMART stats, but I am not seeing anything as far as errors go.
So again, my question comes down to confusion about why the spare drives are in use.

I currently do not see the pool showing as degraded or errored, and I'm not seeing any SMART errors; they all say passed.
Normally I get a notification from TrueNAS when a drive has an issue, and every time that has occurred, I have replaced the drive. I do not recall the spare drives still showing as in use after the drive was replaced and resilvered. And usually when I get a notification that a drive is having issues, the pool also shows as degraded.
None of these things are currently showing, so I do not understand why spare drives are assigned to those 2 mirrors.
Furthermore, I don't understand why each of those mirrors shows 2 GPTIDs under spare-1. I only have 2 spare drives, not 4.

This was the last report the script gave this morning:
Code:
Multi-Report Text Section

External Configuration File in use dtd:2023-06-16
TrueNAS Configuration File --> Emailed every: Mon
Statistical Data File Created.

Statistical Export Log Located: /root/bin/report-generator/statisticalsmartdata.csv --> Emailed every: Mon

WARNING LOG FILE
Drive: N8GEX1NY - Test Age = 405 Days
Drive: N8GEBBRY - Test Age = 405 Days
Drive: NHG9ZP7Y - Test Age = 405 Days
Drive: NHG9JAAY - Test Age = 405 Days
Drive: N8GG3NYY - Test Age = 405 Days
Drive N8GEW7XY High Drive Temp 45 - Threshold set at 45
Drive: N8GEW7XY - Test Age = 405 Days
Drive N8GEX1PY High Drive Temp 45 - Threshold set at 45
Drive: N8GEX1PY - Test Age = 405 Days
Drive: N8GEBDYY - Test Age = 405 Days
Drive: N8GG150Y - Test Age = 405 Days


END

########## ZPool status report for PrimaryPool ##########
  pool: PrimaryPool
 state: ONLINE
  scan: scrub repaired 0B in 14:29:29 with 0 errors on Mon Jul 24 17:29:30 2023
config:

    NAME                                              STATE     READ WRITE CKSUM
    PrimaryPool                                       ONLINE       0     0     0
      mirror-0                                        ONLINE       0     0     0
        gptid/d7476d46-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0
        gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0
      mirror-1                                        ONLINE       0     0     0
        gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0
        gptid/db71bcb5-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0
      mirror-2                                        ONLINE       0     0     0
        gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0
        gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0
      mirror-3                                        ONLINE       0     0     0
        gptid/d9fb7757-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0
        gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0
      mirror-4                                        ONLINE       0     0     0
        gptid/9fd0872d-8f64-11ec-8462-002590f52cc2    ONLINE       0     0     0
        gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    ONLINE       0     0     0
      mirror-5                                        ONLINE       0     0     0
        gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0     0
        spare-1                                       ONLINE       0     0     0
          gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76  ONLINE       0     0     0
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0     0
      mirror-6                                        ONLINE       0     0     0
        gptid/749a1891-1b5c-11ee-941f-ac1f6be66d76    ONLINE       0     0     0
        spare-1                                       ONLINE       0     0     0
          gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76  ONLINE       0     0     0
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0     0
    spares
      gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
      gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use

errors: No known data errors

Drives for this pool are listed below:
d7476d46-32ca-11ec-b815-002590f52cc2 -> da10
d8d6aa36-32ca-11ec-b815-002590f52cc2 -> da11
d9a6f5dc-32ca-11ec-b815-002590f52cc2 -> da0
db71bcb5-32ca-11ec-b815-002590f52cc2 -> da1
d8b2f42f-32ca-11ec-b815-002590f52cc2 -> da6
d96847a9-32ca-11ec-b815-002590f52cc2 -> da7
d9fb7757-32ca-11ec-b815-002590f52cc2 -> da8
da1e1121-32ca-11ec-b815-002590f52cc2 -> da2
9fd0872d-8f64-11ec-8462-002590f52cc2 -> da4
9ff0f041-8f64-11ec-8462-002590f52cc2 -> da5
14811777-1b6d-11ed-8423-ac1f6be66d76 -> da3
03daa071-505c-11ed-a9fe-ac1f6be66d76 -> da14
0d56b97d-1e91-11ed-a6aa-ac1f6be66d76 -> da13
749a1891-1b5c-11ee-941f-ac1f6be66d76 -> da15
4710dd39-1b6d-11ed-8423-ac1f6be66d76 -> da9
0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76 -> da12
0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76 -> da12
0d56b97d-1e91-11ed-a6aa-ac1f6be66d76 -> da13


########## ZPool status report for boot-pool ##########
  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:11 with 0 errors on Sat Jul 29 03:46:12 2023
config:

    NAME        STATE     READ WRITE CKSUM
    boot-pool   ONLINE       0     0     0
      da16p2    ONLINE       0     0     0

errors: No known data errors


########## SMART status report for da0 drive (HGST HUS726040AL4210 : N8GEX1NY) ##########

SMART Health Status: OK

Current Drive Temperature:     42 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48557:09
Manufactured in week 29 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  154
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2177
Elements in grown defect list: 1

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       58         0        58   13372934     454302.142           0
write:         0     2828         0      2828    3389882      36543.682           0
verify:        0        0         0         0     410966          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background long Self test in progress ... - NOW - [- - -]




########## SMART status report for da1 drive (HGST HUS726040AL4210 : N8GEBA2Y) ##########

SMART Health Status: OK

Current Drive Temperature:     44 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48556:03
Manufactured in week 29 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  161
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2177
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       55         0        55   12561060     454022.668           0
write:         0        2         0         2    2074132      36441.851           0
verify:        0        0         0         0     515545          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background short Completed - 48542 - [- - -]




########## SMART status report for da2 drive (HGST HUS726040AL4210 : N8GEBBRY) ##########

SMART Health Status: OK

Current Drive Temperature:     43 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48555:50
Manufactured in week 29 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  155
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2172
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       33         0        33   13027642     463491.625           0
write:         0        1         0         1    2206321      36533.819           0
verify:        0        0         0         0     393835          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background long Self test in progress ... - NOW - [- - -]




########## SMART status report for da3 drive (HGST HUS726040AL4210 : NHGAJWEY) ##########

SMART Health Status: OK

Current Drive Temperature:     42 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48287:24
Manufactured in week 06 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  101
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2107
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       24         0        24   14422317     225627.825           0
write:         0       10         0        10    1030410      34952.512           0
verify:        0        0         0         0     630896          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background short Completed - 48273 - [- - -]




########## SMART status report for da4 drive (HGST HUS726040AL4210 : NHG9ZP7Y) ##########

SMART Health Status: OK

Current Drive Temperature:     40 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48339:00
Manufactured in week 06 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  92
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2110
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       13         0        13   13703330     268578.100           0
write:         0        1         0         1     596788      38887.596           0
verify:        0        0         0         0     940657          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background long Self test in progress ... - NOW - [- - -]




########## SMART status report for da5 drive (HGST HUS726040AL4210 : NHG9JAAY) ##########

SMART Health Status: OK

Current Drive Temperature:     43 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48288:40
Manufactured in week 06 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  96
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2106
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0      253         0       253   13668338     256222.301           0
write:         0       10         0        10    2612198      39639.922           0
verify:        0        0         0         0    4764455          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background long Self test in progress ... - NOW - [- - -]




########## SMART status report for da6 drive (HGST HUS726040AL4210 : N8GG3NYY) ##########

SMART Health Status: OK

Current Drive Temperature:     44 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48556:05
Manufactured in week 29 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  150
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2171
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        3         0         3   13486546     468597.936           0
write:         0        0         0         0    1081671      35595.475           0
verify:        0        0         0         0     854699          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background long Self test in progress ... - NOW - [- - -]




########## SMART status report for da7 drive (HGST HUS726040AL4210 : N8GEW7XY) ##########

SMART Health Status: OK

Current Drive Temperature:     45 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48555:24
Manufactured in week 29 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  145
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2161
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        8         0         8   14054491     463057.483           0
write:         0       15         0        15    1651557      35976.650           0
verify:        0        0         0         0     540146          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background long Self test in progress ... - NOW - [- - -]




########## SMART status report for da8 drive (HGST HUS726040AL4210 : N8GEX1PY) ##########

SMART Health Status: OK

Current Drive Temperature:     45 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48541:30
Manufactured in week 29 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  146
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2162
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       66         0        66   12497339     468147.414           0
write:         0        1         0         1    2412800      36459.509           0
verify:        0        0         0         0     523748          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background long Self test in progress ... - NOW - [- - -]




########## SMART status report for da9 drive (HITACHI HUS72604CLAR4000 : K4K0BBJB) ##########

SMART Health Status: OK

Current Drive Temperature:     42 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 37508:38
Manufactured in week 17 of year 2017
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  177
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1709
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       25         0        25   20915848     413392.882           0
write:         0        7         0         7     674346     107703.687           0
verify:        0        2         0         2    1626590        645.024           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background short Completed - 37494 - [- - -]




########## SMART status report for da10 drive (HGST HUS726040AL4210 : N8GEBDYY) ##########

SMART Health Status: OK

Current Drive Temperature:     39 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48541:05
Manufactured in week 29 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  136
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2151
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0      165         0       165   13039442     454258.839           0
write:         0        7         0         7    1532709      35740.830           0
verify:        0        0         0         0     527317          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background long Self test in progress ... - NOW - [- - -]




########## SMART status report for da11 drive (HGST HUS726040AL4210 : N8GG150Y) ##########

SMART Health Status: OK

Current Drive Temperature:     43 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48540:39
Manufactured in week 29 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  142
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2160
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        7         0         7   11730523     438780.105           0
write:         0        2         0         2    2201948      35673.478           0
verify:        0        0         0         0     382295          0.000           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background long Self test in progress ... - NOW - [- - -]




########## SMART status report for da12 drive (HITACHI HUS72604CLAR4000 : K4K7KU5B) ##########

SMART Health Status: OK

Current Drive Temperature:     42 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 36851:41
Manufactured in week 20 of year 2017
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  240
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1635
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        1         0         1   10847030     499394.074           0
write:         0        0         0         0     699292     115861.436           0
verify:        0        0         0         0      50685        785.723           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background short Completed - 36838 - [- - -]




########## SMART status report for da13 drive (HITACHI HUS72604CLAR4000 : K4K6EM8B) ##########

SMART Health Status: OK

Current Drive Temperature:     41 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 36852:32
Manufactured in week 20 of year 2017
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  236
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1625
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        1         0         1   10259926     501568.651           0
write:         0        4         0         4     506902     118108.487           0
verify:        0        0         0         0      54679        771.884           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background short Completed - 36838 - [- - -]




########## SMART status report for da14 drive (HITACHI HUS72604CLAR4000 : K4J3EXNB) ##########

SMART Health Status: OK

Current Drive Temperature:     41 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 35452:54
Manufactured in week 20 of year 2017
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  198
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1540
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        2         0         2   11315287     499496.148           0
write:         0        0         0         0     762551     116007.798           0
verify:        0        0         0         0      64344        771.487           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background short Completed - 35439 - [- - -]




########## SMART status report for da15 drive (HITACHI HUS72604CLAR4000 : K3GM0E6L) ##########

SMART Health Status: OK

Current Drive Temperature:     40 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 42673:33
Manufactured in week 37 of year 2017
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  9
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1784
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       21         0        21   11282273     420332.175           0
write:         0      175         0       175    1471694     445962.959           0
verify:        0      716         0       716     363406       3614.828           0

Non-medium error count:        0

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Background short Completed - 42659 - [- - -]




########## SMART status report for da16 drive (SanDisk SDSSDH3 512G : 21280S800845) ##########

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   ---    Old_age   Always       -       15142
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       88
165 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       327683
166 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
167 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       21
168 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       1
169 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       181
170 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
171 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
174 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       15
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   040   ---    Old_age   Always       -       33 (Min/Max 11/40)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       136
230 Unknown_SSD_Attribute   0x0032   001   001   ---    Old_age   Always       -       0
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 Media_Wearout_Indicator 0x0032   100   100   ---    Old_age   Always       -       11
234 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       33
241 Total_LBAs_Written      0x0030   253   253   ---    Old_age   Offline      -       21
242 Total_LBAs_Read         0x0030   253   253   ---    Old_age   Offline      -       659
244 Unknown_Attribute       0x0032   000   100   ---    Old_age   Always       -       0

No Errors Logged

Num Test_Description  (Most recent Short & Extended Tests - Listed by test number)
# 1 Short offline Completed without error 00% 15128 -
# 3 Extended offline Completed without error 00% 15082 -


SCT Error Recovery Control:  SCT Commands not supported


End of data section


The only thing that stands out to me is this in the report script, which states these drives have not been tested in 405 days. This makes no sense to me.
WARNING LOG FILE
Drive: N8GEX1NY - Test Age = 405 Days
Drive: N8GEBBRY - Test Age = 405 Days
Drive: NHG9ZP7Y - Test Age = 405 Days
Drive: NHG9JAAY - Test Age = 405 Days
Drive: N8GG3NYY - Test Age = 405 Days
Drive N8GEW7XY High Drive Temp 45 - Threshold set at 45
Drive: N8GEW7XY - Test Age = 405 Days
Drive N8GEX1PY High Drive Temp 45 - Threshold set at 45
Drive: N8GEX1PY - Test Age = 405 Days
Drive: N8GEBDYY - Test Age = 405 Days
Drive: N8GG150Y - Test Age = 405 Days

If I go into TrueNAS's Web GUI > Storage > Disks > N8GEX1NY (da0) > SMART Test Results > I see 20 listings, all say status "Success" and it is currently running a long background test.
Here is the smartctl -a for that drive:

Code:
# smartctl -a /dev/da0
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUS726040AL4210
Revision:             A980
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca244194888
Serial number:        N8GEX1NY
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Aug  2 16:04:58 2023 EDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     42 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48557:18
Manufactured in week 29 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  154
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2177
Elements in grown defect list: 1

Vendor (Seagate Cache) information
  Blocks sent to initiator = 5855174732021760

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       58         0        58   13372934     454302.142  0
write:         0     2828         0      2828    3389888      36543.693  0
verify:        0        0         0         0     410966          0.000  0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Self test in progress ...   -     NOW                 - [-   -    -]
# 2  Background short  Completed                   -   38827                 - [-   -    -]
# 3  Background short  Completed                   -   38803                 - [-   -    -]
# 4  Background short  Completed                   -   38779                 - [-   -    -]
# 5  Background short  Completed                   -   38755                 - [-   -    -]
# 6  Background short  Completed                   -   38731                 - [-   -    -]
# 7  Background short  Completed                   -   38707                 - [-   -    -]
# 8  Background short  Completed                   -   38683                 - [-   -    -]
# 9  Background long   Completed                   -   38671                 - [-   -    -]
#10  Background short  Completed                   -   38659                 - [-   -    -]
#11  Background short  Completed                   -   38635                 - [-   -    -]
#12  Background short  Completed                   -   38611                 - [-   -    -]
#13  Background short  Completed                   -   38587                 - [-   -    -]
#14  Background short  Completed                   -   38563                 - [-   -    -]
#15  Background short  Completed                   -   38539                 - [-   -    -]
#16  Background short  Completed                   -   38515                 - [-   -    -]
#17  Background long   Completed                   -   38510                 - [-   -    -]
#18  Background short  Completed                   -   38491                 - [-   -    -]
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
zpool events -v PrimaryPool will hopefully shed light on what happened; look for vdev.spare and the events before it. I agree with @Etorix's evaluation that two drives had some kind of issue (maybe due to the high temperature), were temporarily kicked out of the pool (so the spares kicked in), and were accepted back in after a while; hot spares don't (always?) automatically return to being spares.
In mirror-5 gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76 is the original drive and gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76 is the spare.
In mirror-6 gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76 is the original drive and gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76 is the spare.

I'd try detaching the spares. @joeschmuck might answer your question regarding the drive test age warning; I suggest you run the script with the -dump email parameter.
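For reference, a sketch of what detaching the spares would look like, using the GPTIDs from the pool output above. The commands are echoed rather than executed on purpose: don't actually run them until it's confirmed the original drives are healthy.

```shell
# Sketch only: detaching an in-use hot spare by its GPTID should
# return it to the AVAIL spare list. Echoed instead of executed.
pool="PrimaryPool"
for spare in gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76 \
             gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76; do
    echo "zpool detach $pool $spare"
done
```

Afterwards, zpool status should show both spares as AVAIL again.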
 
Last edited:

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
zpool events -v PrimaryPool seems to show events just from today; they are all timestamped Aug 2 2023.
I am not sure when the spares kicked in; it could have been a month or two ago. Is there a better way to run this?

And ok, I maybe misunderstood and assumed drive issues implied failing/erroring drives. I did not know high temperature or something like that would be enough to kick them offline. That being said, it seems weird only 2 drives would hit a high temperature. Is 45 even really a "high temperature"? The other drives seem to all be hovering around 43.

And what does the -dump email parameter do? It did not offer that during the script configuration.

Also as far as the GPTIDs go, that makes a bit more sense. Not sure why, but I was thinking the 1 outside of the spare-1 section was the original drive. But that is just the other drive in the mirror duh.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
zpool events -v PrimaryPool seems to show events just from today; they are all timestamped Aug 2 2023.
I am not sure when the spares kicked in; it could have been a month or two ago. Is there a better way to run this?
Try zpool events PrimaryPool | grep spare, although this will show only the spare kicking in events.

And ok, I maybe misunderstood and assumed drive issues implied failing/erroring drives. I did not know high temperature or something like that would be enough to kick them offline. That being said, it seems weird only 2 drives would hit a high temperature. Is 45 even really a "high temperature"? The other drives seem to all be hovering around 43.
Those are the two that likely got thrown out. I'd consider anything past 36 C to be high: your drives are uncomfortably hot, and that's bad for their lifespan.
Screenshot_1.png

And what does the -dump email parameter do? It did not offer that during the script configuration.
It asks you to describe the issue and emails him a copy of the same data you would receive using just the -dump parameter, so he can look into it and, hopefully, resolve the issue. You have to manually run ./multi_report.sh -dump email in the CLI from the folder where you placed the script file.

Also as far as the GPTIDs go, that makes a bit more sense. Not sure why, but I was thinking the 1 outside of the spare-1 section was the original drive. But that is just the other drive in the mirror duh.
I agree it's a bit confusing.
 
Last edited:

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Try zpool events PrimaryPool | grep spare, although this will show only the spare kicking in events.
I actually tried running grep on it, but with grep vdev.spare; nothing showed up.
I just tried grep spare as well, and I'm still not getting any output from it.

I'd consider anything past 36 celsius to be high: your drives are uncomfortably hot, and that's bad for their lifespan.
Hm, okay, I guess I will have to look into better cooling. Not sure what my (aesthetic) options are, seeing as I already have a massive fan wall in this thing. I'll have to look around; I forget how high the fan speed is. Worst case, I'll strap a fan onto the back of my rack.

And I will try generating a -dump email with the script.
Thanks
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
The test age is because the last successful run of a SMART test was 9,730 hours ago as of the data posted above. I would say that over a year ago the drives with the alarms stopped SMART testing; you may have performed an update or something. Examine your SMART testing schedules and make sure all your drives are accounted for. I have the dump data and will look at it, but it looks like the spares are in use, as previously indicated.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
The test age is because the last successful run of a SMART test was 9,730 hours ago as of the data posted above.
That's so strange. I double-checked my SMART tasks and they are all accounted for.
Also, you may have missed it, but I stated in my previous message that I checked one of the drives it claims had its last SMART run a long time ago, and the tests show up:


The only thing that stands out to me is this in the report script, which states these drives have not been tested in 405 days. This makes no sense to me.
WARNING LOG FILE
Drive: N8GEX1NY - Test Age = 405 Days

If I go into TrueNAS's Web GUI > Storage > Disks > N8GEX1NY (da0) > SMART Test Results > I see 20 listings, all say status "Success" and it is currently running a long background test.
Here is the smartctl -a for that drive:

Code:
# smartctl -a /dev/da0
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUS726040AL4210
Revision:             A980
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca244194888
Serial number:        N8GEX1NY
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Aug  2 16:04:58 2023 EDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     42 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 48557:18
Manufactured in week 29 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  154
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2177
Elements in grown defect list: 1

Vendor (Seagate Cache) information
  Blocks sent to initiator = 5855174732021760

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       58         0        58   13372934     454302.142  0
write:         0     2828         0      2828    3389888      36543.693  0
verify:        0        0         0         0     410966          0.000  0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Self test in progress ...   -     NOW                 - [-   -    -]
# 2  Background short  Completed                   -   38827                 - [-   -    -]
# 3  Background short  Completed                   -   38803                 - [-   -    -]
# 4  Background short  Completed                   -   38779                 - [-   -    -]
# 5  Background short  Completed                   -   38755                 - [-   -    -]
# 6  Background short  Completed                   -   38731                 - [-   -    -]
# 7  Background short  Completed                   -   38707                 - [-   -    -]
# 8  Background short  Completed                   -   38683                 - [-   -    -]
# 9  Background long   Completed                   -   38671                 - [-   -    -]
#10  Background short  Completed                   -   38659                 - [-   -    -]
#11  Background short  Completed                   -   38635                 - [-   -    -]
#12  Background short  Completed                   -   38611                 - [-   -    -]
#13  Background short  Completed                   -   38587                 - [-   -    -]
#14  Background short  Completed                   -   38563                 - [-   -    -]
#15  Background short  Completed                   -   38539                 - [-   -    -]
#16  Background short  Completed                   -   38515                 - [-   -    -]
#17  Background long   Completed                   -   38510                 - [-   -    -]
#18  Background short  Completed                   -   38491                 - [-   -    -]

Notice it shows # 1 Background long Self test in progress ... - NOW - [- - -]
Which aligns with my task schedule. I have all my drives set to run a SHORT test daily, a LONG weekly, and a CONVEYANCE monthly.

I haven't looked at what TrueNAS reports for the other drives your script flags as not having had a SMART test run in all that time, but it looks like they should all have been running without issues.
It is extremely concerning to me, though, that the pool has pulled in two spares.
Should I start looking into replacements anyway?
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Also, you asked how many drives are in the system; I figured I'd just answer here for consistency and ease.
It is 15x 4 TB HDDs and 1x boot SSD.


Also to add onto something..
I did make this post earlier: https://www.truenas.com/community/threads/console-messages-concerning-or-not.111635/
And stated I saw these messages in my console:

Code:
Jul 31 03:15:01 1 2023-07-31T03:15:01.692041-04:00 nas.lan smartd 2550 - - Device: /dev/da9, not capable of Long Self-Test
Jul 31 03:15:01 1 2023-07-31T03:15:01.697325-04:00 nas.lan smartd 2550 - - Device: /dev/da13, not capable of Long Self-Test
Jul 31 03:15:01 1 2023-07-31T03:15:01.702506-04:00 nas.lan smartd 2550 - - Device: /dev/da12, not capable of Long Self-Test
Jul 31 03:15:01 1 2023-07-31T03:15:01.707749-04:00 nas.lan smartd 2550 - - Device: /dev/da14, not capable of Long Self-Test
Jul 31 03:15:01 1 2023-07-31T03:15:01.713081-04:00 nas.lan smartd 2550 - - Device: /dev/da15, not capable of Long Self-Test


However, none of those are the ones your script is reporting as having issues.
To make it easier for you to associate each serial with its drive name:

Drive: N8GEX1NY - /da0 - Test Age = 405 Days
Drive: N8GEBBRY - /da2 - Test Age = 405 Days
Drive: NHG9ZP7Y - /da4 - Test Age = 405 Days
Drive: NHG9JAAY - /da5 - Test Age = 405 Days
Drive: N8GG3NYY - /da6 - Test Age = 405 Days
Drive N8GEW7XY - /da7 - High Drive Temp 45 - Threshold set at 45
Drive: N8GEW7XY - /da7 - Test Age = 405 Days
Drive N8GEX1PY - /da8 - High Drive Temp 45 - Threshold set at 45
Drive: N8GEX1PY - /da8 - Test Age = 405 Days
Drive: N8GEBDYY - /da10 - Test Age = 405 Days
Drive: N8GG150Y - /da11 - Test Age = 405 Days
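A loop like the following can regenerate that serial-to-device mapping (a sketch: it only prints the smartctl commands, which would need root to actually run; the device names are the ones from this system):

```shell
# Sketch: print the smartctl command that reveals each drive's serial,
# so daN device names can be matched against the script's warnings.
for dev in da0 da2 da4 da5 da6 da7 da8 da10 da11; do
    echo "smartctl -i /dev/$dev | grep 'Serial number'"
done
```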
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Look at the SMART information; I will use the example above:
Code:
Accumulated power on time, hours:minutes 48557:18
# 2  Background short  Completed                   -   38827                 - [-   -    -]

Doing the math: 48557 minus 38827 = 9730 hours difference.
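In shell arithmetic, the same math converts directly into the day count the script reports:

```shell
# Gap between current power-on hours and the last logged self-test,
# converted to days (integer division matches "Test Age = 405 Days").
power_on_hours=48557
last_test_hours=38827
echo $(( (power_on_hours - last_test_hours) / 24 ))   # prints 405
```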

"NOW" is not a completed test result.

As for the error messages provided by TrueNAS that it cannot run a SMART long test: the drive has completed several long tests before, and each takes about 570 minutes.
Code:
# 9  Background long   Completed                   -   38671                 - [-   -    -]


Also the drive is telling you that a Long test is in progress.

When the drive reaches about 48,567 hours (roughly 9.5 hours from now), check the SMART data again. You should have a test completed in the 48,xxx or 49,xxx range. You should not have this line
Code:
# 2  Background short  Completed                   -   38827                 - [-   -    -]
Instead, it should have moved into spot #3. If it didn't, then the failure is in your drive firmware.
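As a sanity check, the expected completion point can be computed the same way (assuming the 570-minute long-test duration mentioned above):

```shell
# Power-on hour at which the running long test should be finished:
start_hours=48557          # drive's accumulated power-on hours
long_test_minutes=570      # reported extended-test duration
echo $(( start_hours + (long_test_minutes + 59) / 60 ))   # prints 48567
```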

Please update when you can, I'm very curious if the drive updated the test results or not.

EDIT =========
I've examined all the data you sent me, and all the drives indicating a 405-day test age should be cleared by now. Note that the Last Test Type is "Background long" on a blue background; that indicates the test is running, and the SMART data does indicate that. I don't believe it is your drive firmware, since some of the matching drive models have the same firmware. So we wait until you can provide more information; I expect the tests to be recorded as completed with the proper hour count.
 
Last edited:

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
I'm honestly a little confused by the response.
So from the first code block, I see what you are saying: the last short test was 9,730 hours ago, so it has not completed a SMART test since then.

As for the error messages, I do not understand what your point is. OK, the long test takes 570 minutes to complete, but why is TrueNAS stating it's not capable?

And as for the last code block, you are saying to check the drive again in about 10 hours and see if the short test currently in place #2 at 38827 moves into slot #3? OK.

As far as updating, what are you suggesting updating? Or are you just asking me to report back here?

And this still leaves me not understanding: this was only for one drive (da0), yet the script reported multiple drives.
So if this drive does not move that 38827 into slot #3, should it be considered a "failed" drive and the hard drive replaced? In that case, would all of those drives need replacing?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
but why is TrueNAS stating it;s not capable?
I have no idea. If it continues I would report it as a bug.

As far as updating, what are you suggesting updating? Or are you just asking me to report back here?
Yes, just reporting back with the results.
And this still leaves me not understanding, this was only for 1 drive (da0), however the script reported multiple drives.
Because all the drives reporting 405 days have not had a SMART test run on them in a long time. The data indicates all these drives are currently running a SMART long test. I'm curious if you started them or if the system did it automatically. I suspect you just played around with the SMART test settings and that kicked off the tests.

So if this drive does not move that 38827 into slot #3, then it should be considered a "failed" drive, and replace the hard drive? In that case, all of those drives will need replacing?
This does not indicate your drives are failing. It just means the drives are not tracking the test results. Let the tests complete and then let's examine the SMART test log to see if it reflects a current hour count. If it doesn't, then you might want to consider replacing the drives, just because you would not know if they failed a SMART test. The alternative is to pay attention to your scrubs; if they start throwing errors, then I'd replace these drives.

But let's see how the test goes before jumping the gun.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Ok that gives me a bit of reassurance.
Now just one more thing: those are multiple drives.
Are we highly concerned about the two among them that were kicked off, whose mirrors are now using the spares? Those concern me most.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
How are all the drives connected - all to the HBA, or some to the motherboard ports / some to the HBA?

Do you know which drives are connected where if it's a mix of mobo/HBA?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
How are all the drives connected - all to the HBA, or some to the motherboard ports / some to the HBA?

Do you know which drives are connected where if it's a mix of mobo/HBA?
His signature states he's using an AOC-S3008L-L8E Rev2 (IT MODE)... I have no idea what that thing is; Google shows LSI. But he's also using a backplane?

Output of sas3flash -list or sas2flash -listall, maybe a camcontrol devlist as well?

Also, Hinata is truly a better waifu than Sakura; Ino is very good waifu material as well.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Two HBA controllers in IT (Initiator Target) mode, in which the controller does not act as RAID, so the operating system can see each disk individually.
They connect to the backplane, and the backplane hosts the drives.
(Although technically I think I only have one HBA controller connected at the moment, making a solid 60% of my bays unusable, so it won't show two controllers connected. That's unrelated to my hard drives; all the bays the hard drives are in are active and served by the HBA that is connected.)

Output of sas3flash -list, maybe a camcontrol devlist as well?


# sas3flash -list
librewolf_Sb9Q01BtiG.png


camcontrol devlist
librewolf_HDeYyvXXk8.png



Also, Hinata is truly a better waifu than Sakura; Ino is very good waifu material as well.
Hinata-Hyuga-Shippuden-wallpapers33.jpg
 
Last edited:

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
You have the wrong firmware version. See the following resource.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
You have listed 13 out of your 16 drives. Time to match serial numbers with drives and find out which ones are not functional (unless you've already done so)?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
There are 16 drives in the system. It appears that da12 and da13 are the spares and are being used to replace da9 and da14 respectively, if I'm reading the output correctly (I added the device idents, and I've never paid attention to an output that had spares active). This would be drives K4K0BBJB and K4J3EXNB (I have the full multi_report output). But what I do not understand is that it looks like the spares were manually assigned: the drives they replaced are not "UNAVAIL" and the mirrors are not "DEGRADED". As far as I can tell, you can manually detach the spares; HOWEVER, WAIT until someone tells you that it's safe. Don't do it just because I mentioned it; I don't consider myself the expert on this topic.

Code:
    NAME                                              STATE     READ WRITE CKSUM
    PrimaryPool                                       ONLINE       0     0     0
      mirror-0                                        ONLINE       0     0     0
        gptid/d7476d46-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0  da10
        gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0  da11
      mirror-1                                        ONLINE       0     0     0
        gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0  da0
        gptid/db71bcb5-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0  da1
      mirror-2                                        ONLINE       0     0     0
        gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0  da6
        gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0  da7
      mirror-3                                        ONLINE       0     0     0
        gptid/d9fb7757-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0  da8
        gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0     0  da2
      mirror-4                                        ONLINE       0     0     0
        gptid/9fd0872d-8f64-11ec-8462-002590f52cc2    ONLINE       0     0     0  da4
        gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    ONLINE       0     0     0  da5
      mirror-5                                        ONLINE       0     0     0
        gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0     0  da3
        spare-1                                       ONLINE       0     0     0
          gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76  ONLINE       0     0     0  da14
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0     0  da13
      mirror-6                                        ONLINE       0     0     0
        gptid/749a1891-1b5c-11ee-941f-ac1f6be66d76    ONLINE       0     0     0  da15
        spare-1                                       ONLINE       0     0     0
          gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76  ONLINE       0     0     0  da9
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0     0  da12
    spares
      gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use  da12
      gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use  da13

errors: No known data errors

Drives for this pool are listed below:
d7476d46-32ca-11ec-b815-002590f52cc2 -> da10
d8d6aa36-32ca-11ec-b815-002590f52cc2 -> da11
d9a6f5dc-32ca-11ec-b815-002590f52cc2 -> da0
db71bcb5-32ca-11ec-b815-002590f52cc2 -> da1
d8b2f42f-32ca-11ec-b815-002590f52cc2 -> da6
d96847a9-32ca-11ec-b815-002590f52cc2 -> da7
d9fb7757-32ca-11ec-b815-002590f52cc2 -> da8
da1e1121-32ca-11ec-b815-002590f52cc2 -> da2
9fd0872d-8f64-11ec-8462-002590f52cc2 -> da4
9ff0f041-8f64-11ec-8462-002590f52cc2 -> da5
14811777-1b6d-11ed-8423-ac1f6be66d76 -> da3
03daa071-505c-11ed-a9fe-ac1f6be66d76 -> da14
0d56b97d-1e91-11ed-a6aa-ac1f6be66d76 -> da13
749a1891-1b5c-11ee-941f-ac1f6be66d76 -> da15
4710dd39-1b6d-11ed-8423-ac1f6be66d76 -> da9
0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76 -> da12
0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76 -> da12
0d56b97d-1e91-11ed-a6aa-ac1f6be66d76 -> da13


########## ZPool status report for boot-pool ##########
  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:11 with 0 errors on Sat Jul 29 03:46:12 2023
config:

    NAME        STATE     READ WRITE CKSUM
    boot-pool   ONLINE       0     0     0
      da16p2    ONLINE       0     0     0

errors: No known data errors
 