Replaced One Drive, Resilvering, Now Multiple Failed AND Data Corruption!?

isopropyl · Jan 6, 2024

I replaced one of my hard drives that has had an issue for a bit. Everything was fine, though; it was in a vdev of 3 drives.
It was resilvering overnight, and when I checked it this morning MULTIPLE drives had failed, and it shows that a bunch of files are corrupted. I went and checked those files, and they seem fine, unless I am missing something? I didn't check all of them; it gave me a big list.

The drive I replaced was one of the 20TB Seagate SATA drives in mirror-7 (ZVT06KLB).

Wtf happened, and what do I do?

Code:

# zpool status -v
  pool: PrimaryPool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 4.63T in 1 days 00:53:41 with 11632 errors on Fri Jan  5 22:45:41 2024
remove: Removal of vdev 6 copied 1.39T in 3h55m, completed on Tue Sep  5 16:54:55 2023
        28.2M memory used for removed device mappings
config:

        NAME                                              STATE     READ WRITE CKSUM
        PrimaryPool                                       DEGRADED     0     0   0
          mirror-0                                        ONLINE       0     0   0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   4
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   4
          mirror-1                                        ONLINE       0     0   0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   4
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   4
          mirror-2                                        ONLINE       0     0   0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-3                                        ONLINE       0     0   0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-4                                        DEGRADED     0     0   0
            spare-0                                       DEGRADED     0     0 11.6K
              gptid/9fd0872d-8f64-11ec-8462-002590f52cc2  DEGRADED     0     0 986  too many errors
              gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
            spare-1                                       DEGRADED     0     0 12.1K
              gptid/9ff0f041-8f64-11ec-8462-002590f52cc2  DEGRADED     0     0 448  too many errors
              gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
          mirror-5                                        DEGRADED     0     0   0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    DEGRADED     0     0 13.1K  too many errors
            gptid/0cd1e905-3c2e-11ee-96af-ac1f6be66d76    DEGRADED     0     0 13.1K  too many errors
          mirror-7                                        DEGRADED     0     0   0
            gptid/82164f5e-ab11-11ee-b98e-ac1f6be66d76    DEGRADED     0     0  716K  too many errors
            gptid/8ab75bbc-4c0d-11ee-8b4c-ac1f6be66d76    DEGRADED     0     0 1.40M  too many errors
            gptid/8aa4f83e-4c0d-11ee-8b4c-ac1f6be66d76    DEGRADED     0     0 1.40M  too many errors
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use

errors: Permanent errors have been detected in the following files:

[REDACTED FILE LIST]


  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:14 with 0 errors on Wed Jan  3 03:46:15 2024
config:

Code:

CRITICAL
Pool PrimaryPool state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
The following devices are not healthy:

    Disk HITACHI HUS72604CLAR4000 K4K7KU5B is UNAVAIL
    Disk HITACHI HUS72604CLAR4000 K4K6EM8B is UNAVAIL
    Disk HGST HUS726040AL4210 NHGAJWEY is DEGRADED
    Disk SEAGATE ST4000NXCLAR4000 Z1Z2428B 0000C4118VDG is DEGRADED
    Disk ATA ST20000NM007D-3D ZVT5JR8S is DEGRADED
    Disk ATA ST20000NM007D-3D ZVT5J3MY is DEGRADED
    Disk ATA ST20000NM007D-3D ZVT5JPF5 is DEGRADED
    Disk HGST HUS726040AL4210 NHG9ZP7Y is DEGRADED
    Disk HGST HUS726040AL4210 NHG9JAAY is DEGRADED

2024-01-05 22:45:56 (America/New_York)

isopropyl · Jan 6, 2024

I'm letting a scrub run again while I head to work

isopropyl · Jan 7, 2024

Nothing changed with scrub.
Not sure what to do now.
I am in a time crunch though because I need to send back the old drive to be RMA'd.

Just For An Example:
It says one of the issue files is South Park S26E02.

I launch the file, it is abnormally slow to launch. Then it's having trouble loading certain parts of the video. Then after like 2min the video is completely fine, plays, can seek everything, audio is ok, visuals seem fine.

I read the backplane or HBA could overheat during resilvering. I don't think that's the case though.
If I am guessing, which was my first instinct, the system just thinks the drives are erroring because of some other issue. But all is actually ok.
But I need to resolve these errors still.

I have never had issues in the past when resilvering. All cables seem good nothing loose.
All the drives had been fine up until now.
The only reason I replaced one in the first place was because one drive was giving me SMART errors saying it could not read the SMART data, and it followed the drive when swapping bays too.

Constantin · Jan 7, 2024

This is why it’s essential to prequalify and then set aside at least one spare drive. The minute a drive starts throwing serious errors is the minute you make a backup followed by a disconnect, replacement, resilver. It sounds like you do not have backups though and that’s really concerning, if you value that data.

I suspect the reason your files still play is that the video codec can manage some mangled data. You might not get so lucky if the mangling starts involving metadata.

Fixing an overheating problem could be as simple as fitting some bigger blowers. Have the OEM blowers been replaced with something quieter?

isopropyl · Jan 7, 2024

Constantin said:
This is why it’s essential to prequalify and then set aside at least one spare drive. The minute a drive starts throwing serious errors is the minute you make a backup followed by a disconnect, replacement, resilver. It sounds like you do not have backups though and that’s really concerning, if you value that data.

I suspect the reason your files still play is that the video codec can manage some mangled data. You might not get so lucky if the mangling starts involving metadata.

I have backups of my critical data, but not of all my media because it's almost 15tb of media.
I still would love to not lose any of this data though if it can be avoided.

Either way, the drive I replaced was one in a 3-way mirror vdev. So even though 1 drive failed, there were still 2 other mirror drives. There shouldn't have been any issues with the data unless both those other drives magically died at the same time, which is possible but I feel is unlikely.

However, this does not seem to be the case, because it is throwing errors for multiple drives across multiple vdevs. Not just drives within the vdev I replaced.

Fixing an overheating problem could be as simple as fitting some bigger blowers. Have the OEM blowers been replaced with something quieter?

No, and I don't believe it's overheating. But I'm not entirely sure how to tell?
Like I said I have had no issues like this when re-silvering ever.

I'm not really sure what to do, or try at this point.
If the files actually are corrupted, what is the best way to try to determine that if my player is playing them?
And what else can I try here?

And why can't TrueNAS just continue the resilver, or try it again without the file issues? Am I just stuck in a massive loop of errors? Scrubbing isn't resolving anything.
I tried running zpool clear, then rebooting and starting another scrub. It was scrubbing for a little while, but when I checked just now it was resilvering again, and I don't understand why.

Code:

# zpool status
  pool: PrimaryPool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jan  7 10:19:32 2024
        821G scanned at 255M/s, 696G issued at 216M/s, 19.2T total
        42.7G resilvered, 3.54% done, 1 days 00:59:37 to go
remove: Removal of vdev 6 copied 1.39T in 3h55m, completed on Tue Sep  5 16:54:55 2023
        28.2M memory used for removed device mappings
config:

        NAME                                              STATE     READ WRITE CKSUM
        PrimaryPool                                       DEGRADED     0     0   0
          mirror-0                                        ONLINE       0     0   0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-1                                        ONLINE       0     0   0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-2                                        ONLINE       0     0   0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-3                                        ONLINE       0     0   0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-4                                        ONLINE       0     0   0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2    ONLINE       0     0 422
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    ONLINE       0     0 422
          mirror-5                                        DEGRADED     0     0   0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0 478
            spare-1                                       DEGRADED     0     0   0
              gptid/0cd1e905-3c2e-11ee-96af-ac1f6be66d76  DEGRADED     0     0 370  too many errors
              gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0 612  (resilvering)
          mirror-7                                        ONLINE       0     0   0
            gptid/82164f5e-ab11-11ee-b98e-ac1f6be66d76    ONLINE       0     0 55.5K
            gptid/8ab75bbc-4c0d-11ee-8b4c-ac1f6be66d76    ONLINE       0     0 55.5K
            gptid/8aa4f83e-4c0d-11ee-8b4c-ac1f6be66d76    ONLINE       0     0 55.5K
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      AVAIL

errors: 902 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:14 with 0 errors on Wed Jan  3 03:46:15 2024
config:

Constantin · Jan 7, 2024

I’m afraid this is outside my wheelhouse and I hope that some of the Demi-gods that roam these halls will take an interest in this topic. The good news is that since you have backups, the worst that can happen is a pool rebuild followed by re-transfer. It’s time consuming, annoying, etc. but 15 TB should be done in two days max.

isopropyl · Jan 7, 2024

With the way it's behaving, how it's not just one or two drives, and my instincts, I feel like it will not be necessary, and nothing is actually corrupt.
But that could also be wishful thinking.

WI_Hedgehog · Jan 7, 2024

It could be the case most things are fine. The most common problem I've personally experienced with replacing a drive is a cabling issue, either by a not-fully-plugged cable or a cable that went bad (oddly, they do go bad, usually when a bend radius gets too tight).

Personally, I would use a USB fob with diagnostic tools on it to inspect the system and see if anything stood out. This helps me see what the system sees and isolate where the problem is occurring, because if I start messing with cabling a failing cable can be bumped into the "right" position only to "go bad" again somewhere down the road. If, on the other hand, a diagnostic can show where the issue at least seems to be and I can re-run that test and show that issue does exist, when I go in and physically muck about in the system I can run the same test/set of tests and determine if that fixed the issue, then start to prove which piece of hardware failed (if it is hardware). For me, proving things work and proving things don't work makes for a really stable system (that's easier to troubleshoot in the future when I refer to the notes I take during testing), though you may not wish to go through that amount of effort (and I respect that).

isopropyl · Jan 7, 2024

WI_Hedgehog said:
It could be the case most things are fine. The most common problem I've personally experienced with replacing a drive is a cabling issue, either by a not-fully-plugged cable or a cable that went bad (oddly, they do go bad, usually when a bend radius gets too tight).

Personally, I would use a USB fob with diagnostic tools on it to inspect the system and see if anything stood out. This helps me see what the system sees and isolate where the problem is occurring, because if I start messing with cabling a failing cable can be bumped into the "right" position only to "go bad" again somewhere down the road. If, on the other hand, a diagnostic can show where the issue at least seems to be and I can re-run that test and show that issue does exist, when I go in and physically muck about in the system I can run the same test/set of tests and determine if that fixed the issue, then start to prove which piece of hardware failed (if it is hardware). For me, proving things work and proving things don't work makes for a really stable system (that's easier to troubleshoot in the future when I refer to the notes I take during testing), though you may not wish to go through that amount of effort (and I respect that).

I'm letting it do its resilvering, whatever it's doing currently, because I don't understand what it's resilvering. Maybe it is just doing something with one of the spares, idk. Don't wanna interrupt it though.

However, what sort of diag would I even do on it?
I understand reseating cables and stuff. But usb diag tools, what would I even be looking for?

isopropyl · Jan 11, 2024

It did a resilver again, and then another one right after it, so I've been letting it do its thing.
It is still throwing errors. I don't know what to do.

I have this other hard drive I need to send back to be RMA'd. Is there any benefit to me holding onto it?
Should I try popping it back into the system and replacing the current drive and seeing if the errors go away?

Really not sure what to do.

Code:

# zpool status -v
  pool: PrimaryPool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 1.74T in 1 days 16:17:58 with 11558 errors on Thu Jan 11 08:22:02 2024
remove: Removal of vdev 6 copied 1.39T in 3h55m, completed on Tue Sep  5 16:54:55 2023
        28.2M memory used for removed device mappings
config:

        NAME                                              STATE     READ WRITE CKSUM
        PrimaryPool                                       DEGRADED     0     0   0
          mirror-0                                        ONLINE       0     0   0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   4
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   4
          mirror-1                                        ONLINE       0     0   0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   4
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   4
          mirror-2                                        ONLINE       0     0   0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-3                                        ONLINE       0     0   0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-4                                        DEGRADED     0     0   0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2    DEGRADED     0     0 10.0K  too many errors
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    DEGRADED     0     0 10.0K  too many errors
          mirror-5                                        DEGRADED     0     0   0
            spare-0                                       DEGRADED     0     0 10.1K
              gptid/14811777-1b6d-11ed-8423-ac1f6be66d76  DEGRADED     0     0 834  too many errors
              gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
            spare-1                                       DEGRADED     0     0 10.4K
              gptid/0cd1e905-3c2e-11ee-96af-ac1f6be66d76  DEGRADED     0     0 370  too many errors
              gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE      15     0 626
          mirror-7                                        DEGRADED     0     0   0
            gptid/82164f5e-ab11-11ee-b98e-ac1f6be66d76    DEGRADED     0     0 1.45M  too many errors
            gptid/8ab75bbc-4c0d-11ee-8b4c-ac1f6be66d76    DEGRADED     0     0 1.45M  too many errors
            gptid/8aa4f83e-4c0d-11ee-8b4c-ac1f6be66d76    DEGRADED     0     0 1.45M  too many errors
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use

errors: Permanent errors have been detected in the following files:

[REDACTED FILE PATHS]

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:02 with 0 errors on Wed Jan 10 03:46:02 2024
config:

WI_Hedgehog · Jan 12, 2024

In my signature:
Show: Resources and Help -> Resources List -> Building ->
- Hard Drive Reliability Testing
- Hard Drive Troubleshooting Guide

Mind you, I'm anal about finding problems, so when a system goes bad if I don't know for sure what went bad I assume every part in the system is bad until proven to be good, because in my long career of troubleshooting ^&*# problems that's worked well for keeping systems stable. I tend to walk up to it with a custom-built USB thumb drive on my keychain and boot off it, then inspect the whole system methodically; often there's more than one problem that's gone undiagnosed, usually the problems are related (like a failed fan caused other fans to overload and a few more failed and drives got cooked and are just starting to get to that "itchy" state of having lots of recoverable read errors, right before the state of having unrecoverable read errors).

Since I keep thorough burn-in notes the change in status of any component becomes apparent (usually), and I've read pretty much everything here that anyone has written (because man, these guys are smart!). TrueNAS, thanks to the members here, has been the best, unexpected "next-level" I.T. career move I could have made.

isopropyl · Jan 16, 2024

Looking at your guides now, is there a way to check the smart output in a way that would help me, but not individually 1 by 1 every single drive?
I have 20+ drives in my system so this will get a bit tedious and confusing.

Also my RMA window is close to up, so I have to send back the old drive now. Hope I didn't need it. I don't see why I would've; the data on it was mirrored to the other 2 drives. But both show degraded now. What a disaster :/

dak180 · Jan 16, 2024

For a quick overview of the smart status of your drives (also good for ongoing monitoring) I would suggest my Report Script; I also have a link to the script I use to burn in drives in my sig too.

Seeing what the status summary tables show would give you good idea of where to drill in for more detail.
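
For anyone who wants to see the idea before installing anything, here is a minimal sketch (not the Report Script itself) of boiling one drive's SMART report down to a single summary row. It assumes smartmontools' JSON output (something like `smartctl -j -a` on each device) and the key names that output normally uses; the function name, the sample report, its serial and its values are all made up for illustration. SAS drives don't report the ATA attribute table, so they would only get the pass/fail column.

```python
import json

# ATA attribute IDs worth a first glance, with short column labels.
WATCHED = {5: "realloc", 197: "pending", 199: "udma_crc"}


def summarize(report):
    """Reduce one parsed smartctl JSON report to a flat summary row."""
    row = {
        "serial": report.get("serial_number", "?"),
        "model": report.get("model_name", "?"),
        "passed": report.get("smart_status", {}).get("passed"),
    }
    # The attribute table is only present for ATA/SATA drives.
    table = report.get("ata_smart_attributes", {}).get("table", [])
    for attr in table:
        label = WATCHED.get(attr.get("id"))
        if label:
            row[label] = attr.get("raw", {}).get("value")
    return row


# Hypothetical report, trimmed to the keys used above (values are invented).
SAMPLE_REPORT = json.loads("""
{
  "model_name": "EXAMPLE MODEL",
  "serial_number": "EXAMPLE0001",
  "smart_status": {"passed": true},
  "ata_smart_attributes": {"table": [
    {"id": 5,   "name": "Reallocated_Sector_Ct", "raw": {"value": 0}},
    {"id": 199, "name": "UDMA_CRC_Error_Count",  "raw": {"value": 12}}
  ]}
}
""")

print(summarize(SAMPLE_REPORT))
```

Run over every disk, rows like this make a table you can scan in one go. Attribute 199 (UDMA CRC errors) is the one that usually climbs with cabling or backplane trouble rather than with a dying disk.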

Whattteva · Jan 16, 2024

I didn't really read every post in this thread, but from what I see with your status, it looks like you either have bad cables or bad HBA/SATA controller. It is highly unlikely for multiple drives to throw up the same amount of errors. I mean, it's possible, but not very likely. I haven't encountered an instance where that happens and it's actually the drives that are the culprit. It's like tossing a coin 50 times and getting heads 50 times... It's mathematically possible for sure, but also very unlikely.

You have 3 drives throwing up exactly 1.45M errors, 2 drives throwing up 10.0K errors.
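
To make that pattern easy to spot, here is a rough sketch that groups the devices in `zpool status` output by vdev and flags any vdev where two or more members report the same non-zero CKSUM value. The function names are made up for this example, the parsing only understands mirror/raidz layouts like the one in this thread, and the sample text is trimmed from the Jan 11 status output posted earlier.

```python
import re
from collections import defaultdict

# One device row of the config table: NAME STATE READ WRITE CKSUM [note]
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def cksum_by_vdev(status_text):
    """Map each top-level vdev to a list of (leaf device, CKSUM string)."""
    groups = defaultdict(list)
    vdev = None
    for line in status_text.splitlines():
        if line.strip().startswith("pool:"):
            vdev = None  # a new pool starts; forget the previous vdev
            continue
        m = ROW.match(line)
        if not m:
            continue
        name, cksum = m.group(1), m.group(5)
        if re.fullmatch(r"(mirror|raidz\d*)-\d+", name):
            vdev = name
        elif name.startswith("spare-"):
            continue  # grouping row; its children still belong to the vdev
        elif vdev is not None:
            groups[vdev].append((name, cksum))
    return dict(groups)


def shared_counts(groups):
    """Return {vdev: {cksum: [devices]}} where a non-zero count repeats."""
    flagged = {}
    for vdev, leaves in groups.items():
        by_count = defaultdict(list)
        for name, cksum in leaves:
            if cksum != "0":
                by_count[cksum].append(name)
        repeated = {c: n for c, n in by_count.items() if len(n) > 1}
        if repeated:
            flagged[vdev] = repeated
    return flagged


SAMPLE = """\
  pool: PrimaryPool
config:

        NAME                                              STATE     READ WRITE CKSUM
        PrimaryPool                                       DEGRADED     0     0   0
          mirror-2                                        ONLINE       0     0   0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-4                                        DEGRADED     0     0   0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2    DEGRADED     0     0 10.0K  too many errors
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    DEGRADED     0     0 10.0K  too many errors
          mirror-7                                        DEGRADED     0     0   0
            gptid/82164f5e-ab11-11ee-b98e-ac1f6be66d76    DEGRADED     0     0 1.45M  too many errors
            gptid/8ab75bbc-4c0d-11ee-8b4c-ac1f6be66d76    DEGRADED     0     0 1.45M  too many errors
            gptid/8aa4f83e-4c0d-11ee-8b4c-ac1f6be66d76    DEGRADED     0     0 1.45M  too many errors
"""

for vdev, repeated in shared_counts(cksum_by_vdev(SAMPLE)).items():
    for count, devices in repeated.items():
        print(f"{vdev}: {len(devices)} devices all at CKSUM {count}")
```

The comparison is on the displayed strings, so two counts that round to the same abbreviation will also match. That is good enough for spotting the pattern, not for exact accounting.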

isopropyl · Jan 17, 2024

Whattteva said:
You have 3 drives throwing up exactly 1.45M errors, 2 drives throwing up 10.0K errors.

This is actually a really good observation. I didn't take notice of this.
I haven't had a chance to open it up and reseat everything; I will hopefully have the time Thursday. I don't know if I have any spare cables. I'll take a peek.

I know Hedgehog mentioned a fan might've died and could've fried the drives. I really find it hard to believe in all honesty. But again that could just be wishful thinking.

I am curious however, if it was simply a cable, and I reseat it or replace them.. how would I know if it is resolved?
Will I have to clear the pool errors then run an entire scrub every single time just to see? Like if I reseat all the cables, I have to run a scrub, and if it throws errors then I have to replace cables and run another scrub? If that makes sense.
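
One way to get a quicker signal than a full scrub per experiment, sketched here under some assumptions: note the CKSUM counters, make one change, read a good chunk of data, then compare. If the counters are still climbing, that change didn't fix it. The sketch below only diffs two snapshots of per-device counts. The helper name is made up, and the sample numbers are the Jan 7 and Jan 11 counts for three of the mirror-5 devices from the outputs earlier in the thread. Large counts get abbreviated (1.45M), so for real use you'd want exact values, which I believe `zpool status -p` prints.

```python
def counters_that_grew(before, after):
    """Return {device: increase} for devices whose CKSUM count went up."""
    grew = {}
    for dev, new in after.items():
        old = before.get(dev, 0)
        if new > old:
            grew[dev] = new - old
    return grew


# CKSUM counts copied from the Jan 7 and Jan 11 `zpool status` outputs.
JAN_7 = {
    "gptid/14811777-1b6d-11ed-8423-ac1f6be66d76": 478,
    "gptid/0cd1e905-3c2e-11ee-96af-ac1f6be66d76": 370,
    "gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76": 612,
}
JAN_11 = {
    "gptid/14811777-1b6d-11ed-8423-ac1f6be66d76": 834,
    "gptid/0cd1e905-3c2e-11ee-96af-ac1f6be66d76": 370,
    "gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76": 626,
}

for dev, delta in counters_that_grew(JAN_7, JAN_11).items():
    print(f"{dev}: +{delta}")
```

A count that drops between snapshots just means the counters were reset in between (zpool clear or a reboot). This is a quick check only; a scrub is still what reads everything.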

isopropyl · Jan 17, 2024

dak180 said:
For a quick overview of the smart status of your drives (also good for ongoing monitoring) I would suggest my Report Script; I also have a link to the script I use to burn in drives in my sig too.

Seeing what the status summary tables show would give you good idea of where to drill in for more detail.

Thanks, I was using a script that sent e-mail reports but it broke randomly so I will try this one! Thanks!

isopropyl · Jan 27, 2024

Ok so last night, I replaced all 4 of the LSI card cables that go to the backplane. Reseated the cards, mixed up the entire order of the hard drives so they're not in the same slots, and swapped the PSU it was actively using. When I plugged back in I noticed a large group of the drives in the front were not even lighting up.

I then noticed that the front backplane was all crooked, very slightly but the drives were not making a connection fully or at all. However I'm not sure when this happened. I'm not sure if it's been like this, and maybe that's why it was throwing errors. But the drives were all lit up before this so idk. However I fixed the backplane, and recently booted up.
All the fans are working, also normally if one dies or is taken out the system freaks tf out and starts beeping and ramps up the other fans. So I don't think there were ever fan issues.

Once I booted up, it was auto resilvering for some reason. I had run zpool clear poolname and then just let it sit overnight. I just woke up and the resilver is only at 5%, however it shows degraded again, with a handful of drives degraded.
I'll let it do its thing, then run a scrub. All disk temps look around the same within reason even while resilvering.

But I honestly do not know besides this. Maybe one of the cards is dying?
Is there a way to determine if a card is having issues?
What else could it be.

isopropyl · Feb 6, 2024

So the scrub and resilvering are all finished and it is still showing degraded. I'm really losing hope here and it's getting frustrating.
Is there a way to check the health of my LSI cards maybe? Both the green lights are on physically on them.

All the drives are detected, and I have been using the NAS as normal and haven't noticed anything, though it could just be that I'm not accessing the files that are errored. But I just tried a movie file in MPV over SMB that zpool status -v said was errored, and it launched and played with no issues. I skipped through it quickly; seeking was instantaneous and it didn't seem to have any problems.

Code:

       
root@X[~]# zpool status
  pool: PrimaryPool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 2.33T in 21:28:58 with 11520 errors on Thu Feb  1 13:14:15 2024
remove: Removal of vdev 6 copied 1.39T in 3h55m, completed on Tue Sep  5 16:54:55 2023
        28.2M memory used for removed device mappings
config:

        NAME                                              STATE     READ WRITE CKSUM
        PrimaryPool                                       DEGRADED     0     0   0
          mirror-0                                        ONLINE       0     0   0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   6
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   6
          mirror-1                                        ONLINE       0     0   0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   6
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   6
          mirror-2                                        ONLINE       0     0   0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-3                                        ONLINE       0     0   0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-4                                        DEGRADED     0     0   0
            spare-0                                       DEGRADED     0     0 8.29K
              gptid/9fd0872d-8f64-11ec-8462-002590f52cc2  DEGRADED     0     0 4.53K  too many errors
              gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    DEGRADED     0     0 12.8K  too many errors
          mirror-5                                        DEGRADED     0     0   0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    DEGRADED     0     0 9.29K  too many errors
            spare-1                                       DEGRADED     0     0 8.98K
              gptid/0cd1e905-3c2e-11ee-96af-ac1f6be66d76  DEGRADED     0     0 318  too many errors
              gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
          mirror-7                                        DEGRADED     0     0   0
            gptid/82164f5e-ab11-11ee-b98e-ac1f6be66d76    DEGRADED     0     0 2.18M  too many errors
            gptid/8ab75bbc-4c0d-11ee-8b4c-ac1f6be66d76    DEGRADED     0     0 2.18M  too many errors
            gptid/8aa4f83e-4c0d-11ee-8b4c-ac1f6be66d76    DEGRADED     0     0 2.18M  too many errors
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use

errors: 11520 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.

dak180 · Feb 7, 2024

@isopropyl can you post the full output of the Report Script (which will include smart output for all disks) so we can get a better idea as to what might be the cause of the errors.

isopropyl · Feb 10, 2024

dak180 said:
@isopropyl can you post the full output of the Report Script (which will include smart output for all disks) so we can get a better idea as to what might be the cause of the errors.

Hey, trying to do this now.
I keep running into an error probably something stupid I'm missing?
I get the e-mail but it simply says "please edit the config file for your setup".

I had added the e-mail and changed default config from 1 to 0, then setup the cron. I click run now on the cron and that's the e-mail I get.
It created the report.conf file, I see it in the directory. E-mail address still is in there, and defaultconfig is still set to 0.
