disk pool throwing alert

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Bunker, 3wide RaidZ1,
1 GOOD /dev/sda
1 HDD throwing errors /dev/sdd - SMART shows 0
1 HDD throwing errors /dev/sde - SMART shows 2
You need to understand what's going on there... is it the connection to those disks (perhaps the first case) or the disk itself (perhaps the second case)?

If you can't find a connection problem and fix it for the first (or both), then you need to consider replacement of those.

Resilvering 1 at a time would be recommended in such a small RAIDZ1 pool. You can do that with the "bad" disks still in place (or not, removing them one at a time).
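A minimal sketch of the one-disk-at-a-time order described above. It only prints the commands rather than running them. The device names are hypothetical placeholders, and `zpool wait` assumes OpenZFS 2.0 or later. On TrueNAS the supported route for a replacement is the web UI, so treat this as an illustration of the sequence, not a procedure to paste in.

```sh
# Print (do not run) the commands for replacing disks one at a time.
# Arguments: pool name, then pairs of old-device new-device.
plan_replacements() {
    pool=$1
    shift
    while [ "$#" -ge 2 ]; do
        # With the old disk still attached, ZFS can read from it during the resilver.
        echo "zpool replace $pool $1 $2"
        # Block until this resilver has finished before touching the next disk.
        echo "zpool wait -t resilver $pool"
        shift 2
    done
}

# Hypothetical names: OLD_A/OLD_B are the suspect members, NEW_A/NEW_B the new disks.
plan_replacements bunker OLD_A NEW_A OLD_B NEW_B
```

Waiting for each resilver to finish matters in a 3-wide RAIDZ1, since the pool can only tolerate one missing member at a time.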
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
... doing it this way lets me replace a drive now, giving Bunker some stability, with potentially a 2nd drive tomorrow, at which point the pool is stable. As soon as the 3rd drive goes in and the pool size has increased, I relocate the dataset from the currently degraded tank to the now larger Bunker.
G
 

georgelza

I've got 3 x 8TB on the way; the first one is here.
I'm just trying to figure out where to introduce it and start rebuilding, while of course checking connections at the same time.

For Bunker I don't think it's connection, think it's disks.

Tank could very well be a bad power connection, meaning once I've done all of this I will have those 3 drives working again.
I have an external case, so I will put the drive in there and see how much testing I can do on it from my Mac, to confirm whether it was a bad connection or a larger problem.
G
 

sretalla

For Bunker I don't think it's connection, think it's disks.
If SMART is showing 0 errors, why do you think that?

For the second disk, I agree it may well be the disk.
 

georgelza

I'm getting a couple of these emails a day. For a long time the error counts on /dev/sdd and /dev/sde stayed the same, so I was not too worried; recently the count on /dev/sde started climbing, and SMART now shows errors on it.

TrueNAS @ vaultx

New alerts:
  • Device: /dev/sdd [SAT], new Self-Test Log error at hour timestamp 40118.
Current alerts:
  • Device: /dev/sde [SAT], 16 Currently unreadable (pending) sectors.
  • Device: /dev/sde [SAT], 16 Offline uncorrectable sectors.
  • Device: /dev/sdd [SAT], 24 Currently unreadable (pending) sectors.
  • Device: /dev/sdd [SAT], 24 Offline uncorrectable sectors.
  • Device: /dev/sde [SAT], 32 Currently unreadable (pending) sectors.
  • Device: /dev/sde [SAT], 32 Offline uncorrectable sectors.
  • Space usage for pool "tank" is 88%. Optimal pool performance requires used space remain below 80%.
  • Device: /dev/sdd [SAT], new Self-Test Log error at hour timestamp 40118.
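To see at a glance which disk is getting worse, the highest pending-sector count per device can be pulled out of a saved copy of these alert emails. This is a rough sketch: the function name is made up for illustration, and it assumes the alert wording shown above.

```sh
# Read alert text on stdin and print the highest "pending sectors" count per device.
max_pending_per_device() {
    awk '/Currently unreadable \(pending\) sectors/ {
        for (i = 1; i < NF; i++) {
            if ($i == "Device:") dev = $(i + 1)          # device follows the "Device:" label
            if ($(i + 1) == "Currently") n = $i + 0      # count precedes "Currently"
        }
        if (!(dev in max) || n > max[dev]) max[dev] = n
    }
    END { for (d in max) print d, max[d] }' | sort
}

# Usage, assuming the email body was saved to a file called alerts.txt:
#   max_pending_per_device < alerts.txt
```

A count that keeps climbing between emails, as /dev/sde does here (16, then 32), points more towards the disk itself than towards cabling.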
 

sretalla

OK, so you're probably right then... I'm a little confused why you would say
1 HDD throwing errors /dev/sdd - SMART shows 0
Maybe a look at smartctl -A /dev/sdd would help to sort that out, as I don't see how that disk could be without errors based on the alerts you list above.
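A small sketch of how that check could be narrowed down to the attributes that matter for failing sectors. It assumes the usual ATA attribute table printed by `smartctl -A`, with the raw value in the tenth column; the function name is illustrative only.

```sh
# Read `smartctl -A` output on stdin and print name=raw_value for
# attribute 5 (reallocated), 197 (pending) and 198 (offline uncorrectable).
smart_sector_counts() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Usage on the NAS (needs root):
#   smartctl -A /dev/sdd | smart_sector_counts
```

Matching on the attribute ID rather than the name avoids vendor differences in how the attributes are labelled. Non-zero values for 197 or 198 would agree with the "pending" and "offline uncorrectable" alerts in the email.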
 

georgelza

I need to move it out of the rack where it is atm (the rack is above a cupboard) to where I can work on it. I will cable the new drive in, start it up and start the /dev/sde replacement process; while that's busy we can do more screenshots of /dev/sdd and run additional diagnostic commands?
PS: how many drives do you have in your 804?
Currently I have 6 x HDD and 2 x SSD (sitting behind the front face plate/cover).

G
 

sretalla

how many drives do you have in your 804.
I have had 8 in the 8-drive bay for a number of years when that was my primary NAS (currently only 5).

I have 2 SSDs on the floor of the Mobo section and 4 (2 mounted and 2 loose) in the front pull-off section.
 

georgelza

Thinking the Tank problem was a bad power cable.
Going to retighten things and bring it back up, with the new 8TB installed in a place where it could permanently stay. Tank might be good... or make that the 3 x 4TB drives.

Power supply outputs seem to be the limiting factor on my setup atm, as I only have 6 outputs that I need to split (not a good idea...)

G
 

georgelza

If I work on the premise that I only have one bad drive... that will then leave me with 5 x 4TB and 3 x 8TB drives, which I'm planning to place in the 2 cages.
For now I will leave the 2 x 250GB SSDs in the front cover, but will start looking at doing cabling for them on the floor at the MB and then open up those 2 slots,
or leave as is and get 2 larger SSDs for the MB floor.
But then it might be a good idea to leave these open... these types of fixes need an open chair during musical chairs.
G
 

georgelza

well got some movement...

Tank is back, 3 wide, all good, so that was a power supply problem. Bunker has now decided to go totally offline...

Have to see how to rewire or potentially re-power the entire unit. I think I have a potential power problem that's also causing problems, in addition to the potential bad drive in Bunker.

G
 

georgelza

Based on the output below, I think the 3 drives in tank are actually all good, so my problem atm is more Bunker:

Code:
ZFS has finished a resilver:

eid: 50
class: resilver_finish
host: vaultx
time: 2023-08-03 14:20:02+0200
pool: tank
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: resilvered 12.7G in 00:04:11 with 0 errors on Thu Aug 3 14:20:02 2023
config:

    NAME STATE READ WRITE CKSUM
    tank ONLINE 0 0 0
      raidz1-0 ONLINE 0 0 0
        eeecce3d-8009-11eb-894e-1c1b0dce071c ONLINE 0 0 0
        f1d9f025-7a88-11eb-ae9f-1c1b0dce071c ONLINE 0 0 0
        743606b0-865d-11eb-9d22-e0d55e575a95 ONLINE 0 0 0
 

georgelza

ok... power is plugged in good everywhere... all drives are up/spinning.
system is reporting 1 spare drive, that will be the new 8TB drive.
Time to diagnose and remove a bad drive..
G

Output:

Linux vaultx 5.15.107+truenas #1 SMP Tue Jul 4 16:27:21 UTC 2023 x86_64

TrueNAS (c) 2009-2023, iXsystems, Inc.
All rights reserved.
TrueNAS code is released under the modified BSD license with some
files copyrighted by (c) iXsystems, Inc.

For more information, documentation, help or support, go here:
http://truenas.com

Welcome to TrueNAS
Last login: Tue Aug 1 06:36:16 SAST 2023 on pts/24

Warning: the supported mechanisms for making configuration changes
are the TrueNAS WebUI and API exclusively. ALL OTHERS ARE
NOT SUPPORTED AND WILL RESULT IN UNDEFINED BEHAVIOR AND MAY
RESULT IN SYSTEM FAILURE.

root@vaultx[~]# zpool status bunker tank
pool: bunker
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: resilvered 2.09M in 00:00:10 with 0 errors on Thu Aug 3 14:49:10 2023
config:

    NAME STATE READ WRITE CKSUM
    bunker ONLINE 0 0 0
      raidz1-0 ONLINE 0 0 0
        633c2841-a6a1-11eb-9a55-e0d55e575a95 ONLINE 0 0 0
        6328366f-a6a1-11eb-9a55-e0d55e575a95 ONLINE 0 0 1
        65e51131-a6a1-11eb-9a55-e0d55e575a95 ONLINE 0 0 0

errors: No known data errors

pool: tank
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: resilvered 12.7G in 00:04:11 with 0 errors on Thu Aug 3 14:20:02 2023
config:

    NAME STATE READ WRITE CKSUM
    tank ONLINE 0 0 0
      raidz1-0 ONLINE 0 0 0
        eeecce3d-8009-11eb-894e-1c1b0dce071c ONLINE 0 0 0
        f1d9f025-7a88-11eb-ae9f-1c1b0dce071c ONLINE 0 0 0
        743606b0-865d-11eb-9d22-e0d55e575a95 ONLINE 0 0 0

errors: No known data errors
root@vaultx[~]#
 

sretalla

The second drive in bunker seems to have one checksum error... could still be from cabling or some other issue, so maybe run zpool clear on it and then run a scrub to see if it comes back or not.
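As a rough sketch of how to tell whether the error "comes back", the helper below lists any pool member whose READ, WRITE or CKSUM counter is non-zero in `zpool status` output. The function name is made up for illustration, and it assumes the five-column config layout shown in the output above.

```sh
# Read `zpool status` output on stdin and print rows with a non-zero error counter.
zpool_error_rows() {
    awk 'NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
         ($3 != 0 || $4 != 0 || $5 != 0) {
             print $1, "read=" $3, "write=" $4, "cksum=" $5
         }'
}

# The sequence suggested above, followed by the check:
#   zpool clear bunker
#   zpool scrub bunker
#   ...once the scrub has completed...
#   zpool status bunker | zpool_error_rows
```

No output after the scrub means every counter stayed at zero. A row reappearing for the same member would suggest the problem is persistent rather than a one-off.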
 

georgelza

zpool clear bunker executed
starting a scrub now.

G
 

georgelza

OK, I'm semi lost: how do I run a scrub? I looked and got to the point of nearly running a SMART test, then realised that's not a scrub.

G
 

georgelza

found scrubs, started.

G
 

sretalla

or...

zpool scrub poolname
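Since `zpool scrub` returns straight away and the scrub runs in the background, a small helper can tell when it has finished. This is a sketch that assumes the "scrub in progress" wording OpenZFS prints on the `scan:` line of `zpool status`; the function name is illustrative only.

```sh
# Succeeds (exit 0) if the `zpool status` text on stdin reports a running scrub.
scrub_in_progress() {
    grep -q 'scan: scrub in progress'
}

# Usage: poll every 10 minutes, then show the final result.
#   zpool scrub bunker
#   while zpool status bunker | scrub_in_progress; do sleep 600; done
#   zpool status bunker
```

Once the loop exits, the `scan:` line shows how much was repaired and how many errors were found.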
 

georgelza

Busy sourcing a new higher-power PSU with more output points (so I don't need to use splitters).
Scrub is running, apparently 3.5 hours, so will update later.
G
 

georgelza

So the scrubs came back, see below...
Note: while the scrub was running I was still getting those error emails.

root@vaultx[~]# zpool status bunker tank
pool: bunker
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 05:22:15 with 0 errors on Thu Aug 3 21:08:02 2023
config:

    NAME STATE READ WRITE CKSUM
    bunker ONLINE 0 0 0
      raidz1-0 ONLINE 0 0 0
        633c2841-a6a1-11eb-9a55-e0d55e575a95 ONLINE 0 0 0
        6328366f-a6a1-11eb-9a55-e0d55e575a95 ONLINE 0 0 0
        65e51131-a6a1-11eb-9a55-e0d55e575a95 ONLINE 0 0 0

errors: No known data errors

pool: tank
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: resilvered 12.7G in 00:04:11 with 0 errors on Thu Aug 3 14:20:02 2023
config:

    NAME STATE READ WRITE CKSUM
    tank ONLINE 0 0 0
      raidz1-0 ONLINE 0 0 0
        eeecce3d-8009-11eb-894e-1c1b0dce071c ONLINE 0 0 0
        f1d9f025-7a88-11eb-ae9f-1c1b0dce071c ONLINE 0 0 0
        743606b0-865d-11eb-9d22-e0d55e575a95 ONLINE 0 0 0

errors: No known data errors
root@vaultx[~]#
 