Phantom Alert Emails

k9bm

Dabbler
Joined
Apr 18, 2016
Messages
14
A couple of months ago I suffered a power supply issue that caused multiple drives in my RAIDZ2 pool to start throwing errors. I subsequently replaced the power supply and purchased a new set of (larger) HDDs, installing them one by one and letting each new drive resilver before moving on to the next. At the end of the process I had a larger capacity pool along with brand new drives and a new power supply. I cleared the literally hundreds of alerts that had been generated during the episode.

Within a week, I started getting email alerts that the pool was degraded because of a bad drive, but about one minute later another email came in saying that the error had been cleared. Another time the message would be that smartd was not running, and a minute later another email that it had been cleared. Etc. etc. etc. These were all alerts that had been triggered previously with the old HDDs; many of them even referenced specific hard drive ID numbers that no longer exist. The most troubling part is that when I log into the GUI, no alerts are shown, none active, none cleared.

So this doesn't seem to have any functional significance, but it's become a major irritation. Some days I receive a dozen of these emails, and my concern is that I have started to disregard them, so that sooner or later when a REAL alert comes in I may not notice it. I have searched the forums and found only a couple of threads on how to completely delete the alert history, which is apparently kept in a database file of some sort. But the process seems to be very geeky and perhaps not without risk. Advice? This is happening in TrueNAS-13.0-U6.1.
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
And there are for sure newly created alerts?

There are probably ways to view the alert history with timestamps.

Personally, if you are on top of your data protection game (smart tests, scrubs) and there are no errors logged (zpool status), you could ignore it and see if it stops after all "old" alerts have been sorted out.

You could also start with a fresh install and a config backup.

Personally I'd go with the latter option and see if that resolved anything, but I don't want to rule out that there are ways to sort this out without a reinstall.
 

k9bm

Dabbler
Joined
Apr 18, 2016
Messages
14
They are definitely new emails, but of course the alert email itself does not give the time the alert was generated; the email header only gives the timestamp of when it was sent by TrueNAS. Zpool status shows no errors, nor does the GUI, although curiously it does show that a resilver has started, even though these are new drives with no reported errors according to the individual smartd status reports on each drive. Maybe that's normal zfs behavior, I don't know. And yes, I am seriously considering doing a config save and then wiping the boot drive and doing a TrueNAS reinstall....
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Can you post the output of zpool status? There shouldn't be a resilver going on you don't know about.

And yes, I am seriously considering doing a config save
Save yourself from tears and start backing up your config regularly anyway, boot drives die from time to time.
 

k9bm

Dabbler
Joined
Apr 18, 2016
Messages
14
cabinet# zpool status
pool: Cabinet
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Mon Mar 11 21:26:31 2024
41.6T scanned at 0B/s, 41.3T issued at 306M/s, 41.6T total
771G resilvered, 99.42% done, 00:13:43 to go
config:

NAME STATE READ WRITE CKSUM
Cabinet ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gptid/f1e9bc7f-c2e5-11ee-bdcf-0cc47aaa6366 ONLINE 0 0 0
gptid/f16bebec-c58b-11ee-96db-0cc47aaa6366 ONLINE 0 0 0 (resilvering)
gptid/da41a534-c6cf-11ee-84bd-0cc47aaa6366 ONLINE 0 0 0 (resilvering)
gptid/b1f4ffca-c446-11ee-ae05-0cc47aaa6366 ONLINE 0 0 0
gptid/ba071f68-c939-11ee-8d81-0cc47aaa6366 ONLINE 0 0 0
gptid/9ce50913-ca8e-11ee-8367-0cc47aaa6366 ONLINE 0 0 0
gptid/c740b371-cbe1-11ee-b3ad-0cc47aaa6366 ONLINE 0 0 0
gptid/614cb2b8-cc3c-11ee-8da1-0cc47aaa6366 ONLINE 0 0 0

errors: No known data errors

pool: freenas-boot
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 00:00:28 with 0 errors on Sun Mar 10 03:45:28 2024
config:

NAME STATE READ WRITE CKSUM
freenas-boot ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ada0p2 ONLINE 0 0 0
ada1p2 ONLINE 0 0 0

errors: No known data errors
 

k9bm

Dabbler
Joined
Apr 18, 2016
Messages
14
I was afraid you were going to say that, I was just hoping that resilvering was a normal operational or maintenance activity that zfs performed, not necessarily only when I replace a drive....
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Please use the CODE Tags for better readability.

I was afraid you were going to say that, I was just hoping that resilvering was a normal operational or maintenance activity that zfs performed,
No this is not a standard maintenance operation.

Per your output the resilvering started last night and should be done in a few minutes.

This is weird if you don't know anything about that, but from my POV the output looks good.

If your smart tests are also good I'm in favor of a fresh installation, or you wait if someone has an idea on how to tackle this in place.
 

k9bm

Dabbler
Joined
Apr 18, 2016
Messages
14
I posted that zpool status 15 minutes ago, and sat here watching as it counted down to 3 minutes remaining (in the GUI). Now the estimate is steadily incrementing, and as I type it is up to over 3 hours remaining. That just can't be normal behavior, but I can't see how it could be related to my phantom alert email problem. Still, after I sleep I am going to take your advice and reinstall the OS....

EDIT: A few minutes later it is up to 6 hours left resilvering and still incrementing up....
 

k9bm

Dabbler
Joined
Apr 18, 2016
Messages
14
cabinet# zpool status
pool: Cabinet
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Mar 12 02:19:03 2024
3.08T scanned at 7.77G/s, 1.11T issued at 2.81G/s, 41.6T total
24.6G resilvered, 2.68% done, 04:06:07 to go
config:

NAME STATE READ WRITE CKSUM
Cabinet ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gptid/f1e9bc7f-c2e5-11ee-bdcf-0cc47aaa6366 ONLINE 0 0 0
gptid/f16bebec-c58b-11ee-96db-0cc47aaa6366 ONLINE 0 0 0 (resilvering)
gptid/da41a534-c6cf-11ee-84bd-0cc47aaa6366 ONLINE 0 0 0 (resilvering)
gptid/b1f4ffca-c446-11ee-ae05-0cc47aaa6366 ONLINE 0 0 0
gptid/ba071f68-c939-11ee-8d81-0cc47aaa6366 ONLINE 0 0 0
gptid/9ce50913-ca8e-11ee-8367-0cc47aaa6366 ONLINE 0 0 0
gptid/c740b371-cbe1-11ee-b3ad-0cc47aaa6366 ONLINE 0 0 0
gptid/614cb2b8-cc3c-11ee-8da1-0cc47aaa6366 ONLINE 0 0 0

errors: No known data errors

pool: freenas-boot
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 00:00:28 with 0 errors on Sun Mar 10 03:45:28 2024
config:

NAME STATE READ WRITE CKSUM
freenas-boot ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ada0p2 ONLINE 0 0 0
ada1p2 ONLINE 0 0 0

errors: No known data errors
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Please use CODE Tags

Code:
your code goes here


Code:
  scan: resilver in progress since Tue Mar 12 02:19:03 2024
        3.08T scanned at 7.77G/s, 1.11T issued at 2.81G/s, 41.6T total
        24.6G resilvered, 2.68% done, 04:06:07 to go
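
As a rough sanity check on the "to go" figure, here is a small Python sketch. It assumes the estimate is simply the data not yet issued divided by the current issue rate, and that the size suffixes are binary (1024-based); both are assumptions, not something confirmed in this thread. The helper names `to_bytes`, `eta_seconds` and `hms` are made up for illustration.

```python
# Binary size suffixes as printed by zpool status (assumed 1024-based).
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a size such as '41.6T' or '2.81G' to a number of bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


def eta_seconds(total, issued, rate_per_second):
    """Estimate remaining seconds as (total - issued) / issue rate."""
    remaining = to_bytes(total) - to_bytes(issued)
    return remaining / to_bytes(rate_per_second)


def hms(seconds):
    """Format a duration in seconds as HH:MM:SS."""
    whole = int(round(seconds))
    return f"{whole // 3600:02d}:{whole % 3600 // 60:02d}:{whole % 60:02d}"


# Figures copied from the scan lines quoted above.
print(hms(eta_seconds("41.6T", "1.11T", "2.81G")))
```

With the rounded figures from the output this lands close to the reported 04:06:07, which suggests the estimate tracks the issue rate. Early in a resilver that rate moves around a lot, so the remaining time can climb as well as fall.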


Seems like the resilver has started again; previously it started at Mon Mar 11 21:26:31 2024 (I hope I am interpreting "in progress since" correctly here).

You may want to hold on with that reinstall, you need to find out why the resilver progress keeps restarting.
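
To catch a restart without eyeballing the output, the "in progress since" timestamps from two runs of zpool status can be compared. Below is a minimal Python sketch; the function names `resilver_start` and `resilver_restarted` are made up for illustration, and it assumes the scan line keeps the format seen in this thread and that the system prints English day and month names.

```python
import re
from datetime import datetime

# Matches the scan line of `zpool status` while a resilver is running.
SCAN_RE = re.compile(r"scan:\s+resilver in progress since (.+)")


def resilver_start(status_text):
    """Return the resilver start time found in zpool status text, or None."""
    match = SCAN_RE.search(status_text)
    if match is None:
        return None
    return datetime.strptime(match.group(1).strip(), "%a %b %d %H:%M:%S %Y")


def resilver_restarted(earlier_text, later_text):
    """True if the later output reports a newer resilver start time."""
    earlier = resilver_start(earlier_text)
    later = resilver_start(later_text)
    return earlier is not None and later is not None and later > earlier


# Scan lines taken from the two outputs posted in this thread.
first = "scan: resilver in progress since Mon Mar 11 21:26:31 2024"
second = "scan: resilver in progress since Tue Mar 12 02:19:03 2024"
print(resilver_restarted(first, second))
```

Saving the output of zpool status to a file every few minutes and feeding consecutive files through this check would show roughly when each restart happens, which can then be lined up against the system logs.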
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
What does zpool history give you?

Check the logs, I don't know which one, maybe /var/log/messages, if there is anything.

Are you sure all drives are properly connected, no loose cables?

Searching the forum indicates a resilver will survive a reboot. Maybe after investigating the logs you could make sure everything is properly connected/seated.

Do you have backups?

Please attach all your used hardware in detail, possibly should have asked that to start with: What drives did you buy? Please state the specific model. I want to know if they are CMR or SMR.

Edit: did you physically remove all old drives? It seems like two drives are being resilvered, but I would have assumed that after finishing the first one it would no longer show the (resilvering) marker in zpool status.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Please attach all your used hardware in detail, possibly should have asked that to start with: What drives did you buy? Please state the specific model. I want to know if they are CMR or SMR.
Yes this. Without this information any attempts to diagnose or troubleshoot this issue are nothing more than wild guesses.
 