event coalescing

Status
Not open for further replies.

willnx

Dabbler
Joined
Aug 11, 2013
Messages
49
It would be really awesome if I didn't get a massive pile of emails from the NAS, but instead got one email saying "This thing happened a pile of times."

+Bonus points: If I could toggle coalescing on/off or configure only specific events to be coalesced.

Example:
My volume is filling up. I get the email that it's over 90%, so I start deleting files. I can't always delete as fast as ingest, so I keep bouncing between 90 and 95%. Now I have a huge pile of spam in my inbox.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
If you truly want a feature like this you will need to submit it via the "BugReports" as a feature request.
 

willnx

Dabbler
Joined
Aug 11, 2013
Messages
49
Did you uncheck the box next to the alert in Alert?

Yeah, but that only seemed to stop the day-after-day email of the alert in the "daily run output". Say I hit 93%, then delete or move some stuff so I'm down to 90%, and then, before I can delete or move more, my ingest pushes it back up to 93%: I'd get another "Yo, your NAS is crazy full at 93%." That is surprisingly annoying while I'm busy looking through the files on my NAS, deciding what I can shuffle about or live without, and working out what broke in my automation (to explain why it got that full in the first place) so I can get it back down to the recommended 80%.

*Fun note: While it didn't account for any significant usage, getting the spam emails increased the utilization of my FreeNAS box while I was trying to get the utilization below 80%. In hindsight I find that rather humorous.
"I'm getting really full, and with each email I send, I'm getting fuller..." :P


If you truly want a feature like this you will need to submit it via the "BugReports" as a feature request.

Thanks for the tip - I submitted https://bugs.freenas.org/issues/5959
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Just a comment that coalescing seems like it'd be really hard to get right, and especially hard to get right for everyone. For example...

You probably want to get that first email as soon as it occurs so you can work on the problem. But then you're going to need settings for how soon the event recurs before triggering a new message, or just getting rolled into the next.

(If you delete a bunch of files and think you're done, wouldn't you want to be notified that your pool is back over the threshold? As a side note, if you're actually in the situation where just as soon as you bring your pool down 3% it pushes back up 3% again, you really need to be looking at upsizing your pool. What if you're away, or the emails are hidden by coalescing, and it pushes up to 96% and then 99%? That seems like you could really easily hit the 100% that kills the pool.)

And then there are some alerts that you're going to want to receive every time and not have subjected to coalescing.
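To make those trade-offs concrete, here is a minimal sketch of one possible policy: send the first occurrence immediately, roll repeats into the next message if they recur within a window, and exempt some alerts entirely. This is not how FreeNAS handles alerts; the class name `Coalescer` and the settings `window_seconds` and `never_coalesce` are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Coalescer:
    """Hypothetical alert coalescer; not part of FreeNAS."""
    window_seconds: float = 3600.0          # assumed "how soon is a repeat" setting
    never_coalesce: frozenset = frozenset()  # alerts that must always be sent
    _window_start: dict = field(default_factory=dict)
    _suppressed: dict = field(default_factory=dict)

    def observe(self, key: str, now: float):
        """Return the message to send now, or None if the event was coalesced."""
        if key in self.never_coalesce:
            return key
        start = self._window_start.get(key)
        if start is not None and now - start < self.window_seconds:
            # Repeat inside the window: count it instead of emailing.
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return None
        # First occurrence, or the window has expired: send and open a new window.
        repeats = self._suppressed.pop(key, 0)
        self._window_start[key] = now
        if repeats:
            return f"{key} (suppressed repeats: {repeats})"
        return key
```

Note the weakness this sketch shares with the concern above: suppressed repeats are only reported when the next event arrives after the window closes, so a pool that keeps climbing inside the window stays silent.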

Personally, I feel like this would be better handled by some rules on your mail server or client that help cut down on the amount that you see.

The one improvement that I could see, based on your report, is that a new message shouldn't be sent out until the error is acknowledged in the web gui. That would allow you to delete files and then clear the alert after you're satisfied with the level of free space. If you drop to 90 and back up to 93, you wouldn't see a new message until you've deleted enough to get down below 80 and then manually cleared the alert.
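A rough sketch of that acknowledge-to-rearm idea, assuming a 90% trigger and an 80% clear level as in the example. The class `LatchedAlert` and its method names are hypothetical and do not correspond to anything in the FreeNAS alert system.

```python
from dataclasses import dataclass


@dataclass
class LatchedAlert:
    """Hypothetical alert that stays latched until acknowledged."""
    trigger_pct: float = 90.0  # assumed level at which the email is sent
    clear_pct: float = 80.0    # assumed level below which clearing is allowed
    active: bool = False

    def update(self, used_pct: float) -> bool:
        """Feed a new usage reading; return True only if an email should go out."""
        if not self.active and used_pct >= self.trigger_pct:
            self.active = True
            return True
        # While latched, bouncing between 90% and 93% sends nothing further.
        return False

    def acknowledge(self, used_pct: float) -> bool:
        """Manual clear from the GUI; only succeeds once usage is under clear_pct."""
        if self.active and used_pct < self.clear_pct:
            self.active = False
            return True
        return False
```

With this policy a single email covers the whole cleanup session, at the cost of staying silent if usage keeps rising while the alert is latched.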
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
To be honest, if your pool is 90-95% full you are dangerously close to that 100%. We've got about 100 threads of people complaining about what happens when you hit 100%.. it's not pretty. So frankly, that email is one that I would *never* put off and try to coalesce.
 

willnx

Dabbler
Joined
Aug 11, 2013
Messages
49
You're missing part of what I said in my example - I'm actively working to reduce my utilization, but my ingest is (at times) faster than I can remove data, so my NAS spams me with emails.

*Note:
I'll bet the worst stories about getting to 100% full are from sysadmins who didn't set up reserve space for the volume, which led them to write so much data to the disk that they were literally too full to delete any of it.
I, however, have set up reserve space and tested filling the NAS up to see what would happen (before putting production data on it). I was still able to read the data and was able to delete files to reduce utilization, but the system (especially the web GUI) was a bit sluggish.
Having said that, to anyone from the future who reads this: do not fill up your NAS! The only, ONLY exception is if you're not a home user and you need to document what happens and how to fix it, which of course you would do during the test/burn-in phase, aka before putting real data on it.


I do like the suggestion that fracai made:
The one improvement that I could see, based on your report, is that a new message shouldn't be sent out until the error is acknowledged in the web gui. That would allow you to delete files and then clear the alert after you're satisfied with the level of free space. If you drop to 90 and back up to 93, you wouldn't see a new message until you've deleted enough to get down below 80 and then manually cleared the alert.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
You're missing part of what I said in my example - I'm actively working to reduce my utilization, but my ingest is (at times) faster than I can remove data, so my NAS spams me with emails.

I heard that just fine. In fact, to me, that means you are loading faster than you can delete, which means you were even closer to that 100% than you think and if you had hesitated (such as waiting for a coalesced email) you would have been too late.

If you can't delete fast enough to keep up, then you've got SERIOUS problems if you are getting an email. 5% of slack space while ingest is greater than how fast you delete is SERIOUSLY dangerous territory.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The normal sysadmin answer to the case where you have variable-rate ingest and a space-filling-faster-than-you-can-delete problem is to specifically target the largest stuff to delete first.
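One way to find that largest stuff, as a minimal sketch using only the Python standard library. The function name `largest_files` is made up here, and the path in the usage note is a placeholder for wherever your pool is mounted.

```python
import heapq
import os


def largest_files(root: str, count: int = 20):
    """Return up to `count` (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return heapq.nlargest(count, sizes)
```

For example, `for size, path in largest_files("/mnt/tank", 10): print(size, path)` would list ten candidates to review, with `/mnt/tank` standing in for your own mount point.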
 