Email when I have drive failure?

Status
Not open for further replies.

rgreenwalt

Cadet
Joined
Nov 10, 2012
Messages
4
I have just built my first FreeNAS box. Before I put it into production I'm testing some things. I put in some known flaky drives, built up a RAIDZ and loaded it up with data. A manually triggered scrub found the problem and repaired it, but I didn't get an email. The GUI showed the problem and put up a yellow blinky light in the corner (says "Alert System - WARNING: The volume TestRaid (ZFS) status is UNKNOWN: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.")

I can go into the root user's settings in the GUI and request a test email, and I get that fine. Shouldn't a NAS system scream and shout when it detects this sort of issue? I'd expect an email. Did I misconfigure something?

Mmm - just a thought: the admin user is set to "admin" (the default) but my email config and testing has been on the "root" user. I wouldn't guess that admin==root but there is no "admin" in the user list. A strange mismatch, that.

Thanks for any help.

Robert
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
For starters, this has ALL been hashed out in other threads. I did some testing of my own and posted some interesting results from those tests in the forum.

1. If it repaired it, why even send you an email? It doesn't need any immediate action to be taken from you. If you got an email, what actions do you think you'd want to take? I can't think of any at that point. If you've done your job and set up SMART testing, you shouldn't have to be too worried about it.
2. FreeNAS changes the ZFS status to UNKNOWN when there is any kind of serious error. Examples I can think of include a hard drive detaching from the zpool, unrecoverable read or write errors, etc. If you check out your nightly emails, you'll see that those emails will give you information that something is wrong. Then you can log into FreeNAS at your leisure and check out the problem. Serious errors that the system deems it wants your attention for do NOT self-clear (unless you reboot, I believe). You MUST log in and manually clear the error from the shell.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Actually, failures where something is "repaired" are nice to be aware of. Just because you can get reporting from the drives at one level (via SMART) doesn't mean it isn't nice to be made aware that some automated system has massaged your data into what it believes is the correct form. It really depends on how valuable your data is and how much you care.
 

rgreenwalt

Cadet
Joined
Nov 10, 2012
Messages
4

Thanks for the reply. To your points:

1. If I read the output of "zpool status", it directs me to a Sun website, since taken down, for info on error ZFS-8000-9P. Searching for the error, you can still find the text elsewhere, which reads (in part):
On the other hand, errors may very well indicate that the device has failed or is about to fail. If there are continual I/O errors to a device that is otherwise attached and functioning on the system, it most likely needs to be replaced. The administrator should check the system log for any driver messages that may indicate hardware failure. If it is determined that the device needs to be replaced, then the 'zpool replace' command should be used:

In that spirit, I want to know about drive errors even if ZFS/RAIDZ is able to work around them.

2. If you read my email, the gui warning states that the status is UNKNOWN. I would have expected an email but didn't get one. People pay to get 2 or 3 drive redundancy because of the risk of permanent loss in the repair time window. Why give away 12 hours on average waiting for a daily email? Surely this could/should send an email immediately when a volume goes to UNKNOWN status. Is there a way to get immediate email in this situation from FreeNAS?
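Point 1 amounts to wanting to see per-device error counters even when ZFS has already repaired the damage. A minimal sketch, assuming the usual `zpool status` layout (a NAME/STATE/READ/WRITE/CKSUM table that ends at a blank line): the function flags any device whose counters are not all zero. The sample output and the device names ada1/ada2 are hypothetical, written for illustration rather than copied from this pool.

```python
def devices_with_errors(status_text: str) -> list[tuple[str, str, str, str]]:
    """Return (name, read, write, cksum) for each row of the `zpool status`
    config table whose READ, WRITE or CKSUM counter is not zero."""
    flagged = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True  # header row of the config table
            continue
        if not in_table:
            continue
        if not fields:
            in_table = False  # a blank line ends the table
            continue
        if len(fields) >= 5:
            name, _state, read, write, cksum = fields[:5]
            # Large counts may be abbreviated (e.g. "1.2K"), so compare as text.
            if (read, write, cksum) != ("0", "0", "0"):
                flagged.append((name, read, write, cksum))
    return flagged


# Hypothetical output, shaped like `zpool status` after a scrub repaired errors.
SAMPLE = """\
  pool: TestRaid
 state: ONLINE
config:

\tNAME        STATE     READ WRITE CKSUM
\tTestRaid    ONLINE       0     0     0
\t  raidz1-0  ONLINE       0     0     0
\t    ada1    ONLINE       0     0     3
\t    ada2    ONLINE       0     0     0

errors: No known data errors
"""

print(devices_with_errors(SAMPLE))
```

On the sample this reports only ada1, with its 3 checksum errors, even though the pool itself still says ONLINE.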

Thanks

Robert
 

rgreenwalt

Cadet
Joined
Nov 10, 2012
Messages
4
What noobsauce80 could have told me was to verify the SMART Service configuration had a valid email addr. I assumed it would inherit the other system-wide notification email (root user's email setting), but SMART has its own email setting.
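Since the SMART service keeps its own address, one quick sanity check is whether every smartd entry actually carries a mail directive. A sketch assuming the standard smartmontools smartd.conf syntax (device name first, `-m ADDRESS` for warning mail, `#` comments, trailing-backslash continuation). Where FreeNAS writes its generated file is not covered here, so the function takes the file's text rather than a path; the sample config and addresses are hypothetical.

```python
def entries_without_mail(conf_text: str) -> list[str]:
    """Return the device name of each smartd.conf entry with no -m directive."""
    entries, pending = [], ""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # strip comments and trailing space
        if line.endswith("\\"):
            pending += line[:-1] + " "  # entry continues on the next line
            continue
        entries.append(pending + line)
        pending = ""
    if pending:
        entries.append(pending)  # file ended mid-continuation
    missing = []
    for entry in entries:
        tokens = entry.split()
        # tokens[0] is the device (or DEVICESCAN); "-m" must appear among the directives
        if tokens and "-m" not in tokens[1:]:
            missing.append(tokens[0])
    return missing


CONF = """\
# hypothetical smartd.conf
/dev/ada0 -a -m admin@example.com
/dev/ada1 -a -s (S/../.././02)
/dev/ada2 -a \\
          -m admin@example.com
"""

print(entries_without_mail(CONF))
```

Here only /dev/ada1 is reported: it is monitored, but any warning it raises would not be mailed anywhere.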
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
That sounds an awful lot like you blame noobsauce80 for not telling you what you don't know. He's a fellow user using his personal time to help. His responsibility to fill your knowledge gaps ranks significantly below your own. I'm not trying to slap your hand, but it does rankle me a bit when I see folks on the side of the road with a busted tire and no lug wrench, someone else stops to help, and you complain because his lug wrench doesn't fit your lug nuts. Personally, I'd feel compelled to say, "Thanks anyway," for the attempt.
 

rgreenwalt

Cadet
Joined
Nov 10, 2012
Messages
4
I don't blame noobsauce80 for not telling me what I don't know. I was unhappy with his dismissive tone and bad advice while pretending to be an expert, but I can't fault anybody for honestly not knowing stuff - there's tons I don't know too (obviously).

If you want to go into tire analogies, though, here's what it felt like: I was driving my first car and had a flat. I stopped and called for help, and a guy shows up in a tow truck, apparently an expert (see noob's sig - read my guide, etc.). I ask him how to change the tire and he looks at me and asks "can you drive on it?" I say "uh, I guess so" and he says "then you don't really need to change it, do you" and laughs while he drives off. I feel this isn't quite right, do more reading and find that doing that (driving on the flat) puts me at risk of a crash, and the manual says not to - it also explains how to change it.

What I had asked for was help and what I got back was snarky comments ("If you've done your job" - I was asking what my job was!), dismissive attitude ("If it repaired it why even send you an email?") and dangerous advice ("you shouldn't have to be too worried about it").

Anyway, the crux of it was that if you set up the email in the smartd service, you DO get more immediate feedback (the default polling interval is 30 minutes) when drives are having problems, and don't have to wait for daily emails.
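The "12 hours on average" figure for daily emails and the 30-minute smartd interval come from the same arithmetic: if a fault lands at a uniformly random moment, the expected wait until the next check is half the polling interval. A small sketch under that uniform-arrival assumption. Keep in mind smartd polls the drives' SMART data, which is not the same thing as the ZFS pool status, so this only bounds how quickly SMART-detectable problems get mailed.

```python
def mean_detection_delay_minutes(poll_interval_minutes: float) -> float:
    """Expected wait from a fault to the next check, assuming the fault
    happens at a uniformly random moment within the polling interval."""
    return poll_interval_minutes / 2


for label, interval in [("daily status email", 24 * 60), ("smartd default poll", 30)]:
    print(f"{label}: {mean_detection_delay_minutes(interval):.0f} minutes on average")
```

This prints 720 minutes (12 hours) for the daily email and 15 minutes for the default smartd poll.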

Thanks anyway
Robert
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
rgreenwalt said:
2. If you read my email, the gui warning states that the status is UNKNOWN. I would have expected an email but didn't get one. People pay to get 2 or 3 drive redundancy because of the risk of permanent loss in the repair time window. Why give away 12 hours on average waiting for a daily email? Surely this could/should send an email immediately when a volume goes to UNKNOWN status. Is there a way to get immediate email in this situation from FreeNAS?

To put it briefly, no. ZFS is not exactly a traditional UNIX filesystem, and doesn't fit cleanly in the UNIX kernel model. There are some things that you'd expect it to "magically do" in the way a hardware RAID controller could and would do, and it doesn't. These shortcomings are mostly known and understood, but from an architectural point of view, these things need to be done in userland, not kernelland. FreeBSD 9 will include zfsd, which is intended to address many of these shortcomings, though I'm not carefully tracking the specifics at this time, so don't ask me what (Google is your friend!). It would have been possible for FreeNAS to "build their own" tools to do these things, but quite frankly they're probably making the right decision to focus on new development and not reinventing the wheel. So it's pretty much a "wait for FreeNAS 9" or whatever they're going to call it sort of thing. I'm guessing that failure management will become much more robust at that time.
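Since the missing piece is userland tooling, a stopgap can be approximated with a small script run from cron. A minimal sketch, not a FreeNAS feature: it relies on `zpool status -x` printing only "all pools are healthy" when nothing is wrong, and assumes an SMTP server reachable on localhost. The function names, subject line and addresses are hypothetical. Note that "no pools available" also counts as a problem with this check.

```python
import smtplib
import subprocess
from email.message import EmailMessage

HEALTHY = "all pools are healthy"


def pools_unhealthy(status_x_output: str) -> bool:
    """True unless `zpool status -x` printed only its all-clear line."""
    return status_x_output.strip() != HEALTHY


def build_alert(report: str, sender: str, recipient: str) -> EmailMessage:
    """Wrap the zpool report in a plain-text email."""
    msg = EmailMessage()
    msg["Subject"] = "ZFS pool problem detected"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(report)
    return msg


def check_and_alert(sender: str, recipient: str, smtp_host: str = "localhost") -> bool:
    """Run `zpool status -x`; mail its output if any pool is not healthy.
    Returns True when an alert was sent."""
    out = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True, check=True
    ).stdout
    if not pools_unhealthy(out):
        return False
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(build_alert(out, sender, recipient))
    return True
```

Calling something like `check_and_alert("freenas@example.com", "admin@example.com")` from cron every few minutes narrows the window to the cron interval. As written, it re-sends on every run until the error is cleared with `zpool clear`, so a real deployment would probably want some de-duplication.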
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
rgreenwalt said:
What noobsauce80 could have told me was to verify the SMART Service configuration had a valid email addr. I assumed it would inherit the other system-wide notification email (root user's email setting), but SMART has its own email setting.

You're right, SMART has its own email system. It would have been "SMART" (haha.. pun kinda intended) to fill in that block when setting up SMART. After all, you did RTM, right? None of the windows should be new to you unless it was a feature added recently you just haven't explored. At some point I have to expect a level of knowledge and "get it done" attitude. I'm sorry I set my bar too high for you.

I do accept responsibility for not telling you if you'll admit you didn't look for the field. I'd expect most people could figure out that field on their own, so I didn't think it would be something you'd get wrong.

A dismissive attitude would have either not responded or responded with RTFM or 'search the forums airhead'. Believe me, I considered not responding because I've seen this question too many times. Instead I explained that there are long detailed answers in the forum if you wish to search, but I still gave you enough of a basic gist to see what the potential system limitations are. I've started ignoring more and more threads (and almost ignored yours) because I'm finding that I'm spending far too much time answering the same few questions over and over instead of trying to hash out new problems with people.

Also, in many cases the hardware used does not support SMART testing and results (3 RAID controllers I have tested do not return serial numbers or provide SMART functions). In your case in particular you also failed to provide ANY hardware, FreeNAS software version, or ANYTHING except your complaint. If you had consulted the forum guidelines (or used your brain.. yes... now I'm being dismissive) you'd have at least included some of your hardware or software versions. But, you didn't. You didn't provide me with ANY information at all except the generic error that EVERYONE gets when virtually any error occurs. So I made a few generalizations that are most often correct: that SMART doesn't always work (hence I didn't mention you would receive a SMART email per the polled setting) and that you are capable of seeing a blank email field and completing it properly. Typically I'd expect someone experimenting with FreeNAS to be able to do a few things correctly like know what a "username", "password", and "email address" are. If you didn't know something I'd have expected you to ask or read the manual and not spend your time criticizing my "incomplete" advice. I had no intention to provide incomplete advice, but there are only so many problems that can be easily identified with the minimal information provided.

Anyway, I am now done with this thread. You can find the other thread I had posted in where I spent almost 6 hours playing around with and breaking a zpool for the benefit of the community just to provide the information I have provided you in my first post. There's lots of detailed information in that thread and it's probably 2-4 months old. Go use the search feature or wait for someone else to hash it out with you if you really want the long explanation.
 