RAID controller, SMART and monitoring

Status
Not open for further replies.

MadsRC

Dabbler
Joined
Jul 14, 2013
Messages
20
Having set up my zpool, datasets and shares, I've taken a look at monitoring my disks...

I've set up mail, so the system can send me mails. But I understand that monitoring the disks more closely is a good idea. For this I should use S.M.A.R.T.

My system is a 24-drive SuperMicro SC846 with a Xeon E5520 at 2.25 GHz, 12 GB of ECC RAM and a 3ware RAID controller. All disks are in JBOD mode.

My question is, will SMART testing and polling be possible behind a JBOD configuration?

And would it be necessary? Doesn't the controller monitor this? I suppose it would start flashing red on the LED in the drive-bay?

I also read somewhere on this forum that ZFS monitors the disks. Would this turn up in the daily mails (the standard ones)? And if the controller marks a disk as dead/bad, would that be added to the daily report?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Having set up my zpool, datasets and shares, I've taken a look at monitoring my disks...

I've set up mail, so the system can send me mails. But I understand that monitoring the disks more closely is a good idea. For this I should use S.M.A.R.T.

My system is a 24-drive SuperMicro SC846 with a Xeon E5520 at 2.25 GHz, 12 GB of ECC RAM and a 3ware RAID controller. All disks are in JBOD mode.

As a former 3ware RAID controller user (I upgraded and had no use for the 3ware anymore), let me ask you these questions. When you have the answers you will be able to decide for yourself how important things are. No, I can't provide the answers for you, because the fact that my 3ware controller worked doesn't tell me whether yours will. It is also about how much risk you are willing to take. I am very conservative and do not like to take much risk at all. Other people are completely okay with taking very large risks.

Will SMART testing and polling be possible behind a JBOD configuration?

Does your 3ware controller work with smartctl and have you tried to monitor a disk in FreeNAS? Have you tried to do a SMART test in FreeNAS?
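A minimal sketch of how those checks could be scripted. smartmontools documents a `-d 3ware,N` device type for drives behind 3ware controllers, where N is the port number. The controller node `/dev/twa0` is an assumption (it depends on the controller series and driver), and the helper names are hypothetical. The script only prints the commands so they can be reviewed before anything is run.

```python
def smartctl_info_cmd(port, controller="/dev/twa0"):
    """Build the smartctl argv that dumps all SMART data for one port.

    `controller` is an assumed device node; check which node your
    3ware card actually exposes before using it.
    """
    if port < 0:
        raise ValueError("port must be non-negative")
    return ["smartctl", "-a", "-d", f"3ware,{port}", controller]


def smartctl_short_test_cmd(port, controller="/dev/twa0"):
    """Build the smartctl argv that starts a short self-test on one port."""
    if port < 0:
        raise ValueError("port must be non-negative")
    return ["smartctl", "-t", "short", "-d", f"3ware,{port}", controller]


if __name__ == "__main__":
    # Print the commands for a 24-bay chassis instead of executing them.
    for port in range(24):
        print(" ".join(smartctl_info_cmd(port)))
```

If the first command returns SMART attributes for a drive, the controller is passing SMART through for that port; if it errors out, monitoring from FreeNAS will not work for that drive either.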

And would it be necessary? Doesn't the controller monitor this? I suppose it would start flashing red on the LED in the drive-bay?

Does the manual for your card say it will monitor it? If so, what is the 'threshold' for deciding a drive is bad? Is it a single read or write error or only when a drive disconnects itself from the controller?

I also read somewhere on this forum that ZFS monitors the disks. Would this turn up in the daily mails (the standard ones)? And if the controller marks a disk as dead/bad, would that be added to the daily report?

You should get daily emails, but you should definitely test it: use a failed disk, or disconnect a disk while the system is on, and see whether you get an email saying that the disk is bad.

FreeNAS is nothing more than a pretty UI for FreeBSD. As such, it does try to do a lot of things that the server admin would normally do entirely from the command line. So you should definitely validate any assumptions with your hardware before you trust that you'd get an email if "something were wrong". Plenty of people seem to think that they'll get a warning light, a buzzer on their speaker, an email, etc but didn't try to validate anything.
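One way to validate the pool side of this by hand, as a rough sketch: `zpool status -x` reports only pools with problems and, to my knowledge, prints the single line "all pools are healthy" otherwise. The function names below are hypothetical, and the exact healthy message should be confirmed on your own system before relying on it.

```python
import subprocess

# Message that `zpool status -x` is expected to print when nothing is wrong.
# Treat this string as an assumption and confirm it on your own system.
HEALTHY_MESSAGE = "all pools are healthy"


def pools_healthy(status_output):
    """Return True only if the output is exactly the healthy message."""
    return status_output.strip() == HEALTHY_MESSAGE


def check_pools():
    """Run `zpool status -x` and return (healthy, raw_output)."""
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True
    )
    return pools_healthy(result.stdout), result.stdout
```

Pull a disk, call `check_pools()`, and compare what it reports with what actually lands in your inbox. Anything other than the healthy message is treated as a problem, which is the conservative choice.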

Hint: if you don't set up the email function you shouldn't ever expect FreeNAS to email you (yes, prior admins HAVE been that dumb and even admitted to it in their threads asking for mercy from the ZFS data gods). /hangsheadindisgust

Bottom line, you are asking very good questions. You should take the time to validate that all of this stuff really works for you before you start relying on it to store your data safely. I know that it will be time consuming. But your data depends on you knowing how the system is supposed to work and making sure you know how and when you should get warnings if things start going bad. If you are like most people you have done almost everything in Windows and are just starting to dabble outside of the Windows world (that was me 18 months ago). It's going to be a slow and steep learning curve. Remember back to when you first started learning about Windows/DOS. It was very slow to figure stuff out and you had lots of questions and darn few answers. If you aren't ready to take the time to do it right, one of two things will happen:

1. You'll get very lucky and not lose your data.
2. You'll lose your data (hopefully you'll have good backups).

More often than not it seems people fall into #2 because they heard about FreeNAS on Thursday and wanted the server up and running by Sunday.

Data on a zpool cannot be recovered with any "recovery tools" I'm familiar with, so if poor management on your part leads to an unmountable pool you are going to be very unhappy with the results. One recovery company was quoted as saying that they would attempt recovery for a flat $15,000 per month for 40 man-hours per week of labor, with no guarantee of any data ever being recovered. So keep that in mind when you start assuming things should work a particular way and are deciding whether to take the time to verify that they do.
 

MadsRC

Dabbler
Joined
Jul 14, 2013
Messages
20
Once again, thank you for your fast reply CyberJock!

I got a little carried away and did some testing, and it seems I can talk to my 3ware controller.
Upon setting up a short test on my disks, I started getting a whole bunch of mail about disk 4 being bad... Guess it works, and guess I have a bad drive to test it out with.

All I need now is to figure out roughly how long the different tests will take, so I can schedule them... I'm thinking long tests once a month and short tests once a day... Does that sound okay?

I'm no stranger to Unix systems. In fact I feel more at home in a Unix system than in Windows - but learning a new interface and its quirks is always a daunting task (luckily I've got 2 weeks of vacation!)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Upon setting up a short test on my disks, I started getting a whole bunch of mail about disk 4 being bad... Guess it works, and guess I have a bad drive to test it out with.

I keep a bad disk around JUST for tests like that.

All I need now is to figure out roughly how long the different tests will take, so I can schedule them... I'm thinking long tests once a month and short tests once a day... Does that sound okay?

If you look at all of the SMART data from smartctl -a, the disks will tell you how long the tests should take. The only test that really matters is the long test. I do scrubs every 14 days (1st and 15th of every month) and I do a long SMART test only on the 7th. Of course, having tested about 20 disks this weekend via short, long, and badblocks, I can vouch that the long test can miss errors that show up consistently with badblocks. All of the disks that failed the long test also failed badblocks. But 3 of the disks failed badblocks yet passed the long tests I ran before AND after the badblocks run. So now I'm really questioning the use of long tests at all. I never used them until this year, and I've had very good luck just by monitoring the SMART outputs via emails (http://forums.freenas.org/threads/setup-smart-reporting-via-email.6211/)
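As a small sketch of pulling those durations out of `smartctl -a` output: ATA drives report a "recommended polling time" in minutes for each self-test type. The sample text below is illustrative, not taken from the drives in this thread, and the function name is hypothetical.

```python
import re

# Matches blocks such as:
#   Extended self-test routine
#   recommended polling time:        ( 255) minutes.
POLLING_RE = re.compile(
    r"(Short|Extended|Conveyance) self-test routine\s*\n"
    r"recommended polling time:\s*\(\s*(\d+)\)\s*minutes"
)


def self_test_minutes(smartctl_output):
    """Map self-test type (lowercase) to its reported duration in minutes."""
    return {
        kind.lower(): int(minutes)
        for kind, minutes in POLLING_RE.findall(smartctl_output)
    }


# Illustrative sample, not real output from any drive discussed here.
SAMPLE = """\
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
"""

if __name__ == "__main__":
    print(self_test_minutes(SAMPLE))
```

Feeding it the real output for each port gives the figure to leave between the start of a long test and anything else that hits the disks hard.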
 

MadsRC

Dabbler
Joined
Jul 14, 2013
Messages
20
Currently I've set up short tests for the first 12 disks at even hours and for the last 12 disks at odd hours. Then I run a long test once a week for every drive, and I'm currently setting up offline tests once a month...

My god, smartd.conf gets enormous :P
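Rather than writing 24 near-identical lines by hand, the entries can be generated. The sketch below uses the smartd.conf `-s` schedule syntax (`T/MM/DD/d/HH`, where `L` is a long test) and the `-d 3ware,N` device type from smartmontools. The controller node `/dev/twa0`, the choice of the 7th, and the hour staggering are assumptions for illustration, and the function names are hypothetical. On FreeNAS the GUI normally manages this file, so treat the output as a format illustration rather than something to paste in.

```python
def smartd_line(port, controller="/dev/twa0", long_day=7):
    """Return one smartd.conf entry for a drive behind a 3ware controller.

    The long test is scheduled on `long_day` of every month, with the start
    hour staggered by port so the drives do not all begin at once.
    """
    hour = port % 24
    schedule = f"L/../{long_day:02d}/./{hour:02d}"
    return f"{controller} -d 3ware,{port} -a -s {schedule}"


def smartd_conf(num_disks=24):
    """Return the entries for `num_disks` ports, one per line."""
    return "\n".join(smartd_line(port) for port in range(num_disks))


if __name__ == "__main__":
    print(smartd_conf())
```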

Badblocks you say? Never heard of it, will look it up...

Also: doing scrubs every 14 days (I think that was what I chose...)

Does it seem overkill? All those tests, do they put strain on the drives?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
They all do put some load on the disks. The short tests are completely useless. If the disk were in a condition where the short test would ever fail, you'd know long before a short test is performed that something is horribly wrong with the disk. You'll probably already have gotten lots of emails from the SMART monitoring that things are not going well. That's why I said the only test that matters is the long test. Scrubs are very I/O intensive.

Be careful how you schedule your stuff. You want to be sure that a long test and a scrub are never ever running at the same time. A few people have done it and it doesn't go well. That's why I do 1st and 15th for scrubs and the 7th for long tests. If I were to do 1st and 15th for scrubs and every other Sunday for long tests there would be a possibility that both could run on Sunday the 1st or Sunday the 15th. That will spell disaster for you. :)
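The collision is easy to check with a few lines of date arithmetic. This is a sketch with hypothetical function names, using 2013 as an example year. It only compares calendar days, so a long test that runs past midnight would need the test duration factored in as well.

```python
import datetime

SCRUB_DAYS = {1, 15}  # scrubs on the 1st and 15th of every month


def every_other_sunday(year, first_sunday):
    """Yield dates 14 days apart, starting at first_sunday, within `year`."""
    day = first_sunday
    while day.year == year:
        yield day
        day += datetime.timedelta(days=14)


def fixed_day_dates(year, day_of_month):
    """Return the given day of the month for all 12 months of `year`."""
    return [datetime.date(year, month, day_of_month) for month in range(1, 13)]


def collisions(test_dates, scrub_days=SCRUB_DAYS):
    """Return the test dates that fall on a scrub day."""
    return [d for d in test_dates if d.day in scrub_days]


if __name__ == "__main__":
    sundays = every_other_sunday(2013, datetime.date(2013, 1, 6))
    print("every other Sunday:", collisions(sundays))
    print("7th of the month:  ", collisions(fixed_day_dates(2013, 7)))
```

Starting from Sunday, January 6, 2013, the every-other-Sunday schedule lands on September 1 and September 15, both scrub days, while a fixed day of the month such as the 7th can never coincide with the 1st or 15th.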
 