SMTP reporting for mfi arrays


whosmatt

Dabbler
Joined
Jun 6, 2012
Messages
20
Hi all,

I know this question has been asked before but it's been a while and I haven't seen an answer.

Background:

We've got a SuperMicro chassis with an Intel SRCSASJV controller and two expanders. The processor is a Xeon E5620 and RAM is currently 32GB. Connected are 36 2TB Seagate drives for data and 2 WD 160GB laptop drives in a hardware mirror for the operating system.

Previously we were running Solaris 11 Express but never really put the system into production (just hosting some archives and backups) because of performance issues with having a single LUN presented to ZFS (at least, that's what we suspect). Plus, Solaris isn't open and we felt a little beholden to Oracle, which is not a good position to be in.

We've decided to give FreeNAS a go. We tested it on a smaller system (which we're using to hold the data from this box while we migrate), and it passed with flying colors.

Using mfiutil, we assigned 16 of the 2 TB drives into four RAID10 VDs and then striped those in ZFS into one ZFS pool. We took another 18 drives and made JBODs with them in mfiutil and then aggregated those into a ZFS pool with dual parity and 2 hot spares. The remaining 2 drives are global hot spares for the aforementioned RAID10 arrays.

We did it this way so we could swap data back and forth if we decide to re-arrange our drives, and to compare the performance of hardware vs. software RAID, and of presenting ZFS with more devices vs. having the controller handle it.

Make sense so far?

What we're missing now, compared to running Solaris, is RWC2, Intel's management utility for the controller and drives. mfiutil is fully functional, and we're fine with CLI management, except that we need to be notified of certain events.

SMTP would be great, but if we can do it with SNMP or something else, that would be OK too. One thought we've had is devising a script that runs `mfiutil show events` or something like that and greps for failures, and if it finds any, shoots us an email... or greps dmesg...
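Something like this rough sketch is what we have in mind for a cron job (the grep pattern, temp file paths, and recipient address are placeholders; we'd have to check how mfiutil actually words failure events on our controller):

Code:
#!/bin/sh
# Sketch only: mail any new, alarming-looking controller events.
# Assumes 'mail' is configured; address and pattern are placeholders.
ADMIN="storage-alerts@example.com"
NEW=/var/tmp/mfi_alert.new
LAST=/var/tmp/mfi_alert.last

# Pull anything alarming out of the controller event log.
mfiutil show events 2>&1 | grep -iE 'fail|error|degraded|removed' > "$NEW"

# Mail only if the set of alarming lines changed since the last run.
if [ -s "$NEW" ] && ! cmp -s "$NEW" "$LAST"; then
    mail -s "mfi controller event on $(hostname)" "$ADMIN" < "$NEW"
fi
cp "$NEW" "$LAST"

Dropping that into cron every few minutes would be crude, but probably good enough for our purposes.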

If this isn't possible, or is too clunky, then it's back to Solaris (yuck, Oracle) or OpenSolaris (which we've not yet evaluated). I'd like to stick with FreeNAS if we can get over the notification hurdle.

Is there anyone else with a similar setup?

-M

EDIT: looks like opensolaris is dead? oh well.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Using mfiutil, we assigned 16 of the 2 TB drives into four RAID10 VDs and then striped those in ZFS into one ZFS pool.
FYI, in a stripe ZFS can only report errors and not correct them.

We took another 18 drives and made JBODs with them in mfiutil and then aggregated those into a ZFS pool with dual parity and 2 hot spares.
What is this pool made of? It has dual parity and how many vdevs?


What we're missing now, compared to running Solaris, is RWC2, Intel's management utility for the controller and drives. mfiutil is fully functional, and we're fine with CLI management, except that we need to be notified of certain events.

SMTP would be great, but if we can do it with SNMP or something else, that would be OK too. One thought we've had is devising a script that runs `mfiutil show events` or something like that and greps for failures, and if it finds any, shoots us an email... or greps dmesg...
So, the main question is how to get a script to send SMTP notifications? I would think you need the mfiutil route; otherwise, how would FreeNAS know a drive in one of the RAID10 arrays was down?
 

whosmatt

Dabbler
Joined
Jun 6, 2012
Messages
20
FYI, in a stripe ZFS can only report errors and not correct them.

Right, we're not looking for ZFS to report errors on the pool that's composed of RAID10 vdevs. We'd have to lose at least 2 drives in a single vdev, more quickly than they could be replaced by hot spares, for anything to happen. There are 2 hot spares assigned on the controller for these four VDs, so 18 drives in total. The reason for doing this is that we had problems with heavy I/O when presenting ZFS with a single vdev, though that was on Solaris 11. It was suggested to us that ZFS likes to see more vdevs/spindles and make I/O decisions on its own.

What is this pool made of? It has dual parity and how many vdevs?

This pool is made of a total of 18 vdevs. They are all single drives. (EDIT: I can see how this is confusing. mfiutil uses the term JBOD if you want to assign a single drive to a single mfid device.) Sixteen are active and 2 are spares (assigned in ZFS, not on the controller obviously). If you think this is risky we can easily rearrange and assign more spares.


So, the main question is how to get a script to send SMTP notifications? I would think you need the mfiutil route; otherwise, how would FreeNAS know a drive in one of the RAID10 arrays was down?

Yes. We can write the script, no problem, but I was wondering whether there's something about mfiutil or MegaCLI we're not aware of, or an existing script out there that will do what we need without our spending the time.

One other question: can FreeNAS query the single-drive vdevs using S.M.A.R.T., or does the controller get in the way by presenting each drive as an mfid device? Does that make sense?

Anyway, I did some more testing today and performance seems good. I tested with max compression (gzip -9, I think) on the software RAID pool and was still able to maintain about 500Mbps of throughput. I backed it off to lzjb, though, as the system was pretty heavily loaded at that point.

Thanks,

Matt
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Right, we're not looking for ZFS to report errors on the pool that's composed of RAID10 vdevs. We'd have to lose at least 2 drives in a single vdev, more quickly than they could be replaced by hot spares, for anything to happen.
Well, not quite.

In a well-done paper from Dell and EMC the problem is described this way:

System administrators may feel that because they store their data on a redundant disk array and maintain a well-designed tape-backup regimen, their data is adequately protected. However, undetected data corruption can occur between backup periods; backing up corrupted data yields corrupted data when restored. Scenarios that can put data at risk include:

* Controller failure while data is in cache
* Power outage of extended duration with data in cache
* Power outage or controller failure during a write operation
* Errors reading data from disk
* Latent disk errors
ZFS checksums protect against this. If you did want to take advantage of that, you could, e.g., change the pool to 8 x 2-disk RAID0 VDs and have FreeNAS create striped mirrors from them.
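A rough illustration of what I mean, with made-up mfid numbers (in FreeNAS you'd normally build this through the GUI, but the resulting layout is equivalent):

Code:
# Sketch: the 8 x 2-disk RAID0 VDs show up as mfid0..mfid7, and ZFS
# mirrors them in pairs, then stripes across the four mirrors.
zpool create tank \
    mirror mfid0 mfid1 \
    mirror mfid2 mfid3 \
    mirror mfid4 mfid5 \
    mirror mfid6 mfid7

That way every block is checksummed and has a redundant copy that ZFS itself can repair from.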


This pool is made of a total of 18 vdevs. They are all single drives. (EDIT: I can see how this is confusing. mfiutil uses the term JBOD if you want to assign a single drive to a single mfid device.) Sixteen are active and 2 are spares (assigned in ZFS, not on the controller obviously). If you think this is risky we can easily rearrange and assign more spares.
This sounds like you are saying it's a stripe of 16 disks? Maybe I'm just not understanding. What does the output of the following look like:
Code:
zpool status -v poolname



One other question: can FreeNAS query the single-drive vdevs using S.M.A.R.T., or does the controller get in the way by presenting each drive as an mfid device? Does that make sense?
Yes, it does get in the way. You have an LSI1078, I believe. You may need to play with it a little, but you can probably get it working; see the smartmontools RAID-controller page.
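Something along these lines may work, going by the smartmontools docs; I'm assuming your smartmontools build supports the MegaRAID pass-through on FreeBSD, and the device node and disk numbers are guesses you'd have to adjust:

Code:
# Guess: address each physical disk by its ID behind the controller,
# not by its mfid number.
smartctl -a -d megaraid,0 /dev/mfi0    # first disk behind the controller
smartctl -a -d megaraid,1 /dev/mfi0    # second disk, and so on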


Anyway, I did some more testing today and performance seems good. I tested with max compression (gzip -9, I think) on the software RAID pool and was still able to maintain about 500Mbps of throughput. I backed it off to lzjb, though, as the system was pretty heavily loaded at that point.
IMO, max compression (gzip -9) isn't worth it. As you saw, it just slows you down with little or no benefit. LZJB is fast, and a lower gzip level is fine as well.
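For reference, switching it is just a property change; the pool/dataset name below is a placeholder, and it only affects data written from then on:

Code:
# Existing blocks keep whatever compression they were written with.
zfs set compression=lzjb tank/dataset
zfs get compression,compressratio tank/dataset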


I'd finish testing with FreeNAS first, but I'd want to be running FreeBSD 8.3 with ZFS v28 myself.
 

whosmatt

Dabbler
Joined
Jun 6, 2012
Messages
20
Well, not quite.

ZFS checksums protect against this. If you did want to take advantage of that, you could, e.g., change the pool to 8 x 2-disk RAID0 VDs and have FreeNAS create striped mirrors from them.

Fair enough; we'll consider creating 9 x 2-disk RAID0 VDs and having ZFS use the 9th as a spare. And I'll amend my previous statement to "for anything to happen due to a simple disk failure."
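If we go that route, the controller side would presumably look something like this (drive IDs are placeholders; we'd pull the real ones from `mfiutil show drives` and double-check the syntax against mfiutil(8)):

Code:
# Sketch: one 2-disk RAID0 VD per pair of physical drives.
mfiutil create raid0 4,5
mfiutil create raid0 6,7
mfiutil create raid0 8,9
# ...and so on for the remaining pairs; ZFS then mirrors the resulting
# mfid devices as described above.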


This sounds like you are saying it's a stripe of 16 disks? Maybe I'm just not understanding. What does the output of the following look like:
Code:
zpool status -v poolname

Here's the output:

Code:
[root@hermes] /mnt/zfs6/cifs1/backups# zpool status -v zfs6
  pool: zfs6
 state: ONLINE
 scrub: none requested
config:

	NAME          STATE     READ WRITE CKSUM
	zfs6          ONLINE       0     0     0
	  raidz2      ONLINE       0     0     0
	    mfid5p2   ONLINE       0     0     0
	    mfid6p2   ONLINE       0     0     0
	    mfid7p2   ONLINE       0     0     0
	    mfid8p2   ONLINE       0     0     0
	    mfid9p2   ONLINE       0     0     0
	    mfid12p2  ONLINE       0     0     0
	    mfid13p2  ONLINE       0     0     0
	    mfid14p2  ONLINE       0     0     0
	    mfid15p2  ONLINE       0     0     0
	    mfid16p2  ONLINE       0     0     0
	    mfid17p2  ONLINE       0     0     0
	    mfid18p2  ONLINE       0     0     0
	    mfid19p2  ONLINE       0     0     0
	    mfid20p2  ONLINE       0     0     0
	    mfid21p2  ONLINE       0     0     0
	    mfid22p2  ONLINE       0     0     0
	spares
	  mfid10p2    AVAIL   
	  mfid11p2    AVAIL 



Yes, it does get in the way. You have an LSI1078, I believe. You may need to play with it a little, but you can probably get it working; see the smartmontools RAID-controller page.
Will have a look. Yes, you're correct, the SRCSASJV is an LSI1078.


IMO, max compression (gzip -9) isn't worth it. As you saw, it just slows you down with little or no benefit. LZJB is fast, and a lower gzip level is fine as well.

I'd finish testing with FreeNAS first, but I'd want to be running FreeBSD 8.3 with ZFS v28 myself.

Agreed on the compression; I just wanted to see what happened when I jacked it up. What, in your opinion, are the advantages of running FreeBSD 8.3? I honestly hadn't considered that as an option. Our previous hangup with FreeNAS was AD integration, but it seems better in the latest version.

Again, thanks for your insight.

Matt
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Fair enough; we'll consider creating 9 x 2-disk RAID0 VDs and having ZFS use the 9th as a spare. And I'll amend my previous statement to "for anything to happen due to a simple disk failure."
Or 6 x 3 disk RAID0 VDs or ...

Code:
[root@hermes] /mnt/zfs6/cifs1/backups# zpool status -v zfs6
  pool: zfs6
 state: ONLINE
 scrub: none requested
config:

	NAME          STATE     READ WRITE CKSUM
	zfs6          ONLINE       0     0     0
	  raidz2      ONLINE       0     0     0
	    mfid5p2   ONLINE       0     0     0
	    mfid6p2   ONLINE       0     0     0
	    mfid7p2   ONLINE       0     0     0
	    mfid8p2   ONLINE       0     0     0
	    mfid9p2   ONLINE       0     0     0
	    mfid12p2  ONLINE       0     0     0
	    mfid13p2  ONLINE       0     0     0
	    mfid14p2  ONLINE       0     0     0
	    mfid15p2  ONLINE       0     0     0
	    mfid16p2  ONLINE       0     0     0
	    mfid17p2  ONLINE       0     0     0
	    mfid18p2  ONLINE       0     0     0
	    mfid19p2  ONLINE       0     0     0
	    mfid20p2  ONLINE       0     0     0
	    mfid21p2  ONLINE       0     0     0
	    mfid22p2  ONLINE       0     0     0
	spares
	  mfid10p2    AVAIL   
	  mfid11p2    AVAIL 
OK, so a single raidz2 vdev of 16 disks plus 2 spares. Multiple issues here ;)


  1. Maximum record size in ZFS = 128kb; default record size in ZFS = 128kb.
    128kb is effectively the ZFS stripe size. Take the total number of disks minus parity disks: 16 - 2 = 14, so 128kb/14 ≈ 9.14kb per data disk. Not good.
  2. The array is too large. Recommendations are to keep it to 9 disks or fewer. If you have more disks, create multiple ZFS vdevs.
    R. Elling said:
    The raidz pathological worst case is a random read from many-column raidz where files have records 128 KB in size. The inflated read problem is why it makes sense to match recordsize for fixed record workloads. This includes CIFS workloads which use 4 KB records. It is also why having many columns in the raidz for large records does not improve performance. Hence the 3 to 9 raidz disk limit recommendation in the zpool man page.
  3. Spares are broken in FreeBSD 8.2; they don't work automatically. I'm not sure if this has been patched in FreeBSD 9 or 8.3 yet.
  4. This entire array has the random write performance of a single drive. Multiple vdevs increase write performance.

What capacity and IOPS are you looking to get out of this? You seem to have enough spindles to do whatever you want. If you needed dual parity and maximum IOPS you could run striped 3-way mirrors; it just eats a lot of your space. You could also run 3 x 6-disk raidz2 vdevs, though I don't know if that will give you enough performance.
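For example, the 3 x 6-disk raidz2 layout would be built roughly like this (device names are placeholders for your 18 single-drive devices; the FreeNAS volume manager would normally do this for you):

Code:
# Sketch: three 6-disk raidz2 vdevs striped into one pool.
zpool create tank \
    raidz2 mfid5  mfid6  mfid7  mfid8  mfid9  mfid10 \
    raidz2 mfid11 mfid12 mfid13 mfid14 mfid15 mfid16 \
    raidz2 mfid17 mfid18 mfid19 mfid20 mfid21 mfid22

Since random write performance scales with the number of vdevs, that gives you roughly three drives' worth of random IOPS instead of one.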



What, in your opinion, are the advantages of running FreeBSD 8.3? I honestly hadn't considered that as an option.
Mainly ZFS v28. Not just for the added ZFS features (e.g. log device removal), but for the better ZFS code (e.g. pool disaster recovery).
 

whosmatt

Dabbler
Joined
Jun 6, 2012
Messages
20
My testing has come to a standstill because of a networking issue I'm having. I tried to post down in the networking area, but need moderator approval first. The gist of the issue is that my lagg group (LACP) seems limited to 100Mbps since my first reboot; when I first configured it, it was fine. I've temporarily dropped one of my NICs from the group and configured it as an untagged interface on the VLAN I'm using most, and that seems fine performance-wise, but I really want a single lagg interface as the parent interface for my VLAN interfaces.

I'm not thrilled with the way FreeNAS handles network configuration (or perhaps I simply don't understand it), and that seems to be the best way to set myself up for the future. If anyone here is a mod and can help me get that thread going, I'd really appreciate it. And I apologize for the duplicate submissions on that thread; I was really busy and wasn't really reading the message saying it was held for moderation. D'oh!

-M

EDIT: A quick PM to the mod got my post going, so thanks! It's here if anyone reading this wants to chime in: http://forums.freenas.org/showthread.php?7466-Networking-slow-after-first-reboot-of-new-install
 

whosmatt

Dabbler
Joined
Jun 6, 2012
Messages
20
Just wanted to give an update on this thread... ended up going with OpenIndiana on the "big" box... the one I'm referencing here. The only thing I'm missing from when I was running Solaris is the encryption.

I ended up creating 18 RAID0 VDs on the controller and then making a raidz2 vdev plus one hot spare with 9 of them in ZFS. The other 9 are yet to be configured. I've yet to run into any of the issues I was having with Solaris, perhaps due to upgrading from 8GB to 32GB of RAM.

Anyway, thanks for the insights into maximizing the performance and benefits of ZFS with my hardware. I've got Intel's RAID Web Console up and running, and I also modded a Perl script I found so that it sends an email alert if any ZFS pool is degraded (well, it will send the alert once cron runs the script). I tested the efficacy of the hot spare by unceremoniously offlining a drive that was reporting a predicted failure, and the hot spare took over at once.
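For anyone curious, the gist of what the modified script does is roughly this; mine is Perl, but the idea fits in a few lines of shell, with the address below made up:

Code:
#!/bin/sh
# Cron sketch: mail an alert if any pool is not healthy.
ADMIN="storage-alerts@example.com"   # placeholder address

# 'zpool status -x' prints "all pools are healthy" when nothing is wrong.
if [ "$(zpool status -x)" != "all pools are healthy" ]; then
    zpool status -v | mail -s "ZFS pool problem on $(hostname)" "$ADMIN"
fi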

So, once again, thanks for the help. I'm sorry that FreeNAS 8 wasn't the solution for us, because it's a really slick platform and I'm a big fan of the platforms that (as I understand it) spawned it (m0n0wall and later pfSense), but OpenIndiana is filling the bill quite nicely, and the best part is no more Oracle (I hope).

If there's a forum dedicated to this kind of help on zfs (regardless of the OS), let me know. I'd love to be a participant there.

Matt
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Just wanted to give an update on this thread... ended up going with OpenIndiana on the "big" box... the one I'm referencing here.
To each his own. At least you reported back.

I ended up creating 18 RAID0 VDs on the controller and then making a raidz2 vdev plus one hot spare with 9 of them in ZFS. The other 9 are yet to be configured. I've yet to run into any of the issues I was having with Solaris, perhaps due to upgrading from 8GB to 32GB of RAM.
ZFS does like its RAM. If you want a single raidz2 vdev I would do a 10-disk one. Yes, that is one more than the recommended 9, but worth it in this case: 128kb/(9-2) ≈ 18.28kb vs 128kb/(10-2) = 16kb. 16kb is on the small side, but so is 18.28kb, and 16kb divides evenly.

If there's a forum dedicated to this kind of help on zfs (regardless of the OS), let me know. I'd love to be a participant there.
None that I've come across yet. It would certainly be worth a read if anyone knows of one.

If you haven't found http://www.zfsbuild.com you should go read it. As a bonus for you, OpenIndiana is much closer to Nexenta than FreeBSD is.
 