[RFC] Planning install on LSI MegaRAID SAS 9271-8i (no HBA mode)

Status
Not open for further replies.

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Dear HoneyBadger, you are perfectly right that I'm "unlikely to get an official 'blessing'" - and I swear, a blessing is not even close to what I'm looking for :) I'm seeking technical advice backed by practical experience. My guess is that probably nobody has ever used this kind of setup in real production - it's that simple. Or am I wrong?

Nobody uses this kind of setup for the reasons outlined previously in this thread by myself and others - it is completely against the original design of ZFS, it can compromise array performance, and puts your data at unnecessary risk.

You say, "consider that using ZFS for "software RAID" means that your monster quad-core Xeon is now your "RAID processor" and it's packing 128GB of RAM" - so why on the Earth people are using NVidia GPUs, for example? main CPUs are so powerful and fast now! Intel's onboard graphics is so much fast er than any videocard we were using while playing Doom-2! (just kidding). Offloading some tasks to "hardware" (which in fact is just yet another specialized computer) is not that bad an idea, yes I can emulate i.e. ethernet adapter with main CPU, or even the whole switch (like VMWare does), of I can use a general purpose computer to perform as a BGP border router which holds some 3-4 full views. It's possible and sometimes even brings a better price/performance ratio. (BTW do you recall 80287 floating point coprocessors?) But I already got MegaRAIDs as given, so why should I reject getting the most out of them? Especially when risks are negligible and managed?

A GPU is a specialized chip optimized to solve the problem of "rendering graphics" - the math involved in RAID is nowhere near as complex as calculating inverse square roots really quickly. And you should reject the MegaRAID for the exact same reason I wouldn't use a hammer to drive a screw: it is the wrong tool for the job. Simply put, if you want to build a ZFS system, build it to ZFS specifications and expectations.

You say "In this scenario, ZFS will not be able to protect you from corrupted data." - Ok, let controller alone take this responsibility. BTW in case I use a RAID-given "disk" for zpool, doesn't ZFS do it's own checksumming of data? Yes, I'll be relying 100% on the MegaRAID card for that. (And free pretty much CPU cycles for other purposes, for LZ4 first of all).

ZFS checksums are not full parity data - they can tell you that a block is bad and needs to be reconstructed from another copy in the mirror or from parity RAID, but a checksum alone is not enough to reconstruct it. To do that, you need another copy - and in this case, you won't have one.
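You can see the difference for yourself with a throwaway file-backed pool. A rough sketch only - run it on a scratch box, paths and sizes are arbitrary, and you may need to repeat the corruption step at different offsets to hit allocated blocks:

#!/bin/sh
# Sketch: checksums *detect* corruption; only redundancy lets ZFS *repair* it.
truncate -s 256m /tmp/vdev0 /tmp/vdev1

# Case 1: single vdev - detection only.
zpool create -f demo1 /tmp/vdev0
dd if=/dev/urandom of=/demo1/data bs=1m count=150                     # fill most of the pool
zpool export demo1
dd if=/dev/urandom of=/tmp/vdev0 bs=1m count=8 seek=64 conv=notrunc   # damage the vdev
zpool import -d /tmp demo1
zpool scrub demo1; sleep 10
zpool status -v demo1      # expect CKSUM errors and "Permanent errors" - nothing to heal from
zpool destroy demo1

# Case 2: mirror - same damage, but the intact copy heals it on scrub.
zpool create -f demo2 mirror /tmp/vdev0 /tmp/vdev1
dd if=/dev/urandom of=/demo2/data bs=1m count=150
zpool export demo2
dd if=/dev/urandom of=/tmp/vdev0 bs=1m count=8 seek=64 conv=notrunc
zpool import -d /tmp demo2
zpool scrub demo2; sleep 10
zpool status -v demo2      # CKSUM errors on one side, but "errors: No known data errors"
zpool destroy demo2
rm /tmp/vdev0 /tmp/vdev1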

"have to write your own scripts to monitor the array status via megacli and submit emails if/when something goes sideways" - I don't see any problem in writing some scripts and put these into crontab, and even call REST endpoints of my ticketing system to open tickets immediately as soon a problem symptoms are observed (I hate email for this, it's 20th century stuff). I remember the whole SQL interpreter written in plain /bin/sh 20+ years ago, so this is obvious.

Understand that even if the scripts work, you will be 100% on your own doing this. Not just "here at the FreeNAS forums" but on any BBS, forum, or ZFS mailing list. If anything in your system behaves abnormally, the immediate first and only response from any support technician is going to be "you have grossly misconfigured your array and we are unable to support you until it is within the ZFS design specifications."

And many thanks for the note about the probable performance hit. That is a really valuable remark; I had already noticed mentions of this scenario while reading around, and I will certainly run some tests for this "massive write" case (and tweak some tunables accordingly) before I put the whole beast under load. Probably limiting the write transaction size to roughly 1/3 of the controller cache size will do.
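For the record, "limiting the write transaction size" would presumably mean capping the ZFS dirty-data limit - a rough sketch, assuming a 1 GiB controller cache; the tunable name is from FreeBSD 10.x ZFS, and the value is a guess to be validated by your own testing:

#!/bin/sh
# Sketch: cap outstanding (dirty) write data at ~1/3 of an assumed 1 GiB controller cache.
# On FreeNAS this is better set as a tunable in the GUI (System -> Tunables) so it
# persists across reboots; the sysctl below is just the quick way to experiment.
CACHE_BYTES=$((1024 * 1024 * 1024))          # adjust to the actual cache size of your card
sysctl vfs.zfs.dirty_data_max=$((CACHE_BYTES / 3))
sysctl vfs.zfs.dirty_data_max_max            # the upper bound the cap is clamped to - check it too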

So at this point you're now discussing artificially limiting the potential performance of the array.

Does the phrase "slow, ineffective, and mostly misconfigured" ring any bells?

Because to be blunt, this is what you're doing again right now; just on a different platform.

The small sum you spend on supported HBAs to properly implement ZFS will pay off.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
@Andrii Stesin, see the 'Using a RAID Controller' section of the 'How To Fail...a guide to things not-to-do' thread for examples of things going terribly wrong for those who chose the direction you're choosing. There are additional such examples here and elsewhere on the web.

You're electing to ignore not just the informed advice of the forum's members, some of whom have years of experience in production environments, but also, and more importantly, the express recommendations of the FreeNAS developers themselves. It's your choice; your systems; your data -- and your job -- and while we on the forum think your plan to be imprudent, nevertheless we all wish you every success, and will wait with bated breath to hear the results of your experiment. With any luck, perhaps we won't have to add your experience to the list of unfortunates in the 'Using a RAID Controller' post.

Before you commit fully, though, bear this in mind: those LSI 9271 cards are very expensive, especially compared to HBA models well-supported by FreeNAS. There's a good chance you could sell your LSI 9271 cards, purchase suitable replacement HBA cards such as the LSI 9211, and actually have some cash left over! :)

Good luck!
 

Andrii Stesin

Dabbler
Joined
Aug 18, 2016
Messages
43
"doesn't ZFS do it's own checksumming of data?" - depasseg you say, "Yes it does, and it will alert you to the corruption either on a scrub or when the file is individually accessed, but if there isn't any ZFS redundancy, then ZFS can't correct (heal) that file and you will need to restore from backup." - Ok, this is a really good argument which points to an actual deficiency of my setup. Thank you. Now I completely understand what I'm missing.

On the other hand, suppose I use my shelf with 12 × 4 TB HDDs in a "pseudo-JBOD" mode, where the MegaRAID presents them to ZFS as a set of "single-disk RAID0 pseudo-arrays". Now I get the self-healing ability of ZFS back, but AFAIK with this setup replacing a failed HDD becomes somewhat tricky and requires some nontrivial steps (a rough sketch of the replacement dance is below). Hmm, maybe this is the better option then, because the advantages outweigh the disadvantages. I'll read a bit more about the details of this setup before I start messing around with it later today.
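From what I've read so far, the replacement dance looks roughly like this - a sketch from memory; the enclosure:slot, LD number, cache policies, pool name, and device name are all placeholders to be checked against the Avago documentation and your own layout:

#!/bin/sh
# Sketch: replacing a failed disk that sits behind a single-drive RAID0 virtual drive.
MEGACLI=/usr/local/sbin/MegaCli

# 1. Find the dead drive and note its Enclosure:Slot and the LD it backed.
$MEGACLI -PDList -aALL -NoLog | grep -E 'Enclosure Device ID|Slot Number|Firmware state'

# 2. Delete the dead single-disk RAID0 logical drive (the LD number is NOT the slot number).
$MEGACLI -CfgLdDel -L5 -a0

# 3. Physically swap the disk, then create a new single-disk RAID0 on the replacement.
$MEGACLI -CfgLdAdd -r0 '[252:5]' WT NORA Direct -a0

# 4. Only now can ZFS take over: replace the old member in the pool.
zpool replace tank da5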

(And if I go this way, are 33 of those older, worn-out 1 TB drives perhaps worth converting into a RAIDZ3 pool?)

The MegaRAID's very fast cached I/O will still work for me this way, though redundancy moves up the stack, to ZFS... hmmmm...
 

Andrii Stesin

Dabbler
Joined
Aug 18, 2016
Messages
43
Dear colleagues, please accept my sincere "thank you very much for your help" and my apologies for bringing this old topic up once again and taking your time with it. My final decision, after reading yet more of the discussions around here: I'll follow your advice and go the "pseudo-JBOD" way, despite its obvious risks and complexities.

The worst case seems to be "you remove a drive and then reboot": mrsas(4) will shuffle the device order after the reboot and ZFS will get completely confused...
 

Andrii Stesin

Dabbler
Joined
Aug 18, 2016
Messages
43
There's a good chance you could sell your LSI 9271 cards, purchase suitable replacement HBA cards such as the LSI 9211, and actually have some cash left over! :)
A pity, but they are not my property, so this is not an option. I would never purchase even a single one with my own money. That decision was made back in 2014, and obviously nobody asked my opinion, because I wasn't working for this company at the time.
 

Andrii Stesin

Dabbler
Joined
Aug 18, 2016
Messages
43
BTW, can anyone tell me which version of mrsas(4) comes prepackaged in the FreeNAS 9.10 installation ISO as of today? Do I need to replace it with MR_FreeBSD_DRIVER_MRSAS_6.11-06.711.00.00 (03/21/2016) from the Avago download page? (I will upgrade the controller's firmware to the latest Firmware Package 23.34.0-0017 (MR 5.14 Point Release, 07/20/16) anyway.)
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
BTW, can anyone tell me which version of mrsas(4) comes prepackaged in the FreeNAS 9.10 installation ISO as of today? Do I need to replace it with MR_FreeBSD_DRIVER_MRSAS_6.11-06.711.00.00 (03/21/2016) from the Avago download page? (I will upgrade the controller's firmware to the latest Firmware Package 23.34.0-0017 (MR 5.14 Point Release, 07/20/16) anyway.)
Sure, go ahead - I mean, you are already going against the number 1 "don't do this" recommendation, so feel free to ignore the number 2 advice as well: "don't change or install anything in the base appliance - use a jail instead."
I'll follow your advice and go the "pseudo-JBOD" way, despite its obvious risks and complexities.
This wasn't advice from this forum. Pseudo-JBOD is arguably a worse idea than just using a full-blown HW RAID array as a single device to ZFS.
mrsas(4) will shuffle the device order after the reboot and ZFS will get completely confused...
ZFS doesn't get confused by different device names: the labels are on each drive, so it figures out what is where.
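You can check this yourself on any member disk - a quick sketch with example device and pool names; on FreeNAS the label usually lives on the gptid partition rather than the raw disk, so point zdb at whatever zpool status actually lists:

# The pool identity lives in on-disk labels, not in /dev names, so renumbering is harmless.
zdb -l /dev/da3p2        # dump the vdev label: pool name, pool_guid, this vdev's own guid
zpool export tank        # "tank" is an example pool name
zpool import             # rescans devices and finds the pool by the GUIDs in those labels
zpool import tank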
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
My final decision, after reading yet more of the discussions around here: I'll follow your advice and go the "pseudo-JBOD" way, despite its obvious risks and complexities.

(reaction GIF)
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Tell your employer they can't use ZFS with this setup, because it's reckless and will destroy their data one day. Otherwise you will be looking for a new employer.

 

Andrii Stesin

Dabbler
Joined
Aug 18, 2016
Messages
43
Sure, go ahead - I mean, you are already going against the number 1 "don't do this" recommendation, so feel free to ignore the number 2 advice as well: "don't change or install anything in the base appliance - use a jail instead."
Aghhh, OK. Back in 1993 I already chose to go against the mainstream (which recommended SCO UNIX for production on the Intel platform, with IBM AIX as the platform of choice) and installed FreeBSD 1.0-EPSILON. I still do not regret that decision.
This wasn't advice from this forum.
Thank you; I'm not going to blame anyone for my own decisions.
Pseudo-JBOD is arguably a worse idea than just using a full-blown HW RAID array as a single device to ZFS.
That was my initial question, in case you recall it. Nobody shared an opinion on what would be better than ZFS for my situation, where I'm forced to use HW RAID. BTW, the site calomel.org suggests that an LSI MegaRAID in "pseudo-JBOD" mode provides very decent performance, though latency is somewhat high - which does not matter in my exact case (I don't intend to store database tables on this storage).
ZFS doesn't get confused by different device names: the labels are on each drive, so it figures out what is where.
Thank you. I somehow missed this - my fault; now I get the point. Your comment was extremely helpful.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Nobody shared an opinion on what would be better than ZFS for my situation, where I'm forced to use HW RAID.

If you absolutely must do this, create virtual drives on the RAID controller and add them as iSCSI device extents within FreeNAS.
 

Andrii Stesin

Dabbler
Joined
Aug 18, 2016
Messages
43
If you absolutely must do this, create virtual drives on the RAID controller and add them as iSCSI device extents within FreeNAS.
Yes, I absolutely must do this. And OK, thank you for the clever suggestion. This is a good fallback path in case tests show that my initial approach does not work well.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Yes, I absolutely must do this. And OK, thank you for the clever suggestion. This is a good fallback path in case tests show that my initial approach does not work well.
Wow, what has happened to this place? Next thing you know, people will be using VMDKs running on fake RAID.

 

mattbbpl

Patron
Joined
May 30, 2015
Messages
237
Wow, what has happened to this place? Next thing you know, people will be using VMDKs running on fake RAID.

You guys led him to water. He's weighed the risks and benefits and is choosing to proceed. Nothing wrong with that from our perspective: he has the necessary information and will have to live with the fallout, whether positive or negative.

The most important step from here for this community is to make it clear to future readers that this is not recommended and is likely to end in disaster down the road. I believe you have all done that.

You can probably all dust off your hands at this point. You've done what you can do.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Yes, I absolutely must do this. And OK, thank you for the clever suggestion. This is a good fallback path in case tests show that my initial approach does not work well.

I would suggest doing this as your primary course of action, as it's much safer.

But humour us all. Your original post says that you have "four or five" of these systems.

Buy one LSI HBA, like a 9211-8i. Install it in one machine and configure ZFS properly as we've suggested in this thread.

It will in all likelihood be faster than your RAID card solution.
 

Andrii Stesin

Dabbler
Joined
Aug 18, 2016
Messages
43
Following up on the story. Everything was working fine - I'd even say perfectly. Until... (the horror story begins here).

The guys decided to move the whole setup to a new location. Just physically. They stopped FreeNAS, powered it off, moved it to the new place, and mounted it in the new rack. OK so far. Then they switched the power on.

They made a fatal mistake. The whole setup was a Supermicro with the MegaRAID controller and one array in the main chassis, plus the Very Big Array in an additional shelf connected via a SAS cable. They got the order wrong and powered on the main chassis (where FreeNAS and the MegaRAID both live) *before* the additional shelf.

Result: the MegaRAID (having been powered on *before* the disk shelf) just plain lost its configuration. This is in no way a problem of FreeNAS or of ZFS - it is purely a problem of the LSI controller and the SAS expander in the shelf, which ended up in an idiotic state: right now the MegaRAID sees the drives in the shelf as "Unconfigured Good" but does not recognize the "Foreign Configuration" on them, so there is no way to import them back.
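(For the record, the standard recovery sequence I expect to try tomorrow - written from memory, the adapter number is a placeholder, and in our case the scan is precisely the step that comes back empty:)

#!/bin/sh
# Sketch of the usual "shelf came back with a foreign configuration" recovery.
MEGACLI=/usr/local/sbin/MegaCli

$MEGACLI -CfgForeign -Scan -a0      # how many foreign configurations does the controller see?
$MEGACLI -CfgForeign -Preview -a0   # dry run: what would be imported
$MEGACLI -CfgForeign -Import -a0    # import them, restoring the logical drives

# Here the scan reports nothing to import - the drives sit in "Unconfigured Good"
# with no recognized foreign configuration, which is exactly the disaster.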

How happy I am that I haven't worked for them since November 2016. Anyway, early tomorrow morning I'll go there (they asked me to help) to fight the disaster. Avoid being employed by morons, ladies and gentlemen. As for the technical side: use HBAs - it will keep you sane.

And yes, at the time of my original question I recognized all the risks. I did not have a choice. That's life, plain real life. S**t happens, occasionally.
 

Andrii Stesin

Dabbler
Joined
Aug 18, 2016
Messages
43
Wow, what has happened to this place? Next thing you know, people will be using VMDKs running on fake RAID.

You thought you were kidding? Hmmm. Just yesterday I saw the following configuration.

The storage is an HP 3PAR. On it there are 8 VMDKs (on the 3PAR's low-speed tier, with big but slow mechanical drives). They are presented, over an FC connection, to a FreeNAS that lives inside a VMware VM and assembles them into a RAIDZ volume. And that FreeNAS serves other VMs inside this very same HA cluster via NFS.

Say "now I have seen everything".
 

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
Following up on the story. Everything was working fine - I'd even say perfectly. Until... (the horror story begins here).

The guys decided to move the whole setup to a new location. Just physically. They stopped FreeNAS, powered it off, moved it to the new place, and mounted it in the new rack. OK so far. Then they switched the power on.

They made a fatal mistake. The whole setup was a Supermicro with the MegaRAID controller and one array in the main chassis, plus the Very Big Array in an additional shelf connected via a SAS cable. They got the order wrong and powered on the main chassis (where FreeNAS and the MegaRAID both live) *before* the additional shelf.

Result: the MegaRAID (having been powered on *before* the disk shelf) just plain lost its configuration. This is in no way a problem of FreeNAS or of ZFS - it is purely a problem of the LSI controller and the SAS expander in the shelf, which ended up in an idiotic state: right now the MegaRAID sees the drives in the shelf as "Unconfigured Good" but does not recognize the "Foreign Configuration" on them, so there is no way to import them back.

How happy I am that I haven't worked for them since November 2016. Anyway, early tomorrow morning I'll go there (they asked me to help) to fight the disaster. Avoid being employed by morons, ladies and gentlemen. As for the technical side: use HBAs - it will keep you sane.

And yes, at the time of my original question I recognized all the risks. I did not have a choice. That's life, plain real life. S**t happens, occasionally.

Yeah, shit happens, as we could see ... o_O
That's the final feedback everybody was expecting ... if it doesn't get screwed up by the HW RAID, it will get messed up by human error ... :rolleyes:

I will, on purpose, not enter into the debate about who was a moron, when, and for what reason ... ;)
 

Andrii Stesin

Dabbler
Joined
Aug 18, 2016
Messages
43
Yeah, crap happens, as we could see ... o_O
That's the final feedback everybody was expecting ... if it doesn't get screwed up by the HW RAID, it will get messed up by human error ... :rolleyes:

I will, on purpose, not enter into the debate about who was a moron, when, and for what reason ... ;)
Let me remind you of Scott Adams' Dilbert Principle. Namely:
Everyone is an idiot, not just the people with low SAT scores. The only differences among us is that we're idiots about different things at different times. No matter how smart you are, you spend much of your day being an idiot.
Scott Adams, The Dilbert Principle


I wish I could have watched (and laughed at) you if you had been in my position back in the summer, ha-ha. BTW, the Very Big Volume is OK - ZFS and FreeNAS tolerated the morons, and I restored it easily. The whole setup with FreeNAS/ZFS over a bunch of "one disk per logical RAID0 volume exported by the LSI MegaRAID" is far from the worst thing I have seen in my long life. I won't recommend this kind of setup, but it is a viable option in the absence of better ones.
 