serious question about using LSI/Avago based hardware raid cards

Joined
Feb 27, 2019
Messages
5
So, new user on the forums. I've trawled through over the years and even toyed with FreeNAS virtualized a few times, but thought I'd throw this out as a serious question for those with lots of experience with FreeNAS. Not even sure if this is the right place to ask, so you won't hurt my feelings if you need to move this to another spot.

I know the ruling principle is "don't use hardware raid", but please hear me out first. I'm hoping some of the smart folks can break down the idea and either destroy it entirely with an explanation, or discuss it, as I think it may be a cheaper solution for us smaller-scale users. And I'm stressing small scale. I realize hardware RAID doesn't scale, and that's probably the biggest advantage of software-defined storage, and yes, I realize ZFS was the original and likely still the best. But all the new players are following (some poorly) the same basic constructs. SDS doesn't seem to scale down very well in either performance or price.

Currently running a used Dell-branded LSI RAID card, an H700: 512MB cache, battery backed, breakout cables to 8x 4TB SATA drives. These cards are cheaper now, $100 to $150 (you can move to the newer H710 at that price); the cache is battery or capacitor (in higher end models) protected, and the battery is readily available and easy to replace. A low-power Core i3 with 16GB ECC is all the CPU and RAM it needs. I could actually get by with less RAM since the hardware RAID card does all the heavy lifting, but I run some VMs on it as well.

I know hardware RAID can't "resilver" the data and you'll have bitrot issues, but for years now LSI has had an IPMI command tool to actually have the card do much the same thing. LSI calls it a consistency check. This is not comparing a given drive sector's data to its checksum; that's a patrol read, and it happens automatically. The consistency check actually compares the full RAID stripe data to the parity data across the stripe set, and will correct any errors found. This is not an automatic process; it must be called with the tool and scheduled using cron or Windows Task Scheduler. But it can be done, and it works very well.

The Dell cards seem to sell cheaper (don't know why, maybe volume?). With their tool, the command is something like:

omconfig storage adapter=0 virtualdisk=0 action=checkconsistency

Any LSI card or rebrand of it supports this if it's an actual RAID card and not a plain HBA. The command will vary based on the rebranding OEM's name; even Intel sells them!
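For the generic LSI/Avago CLIs, a rough sketch of the same thing (assuming controller 0, and that MegaCLI or StorCLI is installed; the paths and drive numbers here are examples, adjust for your card):

MegaCli64 -LDCC -Start -LALL -aALL
storcli64 /c0/vall start cc

and a cron entry to kick it off weekly, say Sunday at 3 AM:

0 3 * * 0 /opt/MegaRAID/storcli/storcli64 /c0/vall start cc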

With that out of the way, I can get pretty much the same data integrity at a much lower price than needing a monster-priced Xeon and tons of RAM.

There are some advantages too. I'm not limited to a single vdev worth of throughput, so an 8-drive (I know that's not an ideal drive count) RAID 6 gives me 6 drives worth of read IOPS, and 6 drives (minus the RAID 6 write penalty) worth of write IOPS. What would I need with ZFS or another SDS type? Six vdevs? Or enough ZIL, ARC and L2ARC to survive the bursts? Not knocking it at all, just out of my price range to have that many drives, that much CPU, and that much RAM.
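Back-of-the-envelope on those numbers, assuming a rule-of-thumb ~75 random IOPS per 7,200 rpm SATA drive (not a measurement of my drives):

reads:  6 data spindles x 75 ≈ 450 random read IOPS
writes: (8 x 75) / 6 ≈ 100 random write IOPS, since a RAID 6 small write costs six I/Os (read old data, read P, read Q, write new data, write new P, write new Q)

Full-stripe sequential writes dodge that penalty, which is part of why the card's write-back cache matters so much.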

What if I run FreeNAS and ZFS on top of that hardware RAID card? Present a single virtual disk (disk device) to FreeNAS, and create the ZFS equivalent of a RAID 0? Let the LSI chip handle the RAID (and there's a BSD toolset to call consistency checks regularly to handle bitrot). Now I can run a lower-cost Core i3 and less RAM.
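To be concrete, a minimal sketch of that layout (device name is an assumption; the H700's virtual disk should show up as mfid0 under FreeBSD's mfi(4) driver, and FreeNAS would normally do this from the GUI rather than the shell):

zpool create tank /dev/mfid0    # one "disk" = the card's RAID 6 virtual drive, no ZFS-level redundancy
zfs create tank/vms             # datasets, snapshots and clones work as usual on top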

Why bother with FreeNAS then? The point-in-time snapshotting is awesome, no other way to put it. MS isn't quite there yet… but then I'd also need a Windows license. Win10 would be OK for this type of small NAS, but it'd only be a NAS; there's no iSCSI target on a desktop OS. StarWind's iSCSI target software stinks IMO, so I don't like that as a solution.

Linux might be OK, and the price is right, but I still don't know of anything that approaches ZFS snapshotting. Maybe there is and I'm not aware.

So my general thought here is I could eat my cake and have it too: get the great point-in-time snapshot function of ZFS and the small-scale performance of hardware RAID. Having a decent iSCSI target built in is a nice bonus; it's handy for toying with ESX. NFS works too, I know, and FreeNAS has that as well.

Please help me see the flaw(s) in my idea and understand why it's a no-go, or share any insight if you've seen it tried before.
 
Joined
Feb 27, 2019
Messages
5
Funny, care to elaborate? Want to have a good discussion on hardware and software-defined storage meeting each other? For the CPU and RAM cost overhead, it seems tough to argue for ZFS at the small scale; I'm throwing this out as a thought.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Funny, care to elaborate? Want to have a good discussion on hardware and software-defined storage meeting each other? For the CPU and RAM cost overhead, it seems tough to argue for ZFS at the small scale; I'm throwing this out as a thought.
If you want to use hardware RAID, go ahead, just don't do it with FreeNAS. FreeNAS exclusively uses ZFS. If you don't know what that means, you need to learn about ZFS, because ZFS is a file system and volume manager in one that was designed to REPLACE hardware RAID, meaning that the two don't mix. This has been discussed so many times on the forum that we have a resource about it, which is what you were linked to:
https://www.freenas.org/blog/freenas-worst-practices/
The discussion happens every week or two. The answer doesn't change.
 

Bozon

Contributor
Joined
Dec 5, 2018
Messages
154
Funny, care to elaborate? Want to have a good discussion on hardware and software-defined storage meeting each other? For the CPU and RAM cost overhead, it seems tough to argue for ZFS at the small scale; I'm throwing this out as a thought.

I know I am going to regret this, but the explanation is literally in the first section of the first article that comes up in that search. The author of that article is a storage expert who can express it a lot better than I can.

https://www.freenas.org/blog/freenas-worst-practices/
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I know hardware RAID can't "resilver" the data and you'll have bitrot issues, but for years now LSI has had an IPMI command tool to actually have the card do much the same thing. LSI calls it a consistency check. This is not comparing a given drive sector's data to its checksum; that's a patrol read, and it happens automatically. The consistency check actually compares the full RAID stripe data to the parity data across the stripe set, and will correct any errors found. This is not an automatic process; it must be called with the tool and scheduled using cron or Windows Task Scheduler. But it can be done, and it works very well.
By the way, I also manage Windows servers for work and we use hardware RAID with those. It can be fine, but it isn't for FreeNAS.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Now I can run a lower-cost Core i3 and less RAM.
You can already run FreeNAS on an Atom processor and 8GB of RAM, but that is as low as you can go.
If you want to do this, you need to present two volumes to FreeNAS and let ZFS mirror them so it has the capability to error-correct if there is a data error on disk. Those patrol reads and underlying hardware RAID changes to the file system are the problem with hardware RAID, because they change things that ZFS expects to NEVER change. If they do change, ZFS sees it as a fault and wants to fix it. The two can end up grinding against one another in the case of some hardware fault.
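A minimal sketch of that layout, assuming the card exposes two equal-sized virtual disks as mfid0 and mfid1 (device names will vary by driver):

zpool create tank mirror /dev/mfid0 /dev/mfid1   # ZFS keeps two copies, so a failed checksum can be repaired from the other side
zpool scrub tank                                 # read everything back, verify checksums, rewrite anything that doesn't match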
Linux might be OK, and the price is right, but I still don't know of anything that approaches ZFS snapshotting.
You can install ZFS on Linux. Still, it is ZFS that doesn't play well with hardware RAID.
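For example, on Ubuntu (package name is distribution-specific; pool and dataset names here are assumed) the same snapshot workflow looks like:

sudo apt install zfsutils-linux
sudo zfs snapshot tank/data@before-upgrade
sudo zfs list -t snapshot
sudo zfs rollback tank/data@before-upgrade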
 
Joined
Feb 27, 2019
Messages
5
Thanks guys. Not trying to argue here, but I think some of you (esp. Bozo) are missing the point of the question. @Bozo, are you the new Cyberjock? j/k.

If we could say, roughly, that ZFS is the bastard love child of LVM and MDADM (or raidtool, since we're talking BSD and you've been around that long), what if we gene-edit out MDADM's contribution and keep the LVM-type benefits of ZFS as well as point-in-time snapshots? Let a cheap and boring LSI chip handle the hardware layer. That garbage article from how many years ago holds no real technical merit. The cache is battery or capacitor backed, or disable it if you desire. All LSI cards and all rebrands allow this; disable write cache and leave read cache enabled if you want.

"The “one big disk” that hardware RAID cards provide limits some of ZFS’s advantages, and the read and write caches found on many hardware RAID cards are how risk gets introduced. ZFS works carefully to guarantee that every write it receives from the operating systems is on disk and checksummed before reporting success. This strategy relies on each disk reporting that data has been successfully written, but if the data is written to a hardware cache on the RAID card, ZFS is constantly misinformed of write success. This can work fine for some time but in the case of a power outage, catastrophic damage can be done to the ZFS “pool” if key metadata was lost in transit. Such failures have been known to carry five-figure price tags for data recovery services. Unlike hardware RAID, you will not suffer from data loss that can occur from interrupted writes or corrupt data returned from a hardware cache with ZFS. "

"Finally, most hardware RAID cards will mask the S.M.A.R.T. disk health status information that each disk provides. "

Smartctl is great for standalone drives, especially on a desktop. I suppose on plain SAS controllers it can be good too. LSI's toolset also allows log exporting; I can tell you a very well-known software vendor wrote their own systems management tool that parses this log instead of relying on LSI's baked-in IPMI logging. Drive monitoring is a moot point as well; you can do it with IPMI or with log parsing at any time interval you please.
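For what it's worth, smartmontools can also reach drives behind MegaRAID-family cards with its pass-through mode, so SMART isn't entirely masked (Linux syntax shown; the index is the controller's physical-drive number, and FreeBSD support depends on the driver):

smartctl -a -d megaraid,0 /dev/sda       # drive 0 behind the controller
smartctl -a -d sat+megaraid,4 /dev/sda   # SATA drive 4, forcing SAT pass-through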

Break out the numbers and software-defined storage does not scale down affordably. So that's the crux of the question: open the discussion to consider different use cases and think about the possibilities. I'm asking because I thought there might be some here with some depth, not simply repeating the gospel of Cyberjock. If it helps, I can PM you my creds; I'm far from a newb.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
A correction: the LSI consistency check is not a bitrot tool. It merely verifies that both copies of the data are identical. If they are not, it has no way to correct the error, since there is no way to know which one is correct.

The biggest issues involve drivers, disks, and ZFS's need for redundancy.

The FreeBSD drivers for many RAID cards range from middlin' to pretty damn good. However, ZFS is extremely demanding and really isn't tolerant of any cock-ups, and at the same time ZFS is good at stressing controllers a lot harder (especially during a resilver or scrub). It's known to shake out problems with RAID controllers. Cyberjock actually learned this early on the hard way, if I recall correctly. You really need virtual perfection in device drivers. NAS is demanding. This is why random ethernet controllers don't work well, why random RAID controllers don't work well, and why random PC-grade AHCI controllers aren't generally too reliable. We know that certain stuff works REALLY WELL. LSI HBAs - check. Intel ethernets - check. Intel PCH SATA - check. Realtek ethernets - various shades of fail. RAID controllers - various shades of fail. This is both about driver quality and device quality. If it isn't 100.000%, it'll eventually turn into an issue. We saw a lot of this in the early days. Go read early threads in the forums for examples, especially the ones based on AMD APUs.

Hiding the disks behind a RAID controller creates a number of challenges, the biggest of which is the hiding of SMART data. There are also issues with write ordering, because it's not that hard to convince your RAID controller to engage write cache when it doesn't have the requisite hardware to cope with a power failure. ZFS is able to roll back transaction groups as long as transaction groups are coherently written intact to disk, but if you get random bitspam due to a stupid write cache, you may have trouble, and ZFS lacks any real tools to do a "fsck", so once damage to a pool happens, you're hosed.

And of course, ZFS needs access to the redundancy of the raw disks to be able to determine what the correct data is supposed to be. This is the heart of ZFS's ability to heal from bitrot. You cannot take access to redundancy away and expect that ZFS will function correctly under adversity.
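As a concrete illustration (pool name assumed), this is the difference between a scrub that can repair and one that can only report:

zpool scrub tank
zpool status -v tank   # on a mirror/RAIDZ vdev, CKSUM errors are rewritten from good copies; on a single-device pool they are just listed as permanent errors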

If you don't like that ZFS is a combination of volume manager and file system, that's fine. Don't use it. The product is definitely heavyweight and aimed at a specific type of use.

If you don't think that ZFS scales down affordably, that's fine. It was never intended to. Go use UFS (FreeBSD) or EXT3 (Linux) which will happily run on a Raspberry Pi and serve files at a fraction of the cost.

I don't see anyone claiming that ZFS and FreeNAS is right for every application. It's really oriented towards larger storage servers.

Break out the numbers and software-defined storage does not scale down affordably. So that's the crux of the question: open the discussion to consider different use cases and think about the possibilities.

There's not a lot of point in "Open[ing] the discussion to consider different use cases". At the end of the day, it is very difficult to get a railroad locomotive like ZFS to ride down the street as though it were a truck, car, or bicycle.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
That garbage article from how many years ago holds no real technical merit.
You need to understand that ZFS is doing nothing but getting more sophisticated, and just because you want it to not be sophisticated, that doesn't change the facts. If you want to use ZFS, just use an HBA and give FreeNAS direct access to the disks. There are plenty of people using ZFS on low-end systems; why do you think you must have a hardware RAID controller?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
If we could say, roughly, that ZFS is the bastard love child of LVM and MDADM (or raidtool, since we're talking BSD and you've been around that long),

Ok so I'm game. As one of the oldest FreeBSD storage hackers... W...T...F.. is "raidtool"?

I'm familiar with ccd, grog's vinum (beta tester!), and all the new geom based stuff. I've got a copy of Sun's Online: Disk Suite around here somewhere, which I suppose dates me a bit.

what if we gene-edit out MDADM's contribution and keep the LVM-type benefits of ZFS as well as point-in-time snapshots?

Nice in theory.

I'm asking because I thought there might be some here with some depth, not simply repeating the gospel of Cyberjock. If it helps, I can PM you my creds; I'm far from a newb.

Well I wonder a little. If you're far from a newb, I hope you'd realize that one of the world's largest server manufacturers created ZFS during their prime, when they had lots of cash and lots of developers and could afford the foray into applied theory. Even so, their effort to develop ZFS was far from fully realized.

If you then look at the Linux efforts to create something vaguely similar, such as BTRFS, it should become clear that developing sophisticated storage systems is a wickedly difficult thing to do. The Linux folks got exasperated waiting for that and are jumping on the ZFS bandwagon (for awhile now), not because it's everything everyone wants, but because it's the closest fit we have for a massive scale filesystem that works TODAY.

You can ask for "discussion" to consider "different use cases" all you want, but no one seems to be willing to step forward and bankroll the development of featureful filesystems anymore. I'd like a filesystem that squirts rainbow sherbet out a unicorn's rump when I accidentally delete a file I didn't mean to. I'm probably just as likely to get that as we are to see someone make the extensive changes to modify ZFS in the manner you desire. It's just reality, sorry.
 
Joined
Feb 27, 2019
Messages
5
At the end of the day, it is very difficult to get a railroad locomotive like ZFS to ride down the street as though it were a truck, car, or bicycle.
Fair enough. Although I'm pretty sure I've supported larger ZFS implementations on Solaris, and later on "unbreakable" Linux, than most have seen. From an engineering standpoint it's a thing of beauty, but a Bugatti makes a poor grocery getter.

10 years ago hardware RAID (OK, software stored in machine code on an ASIC) stunk in a lot of ways; that's not so much the case anymore. It can't scale to thousands of drives (yes, I'm working with large-scale Storage Spaces deployments) like SDS can, which is SDS's big advantage over 'hardware' RAID.

But look at true SAN-level features; this forum seems way too focused on the NAS piece. FreeNAS on the surface seems to have some great SAN-level features, and if it could be scaled down cost-wise that'd be great. I'm not expecting VAAI-level UNMAP, as even some 'true' SAN vendors can't get that working correctly. But thin-provisioning-type behavior/dynamic expansion AND snapshotting? Those are huge features to have. IF we could keep those benefits of ZFS and get the cost and performance to scale down, it would be a killer product. Run a cheap Windows system as the NAS head if you like; I'm looking at the backend SAN aspect.
There are also issues with write ordering, because it's not that hard to convince your RAID controller to engage write cache when it doesn't have the requisite hardware to cope with a power failure
Modern LSI cards from any vendor have power-backed cache, and the firmware auto-disables write cache in the event of battery failure, so this issue is largely mitigated. Not gone entirely, but largely mitigated.
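For reference, the cache policy and battery state are easy to check from the same CLIs (MegaCLI syntax, adapter 0 assumed):

MegaCli64 -AdpBbuCmd -GetBbuStatus -a0   # battery/capacitor health and charge
MegaCli64 -LDGetProp -Cache -LALL -a0    # current write-back/write-through policy per virtual drive
MegaCli64 -LDSetProp WT -LALL -a0        # force write-through if you'd rather not trust the cache at all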
ZFS is extremely demanding and really isn't tolerant of any cock-ups, and at the same time ZFS is good at stressing controllers a lot harder (especially during a resilver or scrub). It's known to shake out problems with RAID controllers
Do you know how many millions of these LSI cards are in use in banks, telecoms, defense, web hosts, etc.? There have been a few hiccups for sure, but usually it's version 1.0 issues. There was a huge issue with the ESX 5.5 and 6 inbox drivers and 12Gb LSI RAID and standard HBAs, a driver issue when the 12G cards were first released. But there have been plenty of QLogic and Emulex Fibre Channel HBA issues also, and so many people think Fibre Channel is the be-all end-all. Always resolved. Simple answer: don't deploy the latest of any OS at launch.
So unless we're saying FreeBSD lags that much in dev time, this largely shouldn't be an issue. I can tell you I know of at least 3 major SAN products on the market now based on BSD (although they run the more license-restrictive NetBSD).

A correction: the LSI consistency check is not a bitrot tool. It merely verifies that both copies of the data are identical. If they are not, it has no way to correct the error, since there is no way to know which one is correct.
I'll have to disagree with you there. A consistency check is a compare of the entire stripe against parity, and any element can be corrected so long as sufficient other elements are intact. I think you're thinking of a RAID 1 mirror? Two parts, compare, and if they differ, who's to say which is correct?
In the case of RAID 6 you have two parities. For any given stripe you can rebuild from any N-2 surviving elements, i.e. repair up to two faults per stripe, and I've seen it on numerous occasions. It really does work well.
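(Rough sketch of why that works: for a stripe of data elements D1..Dn the card stores P = D1 xor D2 xor ... xor Dn plus a second, Reed-Solomon-style syndrome Q computed over GF(2^8), so any two missing elements of the stripe, whether data or parity, can be solved for from the rest.)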
Another big downside, though, is the fixed RAID stripe elements: they cannot be dynamically moved, so a true rebuild of a drive is required to restore full redundancy on drive failure. Dynamic pooling as an alternative is better here, especially with large, slow drives, but this is well known and something that can be lived with if you keep expectations in check.

So, rather than repeat the gospel from years past, has anyone put any thought into it? Tested it in a lab? If I had the hardware available long-term in the lab I'd run it a bit to see, but usually the bosses have me restricted to duplicating customer environments, not toying with things for fun.
 
Last edited:

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
But look at true SAN-level features; this forum seems way too focused on the NAS piece.
We focus on the NAS part because that is what FreeNAS is designed to do. For larger implementations, like dual-head systems with greater fault tolerance, you need TrueNAS, but even then it is not a SAN.
if it could be scaled down cost-wise that'd be great.
How much do you need to scale down? FreeNAS can work inside a system that costs less than $600, disks and all.
Unless we're saying FreeBSD lags that much in dev time, this largely shouldn't be an issue. I can tell you I know of at least 3 major SAN products on the market now based on BSD
FreeNAS is an appliance based on FreeBSD but they cherry pick the parts they include. It is by no means a full implementation of FreeBSD with all features included.
Most people want to do the thing that is known to be reliable. What is the advantage of adding a hardware RAID controller into a FreeNAS system?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
But look at true SAN-level features; this forum seems way too focused on the NAS piece.

That's because the forum consists mostly of hobbyists. Some of us do commercial work, but relatively few, and generally the people doing SAN stuff end up with a FreeNAS box that just quietly works and have no reason to spend work hours participating on a forum, or they go to TrueNAS.

FreeNAS on the surface seems to have some great SAN-level features, and if it could be scaled down cost-wise that'd be great.

But it can't. iSCSI actually causes FreeNAS to require more resources.

https://www.ixsystems.com/community...res-more-resources-for-the-same-result.28178/

I'm not expecting VAAI-level UNMAP, as even some 'true' SAN vendors can't get that working correctly. But thin-provisioning-type behavior/dynamic expansion AND snapshotting? Those are huge features to have. IF we could keep those benefits of ZFS and get the cost and performance to scale down, it would be a killer product. Run a cheap Windows system as the NAS head if you like; I'm looking at the backend SAN aspect.

Modern LSI cards from any vendor have power-backed cache, and the firmware auto-disables write cache in the event of battery failure, so this issue is largely mitigated. Not gone entirely, but largely mitigated.

There've been no new developments there in years.

Do you know how many millions of these LSI cards are in use in banks, telecoms, defense, web hosts, etc.?

Yes, I do, having deployed at least hundreds, and often recommending them here for hypervisor deployments.

There have been a few hiccups for sure, but usually it's version 1.0 issues. There was a huge issue with the ESX 5.5 and 6 inbox drivers and 12Gb LSI RAID and standard HBAs, a driver issue when the 12G cards were first released. But there have been plenty of QLogic and Emulex Fibre Channel HBA issues also, and so many people think Fibre Channel is the be-all end-all. Always resolved. Simple answer: don't deploy the latest of any OS at launch.
Unless we're saying FreeBSD lags that much in dev time, this largely shouldn't be an issue. I can tell you I know of at least 3 major SAN products on the market now based on BSD (although they run the more license-restrictive NetBSD).

FreeBSD relies largely on contributed or ported drivers. Where a manufacturer supplies a driver, that's probably about as good as it gets, but there are sometimes bugs. I believe Cyberjock may have a story of having teased some bugs out of his original RAID card he managed to get to work with FreeNAS, learning the hard way the things we had been telling him. When a driver is ported from another platform, it does not always fit cleanly into the FreeBSD CAM architectural format, and may result in issues. And when a driver has to be written from scratch, often the manufacturer doesn't provide documentation about the chipset (I'm looking at you Adaptec), and errata issues known to the manufacturer's driver team aren't known to the FreeBSD developer.

I'll have to disagree with you there. A consistency check is a compare of the entire stripe against parity, and any element can be corrected so long as sufficient other elements are intact. I think you're thinking of a RAID 1 mirror? Two parts, compare, and if they differ, who's to say which is correct?

Or RAID5. Or RAID10. Or most of the other RAID levels. I suppose it might work for RAID6.

So, rather than repeat the gospel from years past, has anyone put any thought into it? Tested it in a lab? If I had the hardware available long-term in the lab I'd run it a bit to see, but usually the bosses have me restricted to duplicating customer environments, not toying with things for fun.

Well of course it's been tested. It works, kinda, with caveats and issues, depending on the card and driver. The LSI's are about as good as it gets.

https://www.ixsystems.com/community/threads/confused-about-that-lsi-card-join-the-crowd.11901/

The biggest problem was that ZFS would totally swamp the thing with massive transaction groups, which led to some relatively perverse performance quirks. It took me a long time to really come to appreciate the value of the ZFS "conventional wisdom" which is why I've spent a little extra time trying to impart it upon you.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
What is the advantage of adding a hardware RAID controller into a FreeNAS system?

Because he wants to cherry pick features. He wants snapshots but not the disk/volume management. Yay. Of course it's all tightly integrated, and you're never going to get what he wants without basically rewriting the whole thing as "ZFS-mini" or whatever, and no one has the interest in doing this. Fortunately I have some time tonite during this maintenance window just waiting for crap to happen, so it's mildly interesting to talk about.
 