Areca 1280ML-24 IRQ storm and CCB time out

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
1. I did go back through the BIOS. I even used CTRL+F1 to get the extra settings to appear and tried disabling all sorts of stuff. No dice. :(
2. I'm somewhat convinced that the issue isn't BIOS related, since the Highpoint controller works perfectly. My only gripe there is the lack of SMART support, since the Highpoint CLI isn't installed.
3. Since VT-d works great with the Highpoint controller, I'm not sure whether the motherboard's VT-d implementation is lacking or what...
4. I'm somewhat convinced that the Areca just can't play with VT-d correctly in non-Windows environments.
5. I totally think that this would be a way to solve the problem. Too bad I'm not able to play with it myself.
6. I think that's plausible too. Of course, proving the exact problem is almost certainly impossible.

I'm not sure how much support I can expect from Areca. This card is almost certainly discontinued/EOL: the firmware was last updated in 2010, and the card was first manufactured in 2006, if not earlier.

This card has been amazing up until now. I had so many problems with Highpoint that I abandoned their controller and bought the Areca. Later I found out that the problem wasn't entirely Highpoint's fault: Seagate had crappy firmware, and the hard drives and RAID controller just didn't play well together. Buying the Areca meant the end of my problems. After having hard drives drop out of the RAID all the time with the Highpoint, I replaced them with WD Green drives, since I didn't feel like I could trust Seagate anymore.

I did install the ESXi drivers for my Areca card. I didn't expect it to do anything, but I figured I had nothing to lose. Needless to say, it didn't help.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
2) Controllers sometimes work in subtly different ways; getting something to work for one case with a small set of particulars is very different from getting it to work for the general case. The two controllers may be exercising different things: one works, another doesn't quite. My guess would be that whatever development Gigabyte did was probably less comprehensive than what IBM/HP/Supermicro/etc. do, so I'm not as willing to dismiss the possibility that it is the mainboard/BIOS.

3) It would suggest that at least the board ought to be capable, yes.

4) I'd be a lot happier if the Areca didn't play in a Windows VM. I hate this inconclusive stuff.

5) I stopped doing any significant UNIX driver development many years ago, and the hardware platforms have gotten crazy. I remember it was sometimes difficult back when things were relatively *simple*! So I at least appreciate the difficulty of the task.

We might want to look at what it'd take to get your Highpoint CLI up and running.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, I got no reply from Areca today. Not too worried about it since we exchanged 4 emails just yesterday.

One thing I've found very, very interesting: in my temporary FreeNAS server I replaced a 24-port RocketRAID 3560 with an 8-port RocketRAID 4520. Since my test zpool has 10x3TB drives, I had to plug 2 of the drives into the Intel onboard controller. I booted up the server and found that the 4520 has no driver in 8.3.0, so I upgraded to the 8.3.1 nightly. Then I decided to run a scrub before trusting it with my data for a few days while I do the migration. When I scrubbed a few days ago, the scrub fluctuated around 200MB/sec the whole way. Right now the scrub of the exact same drives, but with the new controller, is achieving 606MB/sec!

The only difference I see is that the 4520 supports 6Gb/sec SATA, which the drives also support. But these drives aren't SSDs, so they shouldn't have been bottlenecked like that. This whole thing is just baffling, and I have no explanation for the threefold increase in speed.

Edit: Now I'm up to 616MB/sec and the scrub is 5% complete! I have no clue what caused the amazing performance increase, but I can't wait for an RC!
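
For reference, those scrub rates are just what zpool status reports while a scrub is running. A minimal way to watch the same numbers, assuming a pool named tank (substitute your own pool name):

zpool scrub tank     # start a scrub of the pool
zpool status tank    # the "scan:" section shows how much has been scanned and the current rate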
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Something that you might not be thinking about:

These RAID card manufacturers are making RAID cards, not HBA's. Their RAID chipsets are often inherently limited to performance levels far less than what the drives might be capable of punching through to the mainboard via a standard HBA. If the design of the card places the card processor in the data path, and as far as I've seen they mostly-to-all do, then you have an artificial choke point.

In the old days, we used to have to worry about throughput at multiple levels. You'd hook up a pile of Seagate Cheetah ST373405LW's (about 60MB/sec each) in a 9-drive rackmount chassis on a shared Ultra320 wide SCSI bus, so you had 540MB/sec worth of drives on a 320MB/sec bus. Then you'd pile 8 of those into a rack and cable them up to a pair of Dell PERC3 quad-channel controllers, which could only handle Ultra160; four channels would be 640MB/sec, but PCI 66/64 can only manage up to 533MB/sec. Even in a server like a PowerEdge 6450, where you had two separate PCI busses (one 33/64), you only ended up with 792MB/sec to cope with 1280MB/sec worth of I/O (and you still needed to stick a network card or two in that mess). But your CPU would become a bottleneck if you could move that much data, and you couldn't anyway, because the CPU on the stupid PERC3 was a crappy 100MHz Intel i960 and the entire card couldn't move half what Dell implied, well anyways..
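
Laid out as quick back-of-the-envelope math, using the same nominal numbers as above:

  9 drives x 60MB/sec    = 540MB/sec of disk on a 320MB/sec Ultra320 bus
  4 channels x 160MB/sec = 640MB/sec of SCSI behind a 533MB/sec PCI 66/64 slot
  8 shelves x 160MB/sec  = 1280MB/sec of I/O against roughly 792MB/sec of total PCI bandwidth

Every layer promised more than the layer below it could carry.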

These days, with dedicated channels everywhere, it is a lot easier. But anytime you have a central choke point like a RAID controller chip, you're introducing the possibility of silicon- and driver-related performance caps. In this case, it seems you lucked out and were hitting a driver-related performance cap.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So I put the old RAID controller back in. Sure enough, putting in the old controller tanked the zpool scrub performance, so my RAID controller just isn't that great as an HBA. It's interesting, because I've gotten faster transfer rates with it in RAID mode than in non-RAID mode. Go figure! So basically my limitation is the RAID controller in non-RAID mode. I would never have expected the difference in performance that I got. For me it just means scrubs will take a lot longer; getting over 200MB/sec still means I can pretty easily saturate the Gb NIC.

I played around with computers back in the "old days". Those were great times, because you HAD to know your stuff to get good performance. It separated the men from the boys. Now anyone can show up, plug into a SATA port, and get great performance without the knowledge. The only exception is if you're using top-of-the-line SSDs and actually have a use for >300MB/sec. I do a facepalm when I get involved in IT projects with small businesses that hired someone and pay them WAY too much to have zero knowledge except the ability to connect a SATA cable to a SATA port.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, guess what? I did some dd tests with the 8.3.1 nightly and the Highpoint controller, and all seemed fine for several 20GB tests. Then suddenly, 17.4GB into my last test, FreeNAS froze. I ended up powering off the guest machine, and upon trying to power it back up I got an error... IRQ storm! I tried rebooting ESXi as well as cold booting the machine. Now, no matter what, I get the stupid IRQ storm message from FreeNAS. So it looks like my ESXi experiment is coming to an end; I appear to have enough incompatibilities that I no longer trust it with any data. I'll probably experiment some more on a test platform in a few weeks. I'd really like to see ESXi up and running :P
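
The tests themselves were nothing fancy, just sequential writes and reads roughly along these lines (the dataset path /mnt/tank is a placeholder, not my actual pool):

dd if=/dev/zero of=/mnt/tank/ddtest bs=1m count=20480   # ~20GB sequential write
dd if=/mnt/tank/ddtest of=/dev/null bs=1m               # read the same file back
rm /mnt/tank/ddtest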

PCI passthrough is awesome enough that there's no telling whether I'll eventually get all of this figured out and go back to ESXi at some future date.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, that result makes me a lot happier - with that result, I'd wager it's the mainboard/BIOS. It isn't quite right, and/or ESXi isn't coping with what it is handed quite right.

You see now why your friends recoil in horror at PCI passthrough. If it isn't just right, then there's the potential for lots of badness and frustration. I just see it as a technology that's hard to get right. In many ways it reminds me of ZFS and FreeNAS, making it work well is a matter of both the right hardware and some experience (or lots of research and experimentation).

Hopefully you'll be able to make some future set of hardware line up to do this. It's pretty cool when it works. Sucks when it doesn't.
 

Tysken

Dabbler
Joined
Oct 30, 2012
Messages
21
I just want to share my experience with ESXi, FreeNAS, and DirectPath I/O.
First I ran my server without ESXi and everything worked great, but after installing ESXi and running FreeNAS as a VM I ran into some problems. The first was a timeout message at boot similar to yours. After some Google searches I came across a tip: first remove my HBAs from the VM, then add these tunables:
hw.pci.enable_msix = 0
hw.pci.enable_msi = 0
After that the timeout message was gone, but I got the IRQ storm just as you did. I got rid of it after disabling everything not needed in the VM BIOS. Then I wanted to add a USB controller to connect my UPS to the FreeNAS VM, and the IRQ storm started again. Only using 1 CPU with 1 core made it work, but I wasn't happy about only using 1 CPU.

After some more reading I found a thread here on the FreeNAS forum saying it is enough to use only hw.pci.enable_msix = 0 and leave hw.pci.enable_msi at its default of 1.
I tried that, and now I am running the FreeNAS VM with 2 CPUs, both HBAs, a dual-port NIC, and the USB controller without any problems.

Maybe you can give hw.pci.enable_msix = 0 a try with the Areca card.
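
For reference, those are standard FreeBSD loader tunables. I add mine through the FreeNAS GUI (under System -> Tunables, if I remember right), which ends up as a line in /boot/loader.conf. A minimal sketch of what my setting boils down to, with only the MSI-X one changed:

hw.pci.enable_msix="0"   # globally disable MSI-X, so devices fall back to MSI or legacy interrupts
# hw.pci.enable_msi is left at its default of 1, so plain MSI still works

A reboot is needed before a loader tunable takes effect.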
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm not too familiar with ESXi, but I don't think DirectPath I/O is the same as PCI passthrough (feel free to correct me if I'm wrong). PCI passthrough is the only recommended way to use ZFS with ESXi. The last time I used ESXi, apart from this thread, was version 2.0 or something.
 

Tysken

Dabbler
Joined
Oct 30, 2012
Messages
21
I'm not sure what the correct term is, but I was referring to PCI passthrough.
In the vSphere client, the tab where you can activate PCI devices for passthrough is called DirectPath I/O, so my guess is that it's the same thing. There you mark the devices you want to pass through and then reboot the system; after that ESXi can't access those devices any more, but you are able to add them in the hardware config of a VM.
That tab is only accessible if you have a VT-d capable processor and motherboard, or the AMD equivalent.
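
If you want to double-check from the ESXi shell which PCI devices the host sees before marking them in that tab, esxcli can list them. This is only the generic device listing; the passthrough toggle itself still happens in the vSphere client as described above:

esxcli hardware pci list     # lists every PCI device the host sees, with vendor/device IDs and addresses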
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
And now with a Supermicro X9SCM-F-O with BIOS 2.0b I can confirm that the Areca 1280ML-24 does NOT appear to work with VT-d technology. You get the exact same errors as discussed in this thread. Too bad :/
 

ian_stagib

Cadet
Joined
Dec 14, 2013
Messages
1
And now with a Supermicro X9SCM-F-O with BIOS 2.0b I can confirm that the Areca 1280ML-24 does NOT appear to work with VT-d technology. You get the exact same errors as discussed in this thread. Too bad :/


I'm sorry for necro-ing this post, but I was looking for a solution to a problem I was having trying to get FreeNAS working in my ESXi environment, and I just couldn't let this be the last word when it's incorrect as it stands. I have an Areca 1260ML 16-port and I've been using it with VT-d just fine in ESXi 5, passing it through to OmniOS, for probably just over a year now. Now, when trying to pass the card (and another HBA) through to FreeNAS, it gives the CCB timeouts mentioned earlier. The motherboard I'm using is an Intel S5520HC with 2 x Xeon L5520 CPUs. The 1280 is essentially the same card as mine, so the problem has to lie in how FreeNAS is handling drivers / VT-d, because if OmniOS can do it there's no reason FreeNAS shouldn't, if they tried to support it.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Your card is not like mine. Physically it looks similar. But my card is based on the IOP341 chip while the 1260 is based on the IOP333 chip. Apples to oranges. Your card is a slower version of mine. The reality of it is that both of our cards existed before VT-d technology was invented, so support should not be expected and should not be relied on. Part of the VT-d technology requires that the driver be fully compatible with VT-d. There should be no guarantee that a card based on the 333 chip can, should, or will work the same as a 341 and vice versa.

The fact that it runs on OmniOS doesn't do us any good here, so my statement that the card doesn't support VT-d technology is still accurate for this forum. Nobody is going to come here for VT-d support on OmniOS, or any other OS for that matter. All we care about here is FreeNAS. This is, after all, the FreeNAS forums.

As for the driver, there's virtually no chance it will be updated further. The firmware versions available run from 2005 to 2010. The card is almost 10 years old and I wouldn't expect it to be supported for much longer, especially when you can just buy an M1015 and have instant 6Gb/sec SAS and all the HBA benefits you want for FreeNAS.
 