Spontaneous reboots - FreeNAS-9.2.0-RELEASE-x64 (ab098f4)

Status
Not open for further replies.

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
Some basic hardware information:
Dell 2950 chassis
Dual quad CPUs @ 2.66
32GB ECC RAM
Dell SAS 6iR card flashed to LSI IT firmware (data drives)
LSI IT HBA for SLOG and L2ARC
FreeNAS installed on Kingston DataTraveler 8GB USB
Drives - currently 6x750GB Seagate SATA (but have also tried 6x146 Seagate SAS)
Intel quad port PCI NIC

It strikes me as a legit box to use with FreeNAS. I'm experimenting with it as a storage repository for some ESXi hosts and I've been working with both NFS and iSCSI. I feel as though I'm getting the hang of it...iSCSI is performing well with some tuning help (I blew up NFS a few times but I think I know what to do now and will be revisiting that). **BUT...**

The damn thing is just spontaneously rebooting. And I mean with quite literally *nothing* going on...just sitting there. I have logs being sent out to a syslog server and there's nothing out of the ordinary being shown...one minute it's sitting there producing the occasional (looks like about every 5 minutes) pool health check messages and the next thing I know the switch has logged a drop on port g36 and I start seeing log output from the FreeNAS box that indicates it's coming back up.

Does anyone have any ideas or suggestions about where to start troubleshooting? I'm happy to get on it but I can't figure out where to start given the lack of clues via logs (or any activity that could be creating the problem).
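For anyone else hunting a silent reboot on FreeBSD, a few generic things worth grepping for (the sample log lines below are fabricated for illustration; real output, device names, and paths vary by install):

```shell
# Fabricated syslog excerpt -- an uncorrected machine-check (MCA ... UNCOR)
# logged right before a boot marker points at hardware (CPU/RAM/bus) rather
# than the OS.
cat > /tmp/sample_messages <<'EOF'
Jan 20 10:00:01 nas kernel: MCA: Bank 4, Status 0xb200000000070f0f
Jan 20 10:00:01 nas kernel: MCA: CPU 0 UNCOR PCC OVER BUSLG
Jan 20 10:05:12 nas syslogd: kernel boot file is /boot/kernel/kernel
EOF

# Markers worth searching for in /var/log/messages or on a remote syslog box:
grep -Eic 'mca|machine check|panic|watchdog' /tmp/sample_messages

# If dumpdev/savecore are configured, a kernel panic leaves a dump behind:
ls /var/crash 2>/dev/null || true
```

If nothing at all shows up in the log right before the drop (as here), that itself is a clue: a panic usually logs something, while a power/hardware reset does not.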

TIA,
Matt
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The 2950 has been mentioned at least twice in the last week with random problems, none of which have had definitive solutions except "get a new server". Presumably the 2950 isn't really compatible with FreeBSD. If you check Dell's documentation the 2950 does NOT list FreeBSD as a supported OS.
 

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
> The 2950 has been mentioned at least twice in the last week with random problems, none of which have had definitive solutions except "get a new server". Presumably the 2950 isn't really compatible with FreeBSD. If you check Dell's documentation the 2950 does NOT list FreeBSD as a supported OS.
Well, don't I feel like the prize ass. I actually searched the forums pretty extensively and Googled a lot, but never thought to search on "2950". It's just such a standard server that I didn't even consider that an open-source OS like FreeBSD would have a problem with it. Thanks for the reply, I'll go take a look for those threads now.
 

Hexland

Contributor
Joined
Jan 17, 2012
Messages
110
Not very helpful replies... FreeNAS was working fine right up to 9.1.1.
The common element between all of these issues (apart from the fact that it's a 2950) is the LSI-based controller. I have a cheap Rosewill controller I'm going to try during the week.
 

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
> Not very helpful replies... FreeNAS was working fine right up to 9.1.1.
> The common element between all of these issues (apart from the fact that it's a 2950) is the LSI-based controller. I have a cheap Rosewill controller I'm going to try during the week.

Do me a favor: after you try that, will you post back here and let me know how it goes?

It's disappointing that the 2950 is potentially not compatible, it's perfectly middle of the road/vanilla hardware and I went with LSI cards and the Intel NIC (disabling the onboard Broadcoms) based on what I read in the forums. I do understand that Dell doesn't list FreeBSD as a supported OS for the 2950...but how many hardware vendors specifically call out FreeBSD to begin with? I suspect that if we were to only utilize hardware that does we'd be down to a *very* narrow slice of vendors with little opportunity to repurpose older hardware.
 

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
Working theory (testing now)...questionable/inflexible IRQ sharing is causing some instability. Looks like there are really limited options around the integrated disk controller, add-in LSI card and NICs (both integrated and PCI based) as far as IRQs being shared. I've pared way down on what's enabled and moved cards around in an attempt to find the least potentially conflicted arrangement.

Edit - good God I can't believe it's 2014 and I'm having to even consider this.
 

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
OK...well, this seems to have resulted in a *much* happier server/OS. I've been beating the hell out of the install for 2+ hours with IOMeter testing and so far it's standing up to the onslaught...we'll see what happens when it sits idle for several hours.

FWIW here's what I can recall doing:
1. Disabled onboard NICs (using an Intel quad port PCI card instead).
2. Disabled onboard SATA (no, not the daughterboard card...the actual on-motherboard SATA that does exist. Not just disabling ports A and B, but disabling onboard SATA entirely).
3. Disabled all USB except the rear ports (yup, no more nicely hidden USB key...that internal USB port shares its IRQ with a bunch of stuff).
4. Arrange PCI cards (an additional LSI HBA and the Intel NIC) such that conflict is minimized...I'll crack open the case tomorrow and update as to which slots are occupied.

After making these changes I bounced the server and went back into BIOS. You can look at the IRQ assignments in there and you can make changes...but it appears that even where there are open IRQs for assignment there are just some things hard-wired to share (for example the daughterboard HBA and the PCI slot currently holding the LSI HBA). You have limited options, so it turns into a matter of figuring out what things are capable of sharing without too much conflict. I thought it seemed like a bad idea to have the NIC (which could be servicing a lot of inbound IO) sharing IRQs with the disk controller(s). We'll see if I made the right choices, it looks much better so far.
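Not from the original posts, but on FreeBSD `vmstat -i` makes this kind of sharing visible at runtime: two devices reporting interrupts on the same `irqN` line are contending. Here's a small illustrative parser over fabricated output (the device names `mpt0`/`em0` are assumptions, and the sketch assumes one `irqN: name` pair per line; exact formatting varies by release):

```python
import re
from collections import defaultdict

# Fabricated `vmstat -i`-style output for illustration.
SAMPLE = """\
interrupt                          total       rate
irq16: mpt0                      8123456        812
irq16: em0                       5123456        512
irq17: mpt1                       123456         12
irq23: ehci0                        1234          0
Total                           13371602       1336
"""

def shared_irqs(vmstat_output):
    """Map IRQ number -> device names, keeping only IRQs with >1 device."""
    devices = defaultdict(list)
    for line in vmstat_output.splitlines():
        m = re.match(r"irq(\d+):\s+(\S+)", line)
        if m:
            devices[int(m.group(1))].append(m.group(2))
    return {irq: devs for irq, devs in devices.items() if len(devs) > 1}

print(shared_irqs(SAMPLE))  # irq16 shared by the HBA (mpt0) and NIC (em0)
```

A busy HBA and a busy NIC on the same line is exactly the combination the post above is trying to shuffle away from.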
 

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
I believe I narrowed the problem down to IRQ conflicts when devices sharing interrupts got busy. I was able to pare things down to the point that the system was stable at idle or under certain types of load. However, the best I could do was get the integrated SAS 6iR (handling the drive cage) and an add-in LSI PCI card (SLOG and L2ARC) sharing an IRQ (it was the LSI or the quad-port NIC...something was going to be sharing with the integrated SAS). The box performed OK when an async load (iSCSI in this case) was hitting it...the SLOG wasn't being used heavily, if at all, in those circumstances. But the second I tried NFS it began tipping over.

I suspect with the "right" combination of PCI cards this chassis could be made to work on bare metal...likely ditching the integrated SAS card and installing a PCI HBA capable of managing the drive cage (you'd need to connect to its SFF-8484 ports) as well as an external drive box. But outside of that it's probably always going to be a struggle. FWIW I threw ESXi on the metal and virtualized FreeNAS.
 

Hexland

Contributor
Joined
Jan 17, 2012
Messages
110
@H2O_Goalie...

> FWIW I threw ESXi on the metal and virtualized FreeNAS

You weren't using PCI passthrough though? I have a 2950 (Gen III) with dual Xeon 5450s, and I had originally planned on virtualizing FreeNAS and Linux on the same box -- but without PCI passthrough, the ZFS disk performance was atrocious.
I just couldn't get the PCI passthrough option to enable (I was using the customized Dell version of ESXi for the 2950)...

Just as a point of interest -- with FreeNAS being such a PITA right now, I decided to take a look at Ubuntu with a native ZFS implementation compiled into the kernel. This was never an option when I first looked at FreeNAS back in 8.2, so I was pleasantly surprised that not only did it work, but it was simple to install, fast and stable. I was getting about 50Mb/s better write speed to a ZFS RaidZ2 array (6 x 3Tb WD Green drives) than through FreeNAS (around 390Mb/s) -- and better yet, copying a 1Tb file using 'nc' to the ZFS pool got a consistent 98Mb/s with no 'breathing' (something I've never been able to achieve with FreeNAS).

Dell 2950 Gen III Poweredge Server
2 x Quad Core Xeon 5450
32Gb ECC RAM
6 x 3Tb WD Green Drives
IBM M1015 SAS controller (IT Mode) -- running the 6xWD drives in the drive cage
Intel Dual NIC (Internal Broadcom Dual NIC is disabled in BIOS)
2 x Intel 80Gb SSD's (8Gb ZIL, mirrored across both + remaining space L2ARC)
1 x PCIe Rosewill 2-ch SATA controller (Sil3132 chipset) (running the 2 x Intel SSD's)
Dual redundant PSU's

Modified CPU + PSU fans (47ohm resistors in series, plus modified BIOS to lower fan thresholds)
 

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
Are you talking Mb/sec or MB/sec? What are/were your FreeNAS vs. your Ubuntu numbers?
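The distinction matters by a factor of 8, which is why the question comes up: a single gigabit link tops out around 125 MB/s of payload (less after protocol overhead). A quick sanity check using the 98 figure quoted above (my arithmetic, not the poster's):

```python
# Mb/s (megabits) vs MB/s (megabytes) differ by a factor of 8.
def mbit_to_mbyte(mbit_per_s):
    return mbit_per_s / 8

def mbyte_to_mbit(mbyte_per_s):
    return mbyte_per_s * 8

print(mbit_to_mbyte(98))   # 12.25 -- sluggish if those were really megabits
print(mbyte_to_mbit(98))   # 784 -- as megabytes/s this would need ~784 Mb/s
                           # on the wire, i.e. close to GigE line rate
```

So "98Mb/s" over a saturated gigabit link is most plausibly megabytes per second.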
 

Hexland

Contributor
Joined
Jan 17, 2012
Messages
110
I'll re-measure them tonight and let you know exact numbers (I don't have access to my drive here at work) ...

I was going to try a full install of FreeBSD and maybe the latest OpenIndiana tonight, so I'll make a little spreadsheet and post it in the morning.

Did you get PCI passthrough working on ESXi on your 2950, or are you just working with virtualized storage?
 

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
VT-x =/= VT-d, so I'm just using virtualized storage. The 54xx series doesn't do VT-d (at least not the 54xx CPUs I have in the 2950), so no passthrough on this chassis. I also yanked the SAS 6iR and put a PERC 6i in so I could leverage the cache.

FWIW I quickly built up an Illumos-based device and I'm having pretty good results with that too.
 

Hexland

Contributor
Joined
Jan 17, 2012
Messages
110
OK, did some quick tests last night on the system...

All read/write tests to the ZFS RAIDZ2 pool were done using dd:
Code:
dd if=/dev/zero of=ZPOOL01/test.dat bs=2048k count=50k


All write tests from client (i7 iMac with 2Tb Fusion drive) were done using
Code:
time yes | nc -v -l 192.168.10.109 1111 < FreeNAS-9.1.1.img 


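To make the dd figures below easier to compare, here's a small conversion helper (mine, not the poster's) that turns dd's raw bytes/sec into decimal MB/s and binary MiB/s; the sample number is the first write result below:

```python
# dd reports bytes/sec; MB/s uses 10^6, MiB/s uses 2^20.
def human_rate(bytes_per_sec):
    return (bytes_per_sec / 1e6, bytes_per_sec / 2**20)

mb, mib = human_rate(144279522)
print(f"{mb:.1f} MB/s / {mib:.1f} MiB/s")  # 144.3 MB/s / 137.6 MiB/s
```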

FreeBSD 9.2 (ashift=9)
——————
W: 744.2 seconds (144279522 bytes/sec)
R: 314.5 seconds (341377873 bytes/sec)


FreeBSD 9.2 (ashift=12 gnop -S 4096)
——————
W: 437.9 seconds (245168343 bytes/sec)
R: 352.1 seconds (304951263 bytes/sec)

nc: 2000000000 transferred in 27.331 seconds (73176978.52 bytes/sec) (69.78Mb/s)




FreeNAS 9.1.1
—————————
W: 437.4 seconds (245460370 bytes/sec)
R: 354.5 seconds (302887486 bytes/sec)

nc: 2000000000 transferred in 28.612 seconds (69900740.94 bytes/sec) (66.66Mb/s)


Open Indiana 151a8
—————————
W: 288.79 seconds (372MB/s)
R: 495.60 seconds (217MB/s)



Linux (Ubuntu 12.04LTS)
—————————
W: 281.277 seconds (382MB/s)
R: 359.095 seconds (299MB/s)

nc: 2000000000 transferred in 30.56 seconds (65427898.456 bytes/sec) (62.40Mb/s)


Linux (Ubuntu 13.10)
—————————
W: 296.71 seconds (362MB/s)
R: 278.72 seconds (385MB/s)


nc: 2000000000 transferred in 19.94 seconds (100300902.71 bytes/sec) (95.65Mb/s)
 

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
Thanks for the data, it's appreciated.

You weren't kidding, those native Linux throughput numbers are really solid. FWIW I quickly repeated your dd write test (dd if=/dev/zero of=Nexenta/test.dat bs=2048k count=50k) on my pool and it completed in 83.2 seconds for a rate of 1.3GB/sec. Running vMotions on/off the pool through an iSCSI mount I'm seeing 90+MB/sec writes and 110+MB/sec reads...considering overhead I'm close to saturating the 1Gb/sec link, so I think I'm going to stick with this. A bit of tuning and I'm sure I'll get the write numbers up over 100MB/sec.
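The quoted rate checks out: bs=2048k count=50k is 51,200 blocks of 2 MiB, i.e. 100 GiB in total, and at 83.2 seconds that works out to roughly 1.29 GB/s (my arithmetic):

```python
# Sanity-check of the dd run quoted above.
block_size = 2048 * 1024   # bytes per block (bs=2048k)
count = 50 * 1024          # number of blocks (count=50k)
total_bytes = block_size * count
rate = total_bytes / 83.2  # bytes per second
print(total_bytes, round(rate / 1e9, 2))  # 107374182400 1.29
```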
 

Hexland

Contributor
Joined
Jan 17, 2012
Messages
110
Holy crap... 1.3GB/s is an incredible write speed. I was quite proud of my 360MB/s until I read your post! Lol!
Is this a RaidZ2 configuration, or have you set your drives up as a mirror? Dedup and compression? Checksumming on or off? What am I missing?
I know the PERC 6i has 256MB of writeback cache on board; it would seem to make an incredible difference.

I currently run my drive cage from the IBM M1015 -- but the PERC 6i can be had for $20-$30 off eBay -- is the performance difference down to the controller cache? Is it worth picking one up?
(I had a PERC 5 in the system which I removed and replaced with the M1015, because of all the feedback on the FreeNAS forums about it being the best choice for a no-nonsense card with full support.)
 

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
I had been running an HBA roughly equivalent (SAS 6iR) in my box when I had FreeNAS on bare metal. When I moved to virtual I put the PERC 6i back in and configured a RAID-10 of 6 x 750GB MDL SATA (7200RPM) drives behind the PERC and gave that to ESXi as a big datastore. I also put an Intel 510 SSD on the mobo SATA connector. The box has 32GB of ECC RAM and a PCI Intel quad-port NIC.

In the hypervisor I gave ESXi (5.5) ~24GB of the SSD for host cache. Then I created a single VM with 24GB RAM, a 40GB syspool drive, a 1.5TB VMDK for the ZFS pool (I'm letting the hardware RAID handle my redundancy), a 4GB VMDK on the SSD for SLOG and a 100GB VMDK on the SSD for L2ARC. The NIC type is VMXNET3. Cache flush is disabled (between the NVRAM on the RAID card and the UPS the 2950 is on, I feel OK about that) but sync=standard (for now). The numbers I gave you are with no extra tuning beyond what I've outlined here. Compression is on, as is checksumming, but no dedup. No DirectPath, as the procs in the 2950 don't support VT-d.

I know that a lot of the direct benefits of ZFS come with allowing it to manage the disks, but that just wasn't working for me with the 2950 (what I suspect are IRQ conflict issues). In my case, with the "legit" hardware RAID card and UPS, I'm comfortable with having redundancy, etc. managed elsewhere...with the performance I'm seeing I feel like it's a reasonable tradeoff. Do be aware that my use case may be different than yours...nothing I'm doing is so crucial that I can't deal with something blowing up and having to rebuild from zero. This is really more of a learning experience for me.

PS - yes I do have the battery backup on the RAID card. And I used the Dell custom ESXi CD and applied all the latest VIBs.
 

Hexland

Contributor
Joined
Jan 17, 2012
Messages
110
Interesting... thanks for taking the time to type that up.

This is a learning experience for me too, so the hardware has been upgraded, chopped up, soldered and generally buggered around with so I can learn/play and *eventually* deploy.

Ultimately however, I still don't want to be tied to the Dell hardware, and I will eventually be hosting a lot of personally valuable and unrecoverable data (baby pictures, videos, documents, etc) -- so I'm opting for the software raid through ZFS-RaidZ2 for redundancy and resiliency. That way, I can just bung the disk array into some other generic hardware and re-import.

I'm a little disappointed that the latest FreeNAS doesn't work on the Dell -- I got the system because it was a) cheap, b) pretty vanilla, c) well supported (or so I thought... serves me right for not doing my homework ;) ). The response of "get a new server" is doubly disappointing, given that FreeNAS has been working just fine on these PE's for a while.
 

H2O_Goalie

Dabbler
Joined
Jan 16, 2014
Messages
12
No problem. FWIW I'd recommend (if you haven't already) that you read *a lot* and get a firm grasp on RAID levels and the performance limitations of each type. At the end of the day you'll always be somewhat limited by the underlying hardware and the "write penalty" of your RAID level...finding the right balance between redundancy, performance and cost often isn't easy. You know the saying...there are 3 options, pick 2. And SATA disks limit what you can do vs. SAS, etc. In a perfect world I'd have an external drive array with large mid-RPM SAS drives and just link it with an external SAS cable to whatever my server was at the moment, but I don't have that hardware at the present time (it's on the wish list). I think I've found my best option for the moment...when the ideal hardware comes along I'll snapshot it and migrate over.
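The "write penalty" mentioned above can be made concrete with rule-of-thumb numbers (mine, not the thread's): a 7200 RPM SATA drive manages roughly 75 random IOPS, RAID10 writes each block twice (penalty 2), and RAID5/parity schemes need a read-modify-write cycle (penalty ~4):

```python
# Back-of-envelope random-write IOPS for an array, given a per-drive figure
# and the RAID level's write penalty. Illustrative rule-of-thumb math only.
def effective_write_iops(drives, iops_per_drive, write_penalty):
    return drives * iops_per_drive / write_penalty

print(effective_write_iops(6, 75, 2))  # RAID10 of 6 drives: 225.0
print(effective_write_iops(6, 75, 4))  # RAID5  of 6 drives: 112.5
```

Same six spindles, half the random-write capacity, which is the redundancy/performance/cost triangle in one line.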

I too am disappointed with the results from the 2950. My previous experience with open-source was that it'd pretty much run and run well on any hardware. The mystery reboots I reported at the top of the thread are worrisome to say the least. Ultimately though I think that's more a function of the IRQ sharing than it is the OS...with that external cabinet I could probably eliminate the sharing altogether by utilizing only the PCI NIC and a PCI HBA.
 
Joined
Jul 27, 2013
Messages
7
> Some basic hardware information:
> SNIP
>
> TIA,
> Matt

Hello H20!

I'm seeing something very similar on 9.2.0: spontaneous reboots, and also weird corrupted packets when trying to copy using ssh and rsync. I'm using an AMD board with 32GB RAM and a 16-port internal 6Gb/s SATA+SAS LSI SAS 9201-16i with 11 Seagate 2TB NAS drives. Will try reverting to 8.3.1 today :( :(.

I got about 300GB over Rsync so far with no further issues after stepping back to FreeNAS-9.1.1-RELEASE-x64 (a752d35).
 