Shares (NFS / iSCSI) occasionally stopped working

Status
Not open for further replies.

ericj

Cadet
Joined
Apr 22, 2013
Messages
8
Hello,

I have a strange problem with my FreeNAS-8.3.0-RELEASE-p1-x64 (r12825):

I use my NAS as storage for some VMs (ESXi 5.0) via NFS and for some iSCSI drives under Windows. Now, for the fourth time, the shares stopped working in the middle of the night, so my VMs went offline, etc.

The web interface is still working and does not show any problem at all.

I tried to reboot the system via the web interface, but that doesn't work, so I have to switch it off with the server's power button. After that, the VMs are back online and iSCSI is working again.

Does anybody here have a hint for me on where to start troubleshooting?

Thanks in advance
Eric
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'd look at the logs...

Doing ZFS with ESXi's NFS has many issues. Search the forums if you want more info.
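
If you want a concrete starting point (rough sketch, assuming a stock 8.3 install with console or SSH access), something like this shows what was going on around the time the shares dropped:

dmesg | tail -n 50                            # recent kernel messages; controller/driver errors show up here
tail -n 200 /var/log/messages                 # general system log around the time of the outage
grep -i -e nfs -e istgt /var/log/messages     # anything logged by the NFS server or the istgt iSCSI target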
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
NFS is really the best way to go for ESXi and FreeNAS.

I know that's not universally agreed upon, and you need to do some research into sync writes to your ZFS volumes vs. async ones. But the basic truth is that NFS does sync writes by default, iSCSI does async, and async will always be faster. Many tweaks and hacks have been kicked around on the forums and elsewhere over time for forcing ESXi and the FreeBSD NFS server to do async. I can't really say, and I've never seen anyone prove otherwise, that doing async with NFS exposes you to any more risk than doing async with iSCSI. I can say, though, that iSCSI forced to do sync performs much worse than NFS doing sync.
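
To make the sync vs. async part concrete (just a sketch; tank/vmstore is a placeholder for whatever dataset backs your NFS datastore), the relevant knob on the ZFS side is the per-dataset sync property:

zfs get sync tank/vmstore            # "standard" means sync writes really hit stable storage before being acknowledged
zfs set sync=disabled tank/vmstore   # forces async behaviour for that dataset only (understand the risk first)
zfs set sync=standard tank/vmstore   # back to the default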

Anyway, I've been running NFS shares to my ESXi 5.0 & 5.1 boxes for over a year now without problems. I did run into an issue where I was using the Oracle DirectNFS client and it would hang the NFS server used in 8.3.x (I haven't tried it yet with the 9.x series; I'm not that excited about causing a hang on a SAN box). It would typically require multiple kills of the NFS server until it would restart (not sure whether the multiple kill attempts did the trick or it just needed some time to expire). This sounds similar to what I was experiencing, and the logs, by the way, didn't tell me anything. It was basically some kind of deadlock deep in the NFS server, and some day I'll get back to it and try to track it down. In the meantime, using the Linux NFS client for my Oracle boxes has worked just great.
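
For what it's worth, "kills of the NFS server" looked roughly like this (sketch only; 1234 is a placeholder PID, and on FreeNAS the middleware normally manages the service, so the exact restart command may differ):

ps ax | grep nfsd         # find the nfsd master/server processes
kill 1234                 # often took several attempts; kill -9 only as a last resort
service nfsd onerestart   # bring nfsd back up once the old processes are gone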
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
My problem isn't with NFS, ESXi, or FreeNAS; it's that all three can make a nasty combination. While I firmly believe that with enough time, money, and effort you can make it work for you, I don't expect most people to have that kind of time, money, or effort. It seems that about 90% of FreeNAS users don't even want to spend the time to understand what the heck they are doing before they do it, and then they cry when their data miraculously disappears one day.
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
My problem isn't with NFS, ESXi, or FreeNAS; it's that all three can make a nasty combination.

But that's the thing: they make a great combo... I hear you about people shooting themselves in the foot and losing their data. But things have changed in the last 12 months since the days of using vfs.zfs.zil_disable="1", which disabled sync writes across your entire system. I understand the concerns about folks using NFS and then following the wisdom of the net to set sync=disabled or something similar. But according to the author of the code (http://milek.blogspot.com/2010/05/zfs-synchronous-vs-asynchronous-io.html), there is no risk to your ZFS volume from using the option. And I'd argue that ESXi is no better off using iSCSI, which is doing async I/O, than using NFS with sync=disabled. I wouldn't suggest using the option on a "production" server where you can afford some SSD drives, but for home use I'm just not seeing the risk. At the end of the day, hardware can only flush writes to spinning rust so fast, and any option (be it a Linux NFS server, iSCSI, sync=disabled, black magic, etc.) that speeds things up is doing some kind of caching, which means that if power is lost or the OS goes down hard you will lose data, and possibly lose it in a bad way.
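
To make the "things have changed" part concrete (illustration only; tank/vmstore is again a placeholder dataset name): the old hack was a global loader tunable, while the current approach scopes it to a single dataset:

# old way: in /boot/loader.conf, disabled the ZIL (and with it sync writes) for every pool on the box
vfs.zfs.zil_disable="1"

# current way: disable sync writes only on the dataset backing the ESXi datastore
zfs set sync=disabled tank/vmstore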

In short, a homemade ESXi setup built by a noob is a disaster waiting to happen, regardless of whether they are using FreeNAS. All it takes is some cheap consumer hard drives, aggressively cache-tuned by the manufacturer to get good performance numbers, and no UPS.

Finally, my personal guess is that folks who cook their systems and find everything is gone after some kind of kernel panic are seeing that because of bad hardware. Most kernel panics come from bad hardware (i.e., memory getting corrupted), and if you have a few GB of RAM holding your writes (i.e., sync=disabled), there is a good chance your ZFS data structures got cooked along the way by the same issue that eventually corrupted the kernel. So folks equate a kernel crash with an unmountable ZFS pool.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
In short, a homemade ESXi setup built by a noob is a disaster waiting to happen, regardless of whether they are using FreeNAS.

Isn't that what I said, just put differently? The experts don't stop and go to a forum to ask for help. They just do it, because they know how; they've been doing it for years. They might even think this FreeNAS thing is too Windows-y for them and that they'd rather have full-blown FreeBSD to work with. The rest of us noobs, though... we're still stuck learning everything.

More often than not, the people who show up in the forum have read about all this newfangled technology and think they are in heaven. When noobs read about all this cool new stuff like jumbo frames, iSCSI, multiple protocol support, ZFS support, RAID5-like redundancy without an expensive RAID card, etc., they jump into all of it and think they're going to do it all in a weekend. The reality is that you can't. They get some of this stuff set up and working and use it for a few days, maybe even a few weeks. But eventually they start seeing things not work quite right. They start noticing that not all of their network devices like jumbo frames. They notice that using ZFS as a datastore for ESXi over NFS has random issues in the middle of the night that they can't explain. They notice that iSCSI sometimes poops all over itself for no explicable reason at 4 am when the server should be idle. The proverbial "ghosts in the machine" start causing havoc.

Next thing you know, they're in the forum crying because all this new technology they learned about last week is falling apart at the seams, and the boss (and/or wife) is pissed because it's not working.

None of us learned everything we know about computers in a weekend, and a cool new OS isn't going to make that happen either. It takes time to learn how this stuff works, when it should be used and when it shouldn't, how to make it work best for you, and what you shouldn't do thinking you'll save some money.
 

ericj

Cadet
Joined
Apr 22, 2013
Messages
8
Hey guys,

Thank you for your words, but they didn't really help me :smile: I'm quite new to the FreeNAS thing, because I'm new at my company and my predecessor set these things up a couple of months ago. So I'm still learning and hope you can help me a little bit.

I attached the log file from the moment NFS/iSCSI stopped working (around 21:59); maybe you can have a look at it and tell me something.

Thanks in advance
Eric
 

Attachments

  • Log.zip
    2.1 KB · Views: 222

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
Eric,

I looked at your log file and don't really see anything. I'd recommend upgrading to FreeNAS 9.1; you will get much better support there because that's where the developers' attention is now. Also, would you happen to be using an LSI controller card? If so, what firmware level is it running?

CJ,
I agree with what you're saying and with wanting to keep the forums from clogging up with noob posts about fried ZFS pools. One thing to keep in mind, though, is that the average weekend IT warrior doesn't try setting up ESXi with a SAN for its datastore that often; it's a unique subset that tries such a thing. I have seen a few noobs do some really dumb things with the magic combo over the last few years. But they are going to burn themselves regardless of what they use; it would just be nice if they didn't clog up this forum when the fire breaks out.

I guess my point is that not long ago I was attracted to FreeNAS because of the "stuff like jumbo frames, iSCSI, multiple protocol support, ZFS support, RAID5-like redundancy," etc., and I migrated from Linux SANs to FreeBSD ones because of FreeNAS. If I had searched the net 16 months ago and found posts saying "don't use FreeNAS, ESXi, and NFS together," I would probably still be using Linux SANs with expensive RAID cards.
 

ericj

Cadet
Joined
Apr 22, 2013
Messages
8
Hey pbucher,

You're right, it's an LSI controller, a SAS 9211-8i; I can't tell you the firmware level right now.

I'll try to upgrade my FreeNAS to the latest version and hope that helps.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
ericj,

Keep in mind that with LSI controllers it's important to keep the firmware on the controller and the driver in sync. FreeNAS uses the v14 driver right now (although v16 is somewhat expected with the next FreeNAS release). You can find out your versions by running the command "dmesg | grep mps" from the command line. Ideally you want both to be the same version, but you definitely don't want your firmware to be older than the driver. I'm using v16 firmware with the v14 FreeNAS driver without issues.
 

ericj

Cadet
Joined
Apr 22, 2013
Messages
8
Just have a look:

mps0: <LSI SAS2008> port 0x4000-0x40ff mem 0xc27c0000-0xc27c3fff,0xc2780000-0xc27bffff irq 16 at device 0.0 on pci1
mps0: Firmware: 14.00.01.00, Driver: 13.00.00.00-fbsd
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That basically means that if you upgrade to 9.1 you are still safe. It's quite possible that 9.1.1 will require you to flash your card, though. Note that this isn't a certainty, as 9.1.1 just hit a beta release yesterday and I haven't tested it; the driver could still be updated before we hit 9.1.1-RELEASE.
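
If flashing does become necessary, it's done with LSI's sas2flash utility (rough sketch only; the firmware and boot ROM file names below are placeholders that depend on the 9211-8i package you download, and you'd typically run the tool from a DOS/EFI boot stick or the FreeBSD build of it):

sas2flash -listall                          # list controllers with their current firmware and BIOS versions
sas2flash -o -f 2118it.bin -b mptsas2.rom   # write the IT-mode firmware image and (optionally) the boot ROM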
 

ericj

Cadet
Joined
Apr 22, 2013
Messages
8
Alright, thank you.

I upgraded my backup FreeNAS a couple of minutes ago and everything runs fine...
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
Moving up to 9.1 will get your LSI driver in sync with your firmware and, with some luck, will flush out your NFS issue as well. If I recall correctly from going through the source, FN 9.1 has a completely new NFS server in it.

I'm not 100% certain that the FN developers didn't backport the NFS server from FreeBSD 9.x for the FN 8.x series. I used to hack on the NFS server at one point during the 8.2 days, and I can't recall clearly whether it was the FreeBSD 8.2 or the 9.1 NFS server code in FreeNAS 8.2.

Eric, let me know what you find. If you don't hit any hangs, I'm going to try to see if I can hang my NFS server again.
 