NFS+Interface dies under load?

Status
Not open for further replies.

GeoffK

Dabbler
Joined
Apr 29, 2015
Messages
29
We have an issue where, seemingly whenever VMware removes a snapshot of vCenter, our NFS shares die until we reboot FreeNAS.

Additionally, whilst our general management interface (which includes CIFS shares) is fine and happy, I can't ping anything on the NFS/iSCSI interface, and the two ESXi hosts can't ping FreeNAS. The snapshot removal is being triggered by CA Arcserve UDP 5u4.

FreeNAS Hardware:
Intel Xeon E3-1271 v3
Current Supermicro board (organising exact model...)
32GB ECC RAM
2x 80GB Intel S3500 (mirrored) for the FreeNAS install
1x 240GB S3500 for L2ARC
1x 200GB S3700 repartitioned to 16GB for SLOG
21x 4TB NL-SAS in 3-way mirrors
LSI 9211-8i HBA (P16 firmware)
Intel X540-T2 10GbE LAN

2x ESXi hosts:
1x Intel Xeon E5-2630 v3 @ 2.4GHz
Current Supermicro board
64GB ECC RAM
Intel X540-T2 10GbE LAN

VM-wise we are tiny:
adfs @ 15GB
exch @ 528GB (iSCSI mount/ZVOL for Exchange DB - actual DB usage is approx 240GB)
sql @ 620GB (iSCSI mount/ZVOL for SQL DB - actual DB usage is approx 100GB)
rds @ 518GB
vCenter @ 16GB

Serving roughly 32 users, approximately 50% of whom are RDS users.

There is also about 2.2TB of CIFS shares, including folder redirections.

The failing backups happen outside of business hours and are staggered (vCenter is hit at 9pm, RDS/Exch at 9:30pm, SQL/ADFS at 10pm).

I can't be certain this isn't a VMware issue yet (we're on VMware 6, current build/patch), but we're covering all bases, and it certainly feels similar to this thread: https://forums.freenas.org/index.php?threads/nfs-dies-under-load.14346/

We're running the current version of FreeNAS (FreeNAS-9.3-STABLE-201506042008). Nothing shows up in /var/log/messages with regard to NFS (or anything else) when it all falls over; the only thing we get is:

Code:
Jun 16 08:06:33 storage WARNING: 10.1.5.11 (iqn.1998-01.com.vmware:vm1-2658a790): no ping reply (NOP-Out) after 5 seconds; dropping connection
Jun 16 08:06:34 storage WARNING: 10.1.5.12 (iqn.1998-01.com.vmware:vm02-3bec3b2c): no ping reply (NOP-Out) after 5 seconds; dropping connection


Which is presumably iSCSI falling over.
 

GeoffK

Dabbler
Joined
Apr 29, 2015
Messages
29
Further reading suggests that ESXi 5 onwards (including 6) has issues with NFS, surrounding NFS.MaxQueueDepth being set to 4294967295 by default (NetApp NFS filers are specifically named, but there are reports against all NFS servers, possibly pointing to an issue with the NFS client in ESXi).

Some people report relief after bringing this down to 64 (or by enabling SIOC, if you have Enterprise Plus licensing, which brings it to 256).

I'll be implementing this tonight to see what happens.

I'll also be reverting to NFS v3.
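
For anyone following along, this is roughly how I plan to check and change the value per host from the ESXi shell (it's the same setting exposed under Advanced System Settings in the vSphere client; treat it as a sketch, and I believe a host reboot is recommended afterwards):

Code:
# show the current queue depth (defaults to 4294967295 on ESXi 5.x/6.x)
esxcli system settings advanced list -o /NFS/MaxQueueDepth
# drop it to the commonly suggested value of 64
esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64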
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Good build, and thanks for providing all of the information in the initial post. 32GB of RAM is way too small for an L2ARC of that size. Even 120GB of L2ARC is pushing the recommended limits (the recommendation is that L2ARC shouldn't exceed 5x ARC, and your ARC is probably around 20-25GB), so you may be choking your ARC with the L2ARC index. The only options are to downsize the L2ARC or upsize the RAM. Obviously, upsizing the RAM is going to come at a large price tag, since that chipset can't do registered DIMMs.
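
To put rough numbers on that (back-of-the-envelope only; the OS/services overhead figure is an assumption):

Code:
# ARC is roughly RAM minus OS/services overhead
ARC limit       ~= 32GB - ~8-12GB     ~= 20-25GB
# rule of thumb: L2ARC no bigger than ~5x ARC
L2ARC limit     ~= 5 x 20-25GB        ~= 100-125GB
Installed L2ARC  = 240GB              -> roughly double the guideline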

This is a common problem for people with insufficient RAM. Because VMs in particular *love* L2ARC, the logical conclusion is that you need more RAM so you can support a larger L2ARC. And even without an L2ARC, the need for more than 32GB of RAM with a VM workload like this is pronounced.

iSCSI is definitely better than NFS for performance reasons with ESXi. NFS is nice in that it's super easy to copy data to/from the share from another system, but that convenience comes at a cost. If you don't need it, I'd recommend going to iSCSI and using zvol-based extents.
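
If you do make the switch, the zvol underneath a device extent is just created with -V; a minimal sketch from the shell (pool and dataset names here are made up, and in practice you'd normally do this through the FreeNAS GUI):

Code:
# thick-provisioned 500G zvol under a hypothetical pool/dataset
zfs create -V 500G tank/iscsi/vm-extent
# or sparse (thin) provisioned instead
zfs create -s -V 500G tank/iscsi/vm-extent
# the block device then shows up at /dev/zvol/tank/iscsi/vm-extent
# and gets attached as a device extent in the iSCSI sharing config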
 

GeoffK

Dabbler
Joined
Apr 29, 2015
Messages
29
So I've made the changes (as well as ESXi NFS tweaks based on Oracle's documentation), and a full backup has been and gone (along with the snapshots associated with it). Load at the time of the fault was pretty insignificant (no scrub running, all the memory stats were stable, etc.).

Performance-wise, this setup is pretty small and will at best see 1-2 more VMs/jails going forward, with the biggest data growth being CIFS (real estate agents love photos and video), but I will organise to resize the L2ARC to 80-100GB (as it stands, the L2ARC hit ratio is a lowly 1.87 vs ~63 for ARC).
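
(For reference, I'm pulling those ratios from the arcstats counters; something along these lines from the FreeNAS shell, with the OID names assumed from stock FreeBSD ZFS:)

Code:
# raw counters the hit ratios are derived from
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses
# hit ratio = hits / (hits + misses)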

Most of my ZFS experience is wrapped up in Nexenta; however, the annual support costs, as well as the per-TB licensing in general, are steering us away. To give you some idea, it's literally the cost of a populated 24-slot JBOD for 16TB raw (and given we run mirrors for VM usage, it stings doubly).

Nexenta, however, strongly recommends NFS over iSCSI in practice (then again, they assume you have a high-performance SLOG, something most of the FreeNAS community writes off as unnecessary in most use cases). Is it a case of the FreeNAS iSCSI stack being significantly better than NFS, or simply that with iSCSI you get away without a SLOG for longer? We've seen some pretty nasty incidents with zvols (particularly when you destroy them), not to mention you lose a ton of benefits in terms of management, access, etc.

After speaking with our local Supermicro hardware vendor, we're looking to push towards FreeNAS both internally and for clients, but using Nexenta reference hardware (which obviously most vendors would be pretty happy to support).

At some point I'd like to talk to iXsystems (I have a ticket in at the moment regarding this issue; I just replied to Mark earlier). We're not allergic to paying for support.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Mark is a good guy. He'll take care of you.

One thing I didn't mention and forgot to edit in: if you are losing network connectivity, it's possible your network hardware is being overloaded and rebooting, or otherwise having problems when its buffers overflow. This is particularly true of a few brands like Zyxel and Netgear.

On more than one occasion, customers have sworn by models and/or brands that we (me and the other guys at iXsystems) know are horribly unreliable. We've gotten customers to switch to different hardware and, like magic, the issue goes away.

So it could be your networking gear. Not saying it is, just something to consider.
 

GeoffK

Dabbler
Joined
Apr 29, 2015
Messages
29
Network gear (yes, it is an XS708E :D) is something I'll keep an eye on, but we've VLANed it into storage/LAN, so unless it was going to bring down only half the switch, I don't think that was the issue in this case (we have since moved to a Dell N4000 switch internally, where we have ~14 10G links).



I've had some thoughts regarding the actual NetApp fault with regard to NFS.MaxQueueDepth being set to insane levels in ESXi 5.x onwards (4.29 billion, up from 64 before they made that change).

NetApp ONTAP is built on a BSD code base, and whilst it's a stretch, their fix for an APD (all paths down) state was to stop the TCP window size from ever getting to 0, though they stress there are other reasons to set the MaxQueueDepth value to sane levels (64/32/16). My assumption (I'm not a developer by any stretch, just a curious sysadmin) is that this is what resulted in the interface disappearing from existence (not only did NFS drop, iSCSI and even ping requests ceased), remembering that our management and CIFS interface was fine the whole time, and indeed I could interact with shares, web access, etc.

Any thoughts/comments on that?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Do you have NFS and iSCSI bound to one IP, but the management and CIFS happen to be on another, and when the switch went bonkers it took out one port but the other was left up?

Just hypothesizing. It sounds like you may never have solid evidence of what exactly went wrong though.
 

GeoffK

Dabbler
Joined
Apr 29, 2015
Messages
29
Do you have NFS and iSCSI bound to one IP, but the management and CIFS happen to be on another, and when the switch went bonkers it took out one port but the other was left up?

Just hypothesizing. It sounds like you may never have solid evidence of what exactly went wrong though.

Yep, that's the setup. And I agree that I'll probably never know without a Wireshark capture.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Nexenta, however, strongly recommends NFS over iSCSI in practice (then again, they assume you have a high-performance SLOG, something most of the FreeNAS community writes off as unnecessary in most use cases). Is it a case of the FreeNAS iSCSI stack being significantly better than NFS, or simply that with iSCSI you get away without a SLOG for longer?
I try to follow along with SLOG discussions, and here's how I understand it:
  1. For some workloads, e.g. SAN hosting, you need sync write.
  2. Sync write will hurt performance without a dedicated SLOG device.
  3. A dedicated SLOG device comes with its own tradeoffs, e.g. you have to get one with power fail protection, and you need more RAM.
So, I think when people get steered away from installing a dedicated SLOG device, it's usually because they don't have a workload that requires sync write, and therefore there's no reason for them to deal with the associated tradeoffs.
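
As a concrete illustration of point 1, sync behaviour is a per-dataset (or per-zvol) property; the dataset name below is made up:

Code:
# honour sync requests from clients (the default)
zfs set sync=standard tank/vmstore
# treat every write as sync; this is where a good SLOG earns its keep
zfs set sync=always tank/vmstore
# ignore sync requests entirely: fast, but risks VM/database corruption on power loss
zfs set sync=disabled tank/vmstore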

Hopefully someone will correct anything I've screwed up here.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
A dedicated SLOG device comes with its own tradeoffs, e.g. you have to get one with power fail protection, and you need more RAM.

I didn't think more RAM was needed for a SLOG. (For an L2ARC, yes, but not the SLOG).
 

bmh.01

Explorer
Joined
Oct 4, 2013
Messages
70
I try to follow along with SLOG discussions, and here's how I understand it:
  1. For some workloads, e.g. SAN hosting, you need sync write.
  2. Sync write will hurt performance without a dedicated SLOG device.
  3. A dedicated SLOG device comes with its own tradeoffs, e.g. you have to get one with power fail protection, and you need more RAM.
So, I think when people get steered away from installing a dedicated SLOG device, it's usually because they don't have a workload that requires sync write, and therefore there's no reason for them to deal with the associated tradeoffs.

Hopefully someone will correct anything I've screwed up here.
I wouldn't say trade-offs, more a cost-for-benefit ratio. And no, SLOG devices don't eat RAM like the L2ARC index does.
 

jamiejunk

Contributor
Joined
Jan 13, 2013
Messages
134
I know 9.2 was having problems with the Intel X540-T2 10GbE LAN cards. The traffic would eventually slow and then stop altogether. I think they fixed that in 9.3, though.
 

GeoffK

Dabbler
Joined
Apr 29, 2015
Messages
29
Just touching base with this thread: a few NetApp guys (and ONTAP 8 is based on BSD...) have mentioned that, without setting NFS.MaxQueueDepth=64 on ESXi, they had some rather catastrophic failures that brought down entire interfaces; since setting this value, they've not seen the issue repeat.

Everything has been hunky-dory since I set this. The complete list of values I set is as follows (compiled from the Oracle ZFS documentation, plus the addition of MaxQueueDepth):

Code:
Net.TcpipHeapSize=32
Net.TcpipHeapMax=512
NFS.MaxVolumes=256
NFS.HeartbeatMaxFailures=10
NFS.HeartbeatFrequency=12
NFS.HeartbeatTimeout=5
NFS.MaxQueueDepth=64


All of these values are set per host on your ESXi deployment.
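
For the record, this is roughly how I applied them from each host's shell (they can equally be set under Advanced System Settings in the vSphere client, and I believe a host reboot is needed for some of them to take effect):

Code:
esxcli system settings advanced set -o /Net/TcpipHeapSize -i 32
esxcli system settings advanced set -o /Net/TcpipHeapMax -i 512
esxcli system settings advanced set -o /NFS/MaxVolumes -i 256
esxcli system settings advanced set -o /NFS/HeartbeatMaxFailures -i 10
esxcli system settings advanced set -o /NFS/HeartbeatFrequency -i 12
esxcli system settings advanced set -o /NFS/HeartbeatTimeout -i 5
esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64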

Solaris/illumos doesn't seem to need this value set, as it would appear they cap the NFS queue depth internally. Excerpt from my Nexenta tech:

echo ::svc_pool nfs | mdb -k

Does this command work on FreeNAS?

Should show something like this

echo ::svc_pool nfs | mdb -k

SVCPOOL = ffffff01ce5cd0d8 -> POOL ID = NFS(1)
Non detached threads = 0
Detached threads = 0
Max threads = 4096
`redline' = 1
Reserved threads = 0
Thread lock = mutex not held
Asleep threads = 0
Request lock = mutex not held
Pending requests = 0
Walking threads = 0
Max requests from xprt = 8
Stack size for svc_run = 0
Creator lock = mutex not held
No of Master xprt's = 2
rwlock for the mxprtlist= owner 0
master xprt list ptr = ffffff01ddf58b70
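
From what I can tell, mdb is Solaris/illumos-specific, so that exact command won't run on FreeNAS; the closest equivalent I've found (assuming the stock FreeBSD NFS server under the hood) is to look at the nfsd sysctls:

Code:
# list the NFS server tunables, including thread limits
sysctl vfs.nfsd
# e.g. vfs.nfsd.minthreads and vfs.nfsd.maxthreads show the server thread caps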


Edit:

I know 9.2 was having problems with the Intel X540-T2 10GbE LAN cards. The traffic would eventually slow and then stop altogether. I think they fixed that in 9.3, though.

I read this as well, but at the same time, I read that they updated the Intel ixgbe (10GbE) driver in 9.3.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The Intel driver problem was fixed in some 9.2.1.x release. I want to say it was in 9.2.1.5 or so.

The driver was updated again in 9.3, though. I couldn't give specifics off the top of my head, but I'm sure there's a changelog somewhere for those who really care.

I'm unaware of any problems with Intel cards at this time, so I'd expect them to be very solid and reliable.
 