NFS Writes over Load Balanced aggr cause ZFS wait/lock/deadlock

Status
Not open for further replies.

hcamacho

Cadet
Joined
Apr 30, 2013
Messages
2
Over the weekend I upgraded from 8.3.1-RELEASE to 8.3.1-P2. After the upgrade I added a load-balanced aggr (2x gigabit ethernet going to a Cisco 3550) and then started performing Storage VMotions from VMware. After a few minutes of high I/O, the ZFS file system would stop serving data both over the network and locally. After cd /mnt/zfsstore, a subsequent ls -la hung.

zpool status showed state ONLINE for all devices:

[root@freenas] /mnt/prod# zpool status
  pool: prod
 state: ONLINE
  scan: none requested
config:

	NAME                                            STATE     READ WRITE CKSUM
	prod                                            ONLINE       0     0     0
	  raidz1-0                                      ONLINE       0     0     0
	    gptid/e5e94455-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
	    gptid/e68d3895-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
	    gptid/e71dba22-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
	    gptid/e7b8261b-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
	    gptid/e848a5b7-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
	logs
	  mirror-1                                      ONLINE       0     0     0
	    gptid/e89063ae-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
	    gptid/e8c4fb8a-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
	cache
	  gptid/e8fbeb72-9f32-11e2-8748-000eb6285fb8    ONLINE       0     0     0

errors: No known data errors

I could only recover by rebooting the system. The system came up without errors.

I thought perhaps VMware was doing something funny, so I mounted a file system from my Mac and was able to replicate the problem using several instances of iozone -A.

To determine whether the problem was network-related or purely local, I ran several iozone instances from the FreeNAS shell: 160+ MB/s for 10 hours, and the problem did not appear.
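A sketch of that local run, assuming iozone is installed on the FreeNAS box and the pool is mounted at /mnt/prod (file names and instance count are illustrative):

```shell
# Several concurrent full-auto iozone passes against the pool;
# each instance gets its own test file so they don't collide.
for i in 1 2 3 4; do
    iozone -A -f /mnt/prod/iozone.$i &
done
wait    # block until all background iozone instances finish
```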

I added the Mac NFS client back into the mix, and shortly thereafter the problem returned.

I shut down one leg of the load-balanced aggr, and I have not been able to replicate the failure since, despite running many Storage VMotions and Mac NFS iozone instances.

Since I have a setup that can reliably replicate the problem, if a developer would like to ask me questions, I'd be happy to assist in figuring out whether this is a bug in FreeNAS or FreeBSD, or something else entirely.

System Configuration:
5x 1TB SATA drives
2x Samsung 840 128GB (mirrored ZIL)
1x Samsung 840 128GB (L2ARC)
8GB memory
8GB USB flash boot device
2x AMD Opteron 250 @ 2.4GHz
Supermicro H8DAE
LSI SATA 300-8X
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Your info is lacking the specifics needed to identify the problem (you are asking for help in this area), but my first guess is that it has to do with ESXi.

This is one of the long list of reasons why ZFS + ESXi = major problems. Even problems that would either correct themselves or be minor on a FreeNAS machine without virtualization suddenly become major show-stoppers with virtualization. Even small things like a single bad sector in a zpool are minor on bare metal, but add in virtualization and you can have a zpool that randomly freezes indefinitely because of the virtualization layer.

If you search the forums, the ESXi + FreeNAS + ZFS = bad joo joo combination has been discussed to death.

Maybe the ESXi geniuses on the forum will have some good ideas... I know that link aggregation doesn't work the way 99% of people think it works, and it could be that you are just asking too much of ESXi and LA. I don't know.
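To illustrate the link-aggregation point: a lagg doesn't stripe a single connection across both links. The switch and host hash each frame's addresses to pick one member port, so one client/server pair is typically pinned to a single gigE no matter how many links are in the bundle. A toy sketch of such a hash (the MAC values are purely illustrative, and real hash policies vary by switch and lagg protocol):

```shell
# Toy layer-2 hash, in the style an etherchannel/lagg uses to pick a
# member port: XOR the source and destination MAC addresses, then take
# the result modulo the number of links. The same MAC pair always maps
# to the same port, so a single NFS flow never exceeds one link.
src=0x000eb6285fb8   # example source MAC (illustrative)
dst=0x001122334455   # example destination MAC (illustrative)
nlinks=2
port=$(( (src ^ dst) % nlinks ))
echo "flow pinned to lagg port $port"
```

Traffic only spreads across links when there are many distinct address pairs in flight.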
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
See bug 1531 and also the discussions of NFS sync writes on the forum. Your pool is likely just too slow/unresponsive under heavy load.
 

hcamacho

Cadet
Joined
Apr 30, 2013
Messages
2
See bug 1531 and also the discussions of NFS sync writes on the forum. Your pool is likely just too slow/unresponsive under heavy load.

That is very interesting. I do have a mirrored ZIL on solid-state drives, so I am not sure why my pool would be slow for writes. The ZIL never got over 1GB used. With one leg of the aggr shut, I have not had an ounce of problems, and I've been hitting the thing pretty hard.

Additionally, I think bug 1531 was about delays and performance; in my case the ZFS file system stops all operations entirely, and it stays that way until I reboot the system.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The ZIL has very little to do with write throughput; "dd if=/dev/zero of=/mnt/pool/file bs=1048576" basically won't touch your ZIL device, but it will be a firehose of pool writes. If you can catch the system getting hung up in the kernel in a txg flush (see 1531 and the use of Ctrl-T while dd'ing), even for short periods, ZFS may be overestimating your pool's write throughput.
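Concretely, a sketch of that test, assuming the pool from the first post (the exact Ctrl-T wait-channel strings vary; the comment describes typical FreeBSD output, not guaranteed text):

```shell
# Firehose of asynchronous pool writes; these bypass the SLOG because
# dd issues ordinary (non-synchronous) writes. 20480 MiB total here.
dd if=/dev/zero of=/mnt/prod/ddtest bs=1048576 count=20480
# While it runs, press Ctrl-T to send SIGINFO. FreeBSD prints the
# process state and wait channel; seeing dd stuck in a txg/tx_sync
# wait for long stretches suggests it is blocked on a
# transaction-group flush, i.e. the pool can't keep up.
rm /mnt/prod/ddtest   # clean up the test file afterwards
```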

See, the solution to that is to fix ZFS's estimate of your pool's write throughput, so if things work at a slower speed (single gigE) but fail at a higher speed (dual gigE), then shutting one leg looks like it could be an accidental version of my fix.

Locking up entirely? That sounds more like a hardware/interrupt/etc. problem. So what I suggest is this: log in on the console and run a dd locally while also pegging a single gigE, and see what happens. If that works, shut down that interface and try the same experiment with the other. If that also works, stop all pool I/O and run some netperf instances to see whether there's a networking IRQ conflict. And so on.
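The isolation steps above might look something like this; the interface names (em0/em1, lagg0) are assumptions, so substitute your own:

```shell
# 1. Local write load from the console, in the background:
dd if=/dev/zero of=/mnt/prod/ddtest bs=1048576 count=20480 &

# 2. Peg one gigE at a time by detaching the other leg from the lagg
#    (FreeBSD ifconfig syntax), driving NFS/iperf load from a client:
ifconfig lagg0 -laggport em1     # test with em0 only
# ...repeat the load test, then swap legs:
ifconfig lagg0 laggport em1
ifconfig lagg0 -laggport em0     # test with em1 only

# 3. With all pool I/O stopped, drive the network alone (e.g. netperf
#    from a client) to see whether the lockup follows the NIC/IRQ
#    rather than ZFS.
```

Whichever combination reproduces the hang narrows the fault to that layer.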
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
On the dd test: if you set sync=always on the ZFS dataset you are testing, then dd will hit your ZIL for testing purposes.
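For example (the dataset name prod/test is an assumption, substitute your own):

```shell
# Force every write on the dataset through the ZIL/SLOG, so ordinary
# dd traffic exercises the log devices:
zfs set sync=always prod/test
dd if=/dev/zero of=/mnt/prod/test/zilfile bs=8192 count=100000
zfs set sync=standard prod/test   # restore the default afterwards
rm /mnt/prod/test/zilfile
```

A small block size matters here: sync-heavy workloads are dominated by log latency, not streaming bandwidth.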

I have seen NFS get hung in some kind of deadlock when hit with Oracle Database's directNFS client, so it's possible that load balancing triggers the same bug I've run into. I haven't taken the time to go back and track down what exactly hangs it, not to mention that it takes some load on Oracle to trigger it in the first place. In the meantime I've let Oracle use the host OS (RHEL 6) NFS-mounted drive without issue, with many months of uptime now.
 