Over the weekend I upgraded from 8.3.1-RELEASE to 8.3.1-P2. After the upgrade I added a load-balanced lagg (two gigabit Ethernet links going to a Cisco 3550) and then started performing Storage vMotions from VMware. After a few minutes of high IO, the ZFS file system would stop serving data both over the network and locally: after a cd /mnt/zfsstore, a subsequent ls -la hung.
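For reference, the lagg was created through the FreeNAS GUI; under the hood that corresponds to roughly the following FreeBSD commands (the interface names and address are examples, not my exact config):

ifconfig lagg0 create
ifconfig lagg0 laggproto loadbalance laggport em0 laggport em1
ifconfig lagg0 inet 192.168.1.10 netmask 255.255.255.0 up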
zpool status showed state ONLINE for all devices:
[root@freenas] /mnt/prod# zpool status
  pool: prod
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        prod                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/e5e94455-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
            gptid/e68d3895-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
            gptid/e71dba22-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
            gptid/e7b8261b-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
            gptid/e848a5b7-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
        logs
          mirror-1                                      ONLINE       0     0     0
            gptid/e89063ae-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
            gptid/e8c4fb8a-9f32-11e2-8748-000eb6285fb8  ONLINE       0     0     0
        cache
          gptid/e8fbeb72-9f32-11e2-8748-000eb6285fb8    ONLINE       0     0     0

errors: No known data errors
The only way I could recover was to reboot the system; it came back up without errors.
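If it would help, the next time it wedges I can grab wait channels and kernel stacks from the local console before rebooting, along these lines (the pid is an example):

ps -axl              # the MWCHAN column shows what each process is sleeping on
procstat -kk 1234    # kernel stack of the hung ls, by pid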
I thought perhaps VMware was doing something funny, so I mounted the file system over NFS from my Mac and was able to replicate the problem using several instances of iozone -A.
To see whether the problem was network-related or purely local, I ran several iozone instances from the FreeNAS shell: 160+ MB/s sustained for 10 hours, and the problem did not appear.
Once I added the Mac NFS load back in, the problem returned shortly thereafter.
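For the record, the Mac side of the reproduction was nothing exotic, roughly this (hostname, export path, and mount point are examples; resvport because FreeBSD's NFS server wants reserved source ports by default):

sudo mkdir -p /Volumes/prod
sudo mount -t nfs -o resvport freenas:/mnt/prod /Volumes/prod
cd /Volumes/prod
iozone -A -f test1 &
iozone -A -f test2 &
iozone -A -f test3 &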
I then shut down one leg of the load-balanced lagg, and I have not been able to replicate the failure since, even with many Storage vMotions and Mac NFS iozone runs going at once.
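Dropping the leg amounted to removing one port from the lagg (interface names are examples again):

ifconfig lagg0 -laggport em1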
Since I have a setup that can reliably reproduce the problem, I'd be happy to help a developer figure out whether this is a bug in FreeNAS, a bug in FreeBSD, or something else entirely; feel free to ask me questions.
System Configuration:
5x 1 TB SATA drives
2x Samsung 840 128 GB (mirrored ZIL)
1x Samsung 840 128 GB (L2ARC)
8 GB RAM
8 GB USB flash boot device
2x AMD Opteron 250 @ 2.4 GHz
Supermicro H8DAE
LSI SATA 300-8X