More iSCSI woes: ZVOL traffic not graphed, server reboots


DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
Since finding out that I should be using ZVOL-based iSCSI LUNs for my VMs instead of file extents, I've been running into issues making the switch.

For one, the FN server spontaneously reboots during heavy I/O. It may be a hardware issue on my end, but I'm curious: has anyone else run into the same problem?

The second issue is more confusing: why doesn't the RRD graph report iSCSI traffic when the LUN is ZVOL based, but it does when the LUNs are file extent based?

Attached are the two graphs from my migration from a file extent LUN to a ZVOL. You can see from the disk graph that the drives are still being written to, but the network graph shows ZERO TRAFFIC :confused:

Network traffic is network traffic, no? Why should it matter whether I'm accessing a block device versus a file?

Screen Shot 2015-07-21 at 6.28.36 PM.png


Screen Shot 2015-07-21 at 6.28.53 PM.png
 

DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
Hardware Specs:
DELL Precision R5500
32GB RAM
2 x Intel Xeon E5645 @ 2.40GHz
Intel Pro/1000 82571EB quad-port NIC
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
What you see is probably the result of the VMware VAAI XCOPY offload, which FreeNAS supports. vMotion and clone operations within the same FreeNAS box consume almost no network bandwidth.
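
If you want to confirm that on the ESXi side (this assumes an ESXi 5.x host with esxcli available; it's just a quick sanity check, not anything FreeNAS specific), these should show whether the XCOPY primitive is enabled on the host and whether the LUN reports VAAI support:

# 1 = XCOPY (HardwareAcceleratedMove) enabled on the host
esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
# per-device VAAI status; "Clone Status: supported" means XCOPY can be used against that LUN
esxcli storage core device vaai status get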
 

DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
@mav@ thanks. I did confirm that VAAI XCOPY is indeed shockingly efficient, to the point where it barely registers any network usage. That's one mystery solved.

Still not sure why the server is rebooting under heavy I/O. Both the internal NICs and the Intel NICs are on the FreeBSD hardware compatibility list. Thankfully, there hasn't been any data corruption when the server just goes kaput.
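
Assuming the FreeBSD defaults (a panic dumps to the swap device and savecore copies it off at the next boot; I believe FreeNAS drops the result in /data/crash), checking for anything saved is roughly:

# any vmcore/textdump files left behind after the reboots?
ls -l /data/crash /var/crash
# shows which device is configured for kernel dumps (empty if none is set)
sysctl kern.shutdown.dumpdevname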
 

bmh.01

Explorer
Joined
Oct 4, 2013
Messages
70
Does it only die under load during a VAAI operation? If so, I've seen the same thing, with no solution at this point, although I haven't fully investigated. Do you have any crash dumps present after the reboot?

I'd see CPU usage rise and rise and rise, then it would (I assume) hit a kernel fault and reset.
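
If it happens again, it might be worth leaving something like this running on the console so you can at least see what's chewing the CPU right before it goes (kernel threads included; the one-second refresh is just a suggestion):

# -S shows system processes, -H shows threads, -s 1 refreshes every second
top -SH -s 1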
 

DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
Most likely. The box is purely for serving iSCSI devices. I was doing a VM migration all three times the server rebooted. I just remember seeing a lot of network/disk activity across all the NICs, and then, without warning, all the connections were lost. There were no crash dumps as far as I can tell.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
Most likely. The box is purely for serving iSCSI devices. I was doing a VM migration all three times the server rebooted. I just remember seeing a lot of network/disk activity across all the NICs, and then, without warning, all the connections were lost. There were no crash dumps as far as I can tell.
Could this be a hardware failure? With all that activity, is it possible the chipset on the NIC is getting too hot? If you have physical access, try opening the case and placing a fan blowing on the heat sink.
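
As a rough check from the FreeNAS shell (this assumes the coretemp/amdtemp module is loaded and the board actually exposes sensors; the NIC chipset itself usually isn't reported anywhere):

# CPU core temperatures, if the driver is loaded
sysctl -a | grep temperature
# board/ambient sensors, if the box has a BMC and ipmitool is installed
ipmitool sdr type Temperature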
 

bmh.01

Explorer
Joined
Oct 4, 2013
Messages
70
Could this be a hardware failure? With all that activity, is it possible the chipset on the NIC is getting too hot? If you have physical access, try opening the case and placing a fan blowing on the heat sink.
Under a VAAI operation there shouldn't be a lot of load on the network side; the idea with VAAI is to offload the operation to the storage so you don't saturate the network with an operation that doesn't need to cross it.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
VAAI avoids the network bottleneck. It means the load on the storage subsystem only increases.
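
So during one of these migrations it's the pool you want to be watching, not the NICs. Something like this (the pool name is just a placeholder):

# per-vdev throughput, refreshed every second
zpool iostat -v tank 1
# or per-disk busy %, latency and queue depth
gstat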
 

cfgmgr

Cadet
Joined
Jan 9, 2015
Messages
9
Assuming you have persistent logging set up on your ESXi box, you may want to investigate the vmkernel.log and vmkwarning.log files for any clues just prior to the reboot. Also, depending on your NIC firmware/driver version, you may benefit from an update.

Also, as someone noted, it is possible your card is getting too hot, though that seems to be rare. I have seen VMware logs where it specifically called out that the NIC was starting to go above temperature.
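
Something along these lines from the ESXi shell should do it (the log paths are the defaults; adjust if you've pointed syslog at a datastore):

# confirm where the logs go and whether they survive a reboot
esxcli system syslog config get
# scan for aborts, resets or temperature warnings around the crash window
grep -iE 'abort|reset|temperature' /var/log/vmkernel.log /var/log/vmkwarning.log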
 

bmh.01

Explorer
Joined
Oct 4, 2013
Messages
70
Why on God's green earth are people fixating on NIC temperatures? VAAI specifically avoids network load; that's the whole idea.
 

cfgmgr

Cadet
Joined
Jan 9, 2015
Messages
9
Why on God's green earth are people fixating on NIC temperatures? VAAI specifically avoids network load; that's the whole idea.

I agree it is likely not the cause, but it is still possible. Typically it will only cause a bus reset, not a reboot, however. It is a legitimate error message that can pop up in the logs. I've only observed it on one occasion.
 

DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
Thanks for all the feedback. I was never able to track down the issue. The server sits in a server room set at 70°F, so I don't think it's a temperature issue. It's possible the temperature inside the case is hotter than usual, but I can feel pretty good airflow coming out of the back and around the card.

I ended up upgrading to the latest stable release (I was previously on a May 4th release) and the issue seems to have gone away. I haven't had a reboot since.
 