iSCSI hung. where are the logs?

Status
Not open for further replies.

Alan Johnson

Dabbler
Joined
Jul 2, 2013
Messages
12
Let it be known that I have very little experience with FreeNAS and FreeBSD. I am very solid with Linux and know my way around iSCSI fairly well. I searched and read quite a bit on the forums and in the manual, but have not found anything yet. My apologies if I missed something I should have caught. On to the problem...

Last week, I put FreeNAS 8.3.1-p2 64-bit on a new Dell R515 (bb01: 6 AMD cores, 64GB RAM, 12x 4TB SATA disks + 2 512SSD). (My first FreeBSD install in years.) It was working great for about 5 days until our oVirt hosts (7x CentOS 6 blades running the openiscsi service for initiators) all reported they could no longer communicate with the target, about 2 days after I set it up and pointed them all at it. This happened after 11PM, so it is very unlikely anyone was in there mucking with it. I believe I am the only one who has logged in, and I am very confident that I am the only one making changes.

After quite a bit of troubleshooting, I found that the istgt service was hung on bb01. The first symptom was that when I tried to stop the service in the WUI, it just sat there with the spinner spinning and never completed. I could close the services control tab, go back in, and it would show iSCSI as off. When I clicked it again, it just spun again, until I closed and reopened it again to find the indicator set back to on.

When I first configured the service, I took my best guess at making the Target Global Configuration parameters more robust based on the basic info in the manual. I could not find more useful info on these settings, particularly regarding what the defaults are in openiscsi for the initiators (as suggested by the manual). I can upload a screenshot of what I had them set to if anyone feels it is relevant. So, I set them all back to their defaults, hoping that might help it start. I tried starting first after turning off LUC, then again after setting the rest back to defaults, and I just got the same behavior from the Control Services iSCSI switch.

After some more fumbling in the WUI, I found nothing relevant in the Reports graphs or the messages log. I went to the command line to try to find some other logs. I looked again at /var/log/messages, and found very little, and nothing related other than... (Dang it. Apparently, messages is cleared at reboot and no old ones are stored? So, I can't share the exact line. I'll have to set up a log archiver. Anyway...) ... other than an indication that the Control Services iSCSI switch was not going to turn istgt off because it was already marked as disabled, or something like that. I grepped /var/log/* for iscsi and istgt but found nothing.
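One thing a plain grep over /var/log/* can miss is rotated logs that have been compressed. The following is a minimal Python sketch of the same search that also reads .gz and .bz2 files. The function names are made up for illustration, and it assumes the logs are ordinary text files that the current user can read.

```python
import bz2
import gzip
import re


def open_log(path):
    """Open a plain, .gz, or .bz2 log file as text (undecodable bytes are replaced)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_log_hits(lines, keywords=("iscsi", "istgt")):
    """Return (line_number, text) pairs for lines mentioning any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def search_logs(paths, keywords=("iscsi", "istgt")):
    """Search each file in paths and return (path, line_number, text) for every hit."""
    hits = []
    for path in paths:
        with open_log(path) as handle:
            hits.extend((path, n, text) for n, text in find_log_hits(handle, keywords))
    return hits
```

To cover a whole directory, pass search_logs a list of regular files only (for example, glob.glob("/var/log/*") filtered with os.path.isfile), since subdirectories cannot be opened as logs.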

Finally, I grepped `ps uax` output for iscsi and istgt. I found the "/usr/local/bin/istgt -c /usr/local/etc/istgt/istgt.conf" process running even though Control Services indicated iSCSI was off. I also found 8 pairs of processes that were associated with the WUI trying to forcestop istgt. I toggled the iSCSI switch in Control Services again, and it just added to the list of forcestop processes.

So, I tried the implied command of those pairs: service istgt forcestop, and unfortunately, I didn't capture the output, but basically it just said something similar to the log entries mentioned above: "nah, it is already marked off". I tried status and stop as well, but got similar output. I tried `kill istgt` and it stayed alive. I tried `kill -n 9 istgt` and still it would not die.
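For anyone retracing these steps: kill expects a process ID rather than a process name, and the STAT column of the ps output is worth a look before reaching for signal 9. A process in state D (uninterruptible wait) cannot act on any signal, SIGKILL included, until the kernel call it is blocked in returns. Below is a minimal Python sketch that pulls the PID and state of matching processes out of `ps uax` text. The sample output is invented for illustration, the state descriptions paraphrase the FreeBSD ps(1) manual page, and the parser assumes no column other than COMMAND contains spaces.

```python
# Meaning of the first STAT letter, paraphrased from the FreeBSD ps(1) manual page.
STATE_MEANINGS = {
    "D": "uninterruptible wait (usually disk or other short-term I/O)",
    "I": "idle (sleeping for longer than about 20 seconds)",
    "L": "waiting to acquire a lock",
    "R": "runnable",
    "S": "sleeping for less than about 20 seconds",
    "T": "stopped",
    "W": "idle interrupt thread",
    "Z": "zombie (already dead, waiting to be reaped by its parent)",
}

# Invented sample, NOT captured from the real system.
SAMPLE_PS = """\
USER   PID %CPU %MEM   VSZ  RSS TT STAT STARTED    TIME COMMAND
root  2345  0.0  0.1 12345 6789 ?? Ds   11:02PM 0:01.23 /usr/local/bin/istgt -c /usr/local/etc/istgt/istgt.conf
root  3456  0.0  0.0  1234  567 ?? I    11:40PM 0:00.00 /bin/sh /usr/sbin/service istgt forcestop
root  9999  0.0  0.0  1234  567 ?? Ss   10:00AM 0:00.10 /usr/sbin/syslogd -s
"""


def find_processes(ps_output, needle):
    """Return pid, stat, meaning, and command for each process whose command contains needle."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    cmd_col = len(header) - 1  # COMMAND is last and may itself contain spaces
    found = []
    for line in lines[1:]:
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col or needle not in fields[cmd_col]:
            continue
        stat = fields[stat_col]
        found.append({
            "pid": int(fields[pid_col]),
            "stat": stat,
            "meaning": STATE_MEANINGS.get(stat[0], "unknown"),
            "command": fields[cmd_col],
        })
    return found


for proc in find_processes(SAMPLE_PS, "istgt"):
    print(proc["pid"], proc["stat"], proc["meaning"])
```

If the istgt entry shows D, sending more signals will not help. On FreeBSD, `procstat -kk <pid>` can show where in the kernel the process is blocked, which is usually a better lead than the signal itself.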

Finally, I bit the bullet and rebooted, only to find shutdown hung waiting for processes to die. I assume it was istgt. I gave it many minutes before I power cycled it. When it finished booting, iSCSI seemed fine. oVirt happily reactivated the iSCSI storage (all 7 initiators reconnected with no problem) and the virtual drives (Linux logical volumes (LVM) under the hood) that had been created there were still there and happy.

Through all this (except during the reboot of course), everything else seemed to be working fine, including the NFS share I had set up prior to the iSCSI target, which is running on the same zpool. It has been running for almost a day like this and I will update if it crashes again.

In the meantime, I'd like to figure out better ways to troubleshoot this kind of problem. Has anyone seen istgt hang hard like this before? Where should I expect to find logs and error messages for iSCSI/istgt? How does one really-really-kill a process in FreeBSD, like when kill -9 does not work? (BTW, I saw nothing indicating zombie status, but I'm not sure if it is the same as in Linux.) Where can I find good docs on safe levels for those Target Global Configuration parameters, particularly max. sessions and max. connections? Any other tips and tricks for this stuff?

Thanks much in advance for any help and efforts. I will keep looking in the meantime.
 

Alan Johnson

Dabbler
Joined
Jul 2, 2013
Messages
12
Oh, I just googled for simply "istgt" and came across a FreeBSD iSCSI target guide with this interesting bit in a sample config:
Code:
# iSCSI initial parameters negotiate with initiators
# NOTE: incorrect values might crash
FirstBurstLength 65536
MaxBurstLength 262144
MaxRecvDataSegmentLength 262144

I had them set like this, guessing bigger was safer in case the initiator was set bigger:
Code:
FirstBurstLength 262144
MaxBurstLength 2097152
MaxRecvDataSegmentLength 2097152


Am I on to something?
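
A minimal Python sketch for comparing the two sets of values follows. It checks each key against the range RFC 3720 allows for it (512 to 2**24 - 1), against the rule that FirstBurstLength must not exceed MaxBurstLength, and against the guide's sample values. The function names are made up for illustration, and nothing here is known about what istgt itself tolerates, so passing the RFC checks does not rule the settings out as the cause.

```python
# The two fragments quoted above, as plain "Key Value" text.
GUIDE_SAMPLE = """
# iSCSI initial parameters negotiate with initiators
FirstBurstLength 65536
MaxBurstLength 262144
MaxRecvDataSegmentLength 262144
"""

CONFIGURED = """
FirstBurstLength 262144
MaxBurstLength 2097152
MaxRecvDataSegmentLength 2097152
"""

# RFC 3720 allows this range for all three keys.
RFC_MIN, RFC_MAX = 512, 2**24 - 1


def parse_params(text):
    """Parse 'Key Value' lines into a dict of ints, skipping blanks and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split()
        params[key] = int(value)
    return params


def check_params(params, reference):
    """Return human-readable findings; an empty list means nothing stood out."""
    findings = []
    for key, value in params.items():
        if not RFC_MIN <= value <= RFC_MAX:
            findings.append(f"{key}={value} is outside the RFC 3720 range")
        ref = reference.get(key)
        if ref is not None and value > ref:
            findings.append(f"{key}={value} exceeds the guide's sample value {ref}")
    first = params.get("FirstBurstLength")
    burst = params.get("MaxBurstLength")
    if first is not None and burst is not None and first > burst:
        findings.append("FirstBurstLength exceeds MaxBurstLength")
    return findings


for finding in check_params(parse_params(CONFIGURED), parse_params(GUIDE_SAMPLE)):
    print(finding)
```

With the values above, every configured key is within the RFC range and only the "exceeds the guide's sample value" findings are reported, so the comparison narrows the question rather than answering it.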
 

Alan Johnson

Dabbler
Joined
Jul 2, 2013
Messages
12
It has been running fine with the defaults so far. Today I turned LUC back on and left the rest alone. I expect that will be OK. I think my original post is too long to get attention on the good questions at the end, so I'll repost the remaining questions without all the details as a convenience to others answering, and those searching for the same answers.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Troubleshooting iSCSI (and most ZFS performance issues in general) is very much a personal issue to deal with. Your issues are based on your hardware, system needs, and how you use the system. Because of this there is no "silver bullet" that works for many/most/all people. The defaults will work for many/most configurations. But if those aren't working for you, trying to tweak stuff to make it work is very much an issue for you and your server and doesn't always reflect other people's situations.

That's why you got no responses from anyone. If I (or anyone else for that matter) had tried to help you we could have easily spent weeks or months trying lots and lots of things. That's why FreeBSD admins can make great money....
 

Alan Johnson

Dabbler
Joined
Jul 2, 2013
Messages
12
Thanks, cyberjock! Your reply supports my suspicion that those who might know my answers didn't make it to the end of the post where I finally get around to the actual questions. =) The questions I ask are not specific to my setup. My fault for being so ridiculously long-winded.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh yeah.. it does... lol. I didn't read your first post today, only your latest post.

I forgot to mention that I have no idea how to do stuff with the logs. I know you have to set up a syslog server, but I couldn't get one to work for me. :P
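
For reference, the receiving side of a syslog server can be very small. The following is a minimal Python sketch, not a production setup, of a UDP listener that appends each datagram to a file so the messages survive a reboot of the sender. The port 5514, the log file name, and all function and class names are assumptions for illustration (the standard syslog port is 514/UDP, which normally needs root), and pointing the sending machine at it is not covered here.

```python
import socketserver

# Severity names indexed by the low three bits of the syslog PRI value.
SEVERITIES = ("emerg", "alert", "crit", "err", "warning", "notice", "info", "debug")


def parse_pri(message):
    """Split '<PRI>rest' into (facility, severity_name, rest).

    Returns (None, None, message) when there is no valid PRI prefix.
    """
    if message.startswith("<"):
        end = message.find(">")
        if 1 < end <= 4 and message[1:end].isdigit():
            pri = int(message[1:end])
            if pri <= 191:  # highest valid value: facility 23, severity 7
                return pri // 8, SEVERITIES[pri % 8], message[end + 1:]
    return None, None, message


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append every received datagram to log_path, one line per message."""

    log_path = "remote-syslog.log"  # hypothetical file name

    def handle(self):
        data, _sock = self.request  # UDP handlers receive (bytes, socket)
        text = data.decode("utf-8", errors="replace").rstrip("\x00\r\n")
        facility, severity, rest = parse_pri(text)
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(
                f"{self.client_address[0]} facility={facility} "
                f"severity={severity} {rest}\n"
            )


def serve(host="0.0.0.0", port=5514):
    """Listen until interrupted; port 5514 is an arbitrary unprivileged choice."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling serve() on another machine starts the listener. Nothing in the sketch rotates or size-limits the output file, so that would need handling separately.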

If you had made your post something like "I'm having iSCSI problems and I want to look at the logs after the system freezes/crashes" you might have had a better chance at the answer. :( Kind of silly because people provide either too much or too little. And when you do either one, some moderator like myself happily tells you we need more or less. lol

At least you got your problem solved.
 

Alan Johnson

Dabbler
Joined
Jul 2, 2013
Messages
12
hehe. Yep. I'm hoping my shorter posts will be more useful for everyone. =) If I get some good answers there, I'll reference them here and mark this as solved.

Thanks again.
 