How do you handle a hard istgt hang?

Status
Not open for further replies.

Alan Johnson

Dabbler
Joined
Jul 2, 2013
Messages
12
I had some recent issues with istgt hanging hard to the point where it did not even respond to a kill -n 9. I also could not find anything in the logs about it. I think I have the problem fixed by going back to some default settings, but it leaves me uneasy with my lack of understanding. So my noob questions are:

Has anyone seen istgt hang hard like this before? Where should I expect to find logs and error messages for iSCSI/istgt? How might I increase the detail on such logs?

How does one really-really-kill a process in FreeBSD, like when kill -9 does not work? I saw nothing indicating zombie status, but I'm not sure if it is the same as in Linux.

--- system spec ---
FreeNAS 8.3.1-p2 64b on a Dell R515:
  • 6 AMD cores
  • 64GB RAM
  • 12x 4TB SATA disk in raidz2
  • 2x 512GB SSDs partitioned into
    • 32GB of mirrored ZIL
    • 960GB of striped L2ARC
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh my. You have 960GB assigned to the L2ARC? That's far too much for your system RAM. The L2ARC should be sized to fit how much data you regularly need access to. You might have the largest L2ARC I have ever heard of!

The L2ARC needs 200 bytes of RAM per L2ARC record (4KB). That works out to about 46GB of RAM just to manage your 960GB of L2ARC. You may be having issues simply because of the excessive RAM needed to manage your L2ARC, which starves ZFS of RAM for other uses. You are really running the system almost as if it had 18GB of RAM. For a zpool of that size, that's a bit short. ;)
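
The arithmetic behind that estimate can be sketched roughly as follows. It assumes the 200-byte header and 4KB record size quoted above, and binary units (1GB = 2^30 bytes). The function name is made up for illustration, and real header and record sizes vary with ZFS version and workload, so treat the result as a ballpark only.

```python
GIB = 2**30
KIB = 2**10

def l2arc_header_ram_bytes(l2arc_bytes, record_bytes=4 * KIB, header_bytes=200):
    """Estimate the RAM used by in-memory headers that index a full L2ARC.

    record_bytes and header_bytes are the figures quoted in the post,
    not values read from a live system.
    """
    records = l2arc_bytes // record_bytes
    return records * header_bytes

l2arc = 960 * GIB   # striped L2ARC from the system spec
ram = 64 * GIB      # installed RAM from the system spec

overhead = l2arc_header_ram_bytes(l2arc)
print(f"L2ARC header overhead: {overhead / GIB:.1f} GiB")
print(f"RAM left for everything else: {(ram - overhead) / GIB:.1f} GiB")
```

With larger average records the overhead shrinks proportionally, so passing a bigger record_bytes shows how sensitive the estimate is to the 4KB assumption.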

I'd start with either upping your RAM to 128GB or decreasing your L2ARC by 1/2.

I'm not sure of your exact configuration for your SSDs, but it's recommended that you not use the same device for the ZIL that you use for the L2ARC.

Also, rebooting your system "resets" the L2ARC. So it's better to run the system longer and let it "come up to full speed". There are some threads around that discuss how the L2ARC fills up with data. It does so rather slowly (just a few MB/sec if I remember correctly). You can change the value to fill the L2ARC sooner, but then it causes other problems like excessive cache flushing.

Edit: Also, if you aren't happy with that system feel free to make a new one and mail me that piece of trash. I'll properly dispose of it for you. :)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Cyberjock hit this squarely on the head. You're creating a huge amount of stress on the system's main memory. The ARC is limited to 7/8ths of memory, so you can only have about 56GB of ARC, and not all of that can be used for L2ARC records anyway. A 5:1 or 10:1 ratio is okay for L2ARC:ARC in most cases.

Putting the SLOG on the same SSDs as the L2ARC is a poor design decision. I'm going to guess that is MLC flash, which has awesome performance characteristics for L2ARC but can be a bit laggy for SLOG, and by combining SLOG and L2ARC you are creating potential for contention on the device.

Also, you absolutely must look at tuning l2arc_write_max and l2arc_write_boost; I've given advice on this in the forums before. You can create wild stresses on the main pool with insufficient ARC/L2ARC, and if these are not tuned, the L2ARC warms up very slowly while the ARC thrashes about wildly and the pool suffers lots of unnecessary I/O.
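
To get a feel for how slow "very slowly" can be, here is a rough back-of-the-envelope sketch. It assumes the L2ARC is fed at most l2arc_write_max bytes per feed interval, that the interval is one second, and that the untuned cap is 8MB; those defaults are assumptions here and should be checked on the actual system. The 64MB figure is purely hypothetical and is not a tuning recommendation. The function name is made up for illustration.

```python
GIB = 2**30
MIB = 2**20

def l2arc_fill_hours(l2arc_bytes, write_max_bytes, feed_secs=1.0):
    """Lower bound on the hours needed to fill an L2ARC.

    Assumes the device is written at exactly write_max_bytes every
    feed_secs seconds, which is the best case; real fill rates are lower.
    """
    if write_max_bytes <= 0 or feed_secs <= 0:
        raise ValueError("write_max_bytes and feed_secs must be positive")
    intervals = l2arc_bytes / write_max_bytes
    return intervals * feed_secs / 3600

l2arc = 960 * GIB
for cap_mib in (8, 64):  # 8 = assumed default, 64 = hypothetical tuned value
    hours = l2arc_fill_hours(l2arc, cap_mib * MIB)
    print(f"write cap {cap_mib} MiB/interval: at least {hours:.1f} hours to fill")
```

Even in the best case, an untuned 960GB L2ARC would need more than a day of sustained feeding after each reboot before it is fully warm.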
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
jgreco said: "You're creating a huge amount of stress on the system main memory, ARC is limited to 7/8ths memory, so you can only have about 56GB of ARC, and not all of that can be used for L2ARC records anyways."
You can tune the ARC max higher than 7/8ths. If your L2ARC device is too large, meaning larger than the ARC can reference, the "extra" simply goes unused. If I'm not mistaken, L2ARC records are considered metadata.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes, but you can't really tune it to 16/8ths, which is what is needed. Tuning to 15/16ths is great and all, but it only increases the ARC by 4GB. There's no tuning fix for this; it is kinda broken.
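
For anyone checking the numbers, a minimal sketch of the ARC cap arithmetic on a 64GB machine follows. The helper name is made up for illustration; the 7/8 and 15/16 fractions are the ones discussed in this thread, not values read from a live system.

```python
from fractions import Fraction

def arc_cap_gb(ram_gb, fraction):
    """Return the ARC size cap in GB for a given fraction of RAM."""
    return ram_gb * fraction

ram_gb = 64
default_cap = arc_cap_gb(ram_gb, Fraction(7, 8))    # untuned cap discussed above
tuned_cap = arc_cap_gb(ram_gb, Fraction(15, 16))    # cap after tuning to 15/16ths
gain = tuned_cap - default_cap

print(f"default cap: {default_cap} GB, tuned cap: {tuned_cap} GB, gain: {gain} GB")
```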
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I've gotta laugh at this thread. The OP (I know you had good intentions) really went all-in on the L2ARC. He deserves an award for the "determined to make this work" factor. It is impressive, even though it was overboard.
 

Alan Johnson

Dabbler
Joined
Jul 2, 2013
Messages
12
hehe. Thanks, but I just used what I had. =) When the box arrived, I discovered the 2 internal 2.5" bays and tossed in a couple of 512GB Crucial M4s I had in stock. Since the manual said to limit ZIL to 1/2 of memory because more would go unused, and I didn't see anything about the possibility of too much L2ARC, I figured why not use it all? If, at this point, some just goes unused, that's preferable to re-configuring to ensure that some goes unused... right? Performance is very far from being any concern on this system, and the SSDs were just a bonus I added after spec.

Either way, I appreciate the advice and look forward to confirmation of the above question, but I very much doubt it is relevant. When istgt hung, the other services were plugging along nicely, including the NFS and Samba shares on the same zpool. I have no reason to believe the zvol backing the target was having any issues either. There was no noticeable problem on the machine other than the iSCSI service being unresponsive and refusing to die. I can post the graphs under Reporting if you like, but they were all well within nominal, IMHO. Also, the problem has not returned since I reset the Global Target Configuration to default settings. The lockup happened within a couple of days of turning on iSCSI and has not returned for over a week.

Regardless, my goal for this thread was not to dig into my specific problems, but to address the general questions of my OP so that I, and hopefully others, can better understand the system and troubleshoot in the future. That said, again, I very much appreciate the other advice, and welcome more. I just hope it won't distract entirely from those more general questions. =)
 