System crashes under heavy iSCSI load (Fatal trap 12: page fault while in kernel mode)

Status: Not open for further replies.

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
I have an 11.0-U4 system (it was on 11.1, but I reinstalled on fresh boot disks with the previous version to rule that out) that crashes after running significant load against a zvol presented over iSCSI to a Windows system. It usually happens after about 20 minutes of continuous heavy sequential reads/writes, roughly when the ARC fills up. The system has 256GB of RAM and there is no deduplication. Since the crash backtrace references arc_reclaim_thread, it is interesting that the crash usually coincides with the ARC filling RAM. Attached is a screenshot of what I see when it crashes.

[Attached: Screen Shot 2017-12-29 at 12.24.31 AM.png — crash console output]
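
For anyone trying to correlate the crash with ARC growth, here is one quick way to watch the ARC against its ceiling while a load test runs (standard FreeBSD/FreeNAS arcstats sysctls; the 5-second interval is arbitrary):

Code:
# print current ARC size and its configured ceiling every 5 seconds
while true; do
    sysctl -n kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
    sleep 5
done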
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
I have tried swapping out the NICs, the LSI card, the cables, the system board, all of the RAM, the CPUs, and the power supplies, as well as reloading the system. With all of that replaced I get the same error (attached). It really seems like a bug. I posted in some other threads that sounded like similar issues, but most of those were being blamed on hardware; I can say I have ruled that out here.

Should I try going back to an even older version than 11 U4?

[Attached: Screen Shot 2017-12-29 at 10.31.35 AM.png — crash console output]
 

AuBird

Dabbler
Joined
Aug 17, 2015
Messages
29
I've been happy with my 9.10 box. I don't know what you consider a high load, so here is what it has been like for the last year or so. Collectd died at one point in its life.

[Attached: reporting graphs]
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
I don't see the picture, but I assume going back at all will require a reconfiguration. Can someone let me know how far back I can go and still import my pool, which was built with 11.0 Ux?

Also, would there be any benefit in restricting the amount of system RAM the ARC can use to something lower than the 256GB in the system, since the crash seems to reference ARC reclaim?

Thanks,
Kurt
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
I also noticed that I can read from the iSCSI zvols for long periods of time at a high rate without crashing the system. It is only under heavy writes that I get the crash above.
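
(For what it's worth, a rough way to generate a comparable sequential write load directly against a zvol from the FreeNAS shell, bypassing iSCSI and the network entirely; the pool/zvol names are placeholders, and this should only ever be aimed at a throwaway test volume since it overwrites its contents:)

Code:
# ~100GB of sequential writes straight to a scratch zvol (placeholder names)
# note: zeros will largely compress away if compression is enabled on the zvol
dd if=/dev/zero of=/dev/zvol/tank/testvol bs=1M count=100000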
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
So I put a SLOG back on my pool and turned on Autotune. I wasn't able to crash it last night during tests. I had originally removed the SLOG because of the ~5TB or so of writes that will be going to this pool per day. I don't know if it is truly fixed without further testing, and I don't know which change might have fixed it.
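
(For reference, both changes can be confirmed from the shell; the pool name below is a placeholder:)

Code:
# the "logs" section of the pool layout should list the SLOG device(s)
zpool status tank
# current ARC ceiling, i.e. whatever autotune set, if anything
sysctl vfs.zfs.arc_max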
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
OK, so it turns out the SLOG is what prevents the system from crashing. So I built an entirely new system with almost identical hardware (different brand of drives) and I can crash it the exact same way until I add a SLOG drive. Here are the specs.

-Dual E5-2670
-256GB of reg-ECC DDR3 RAM
-LSI 9300-8i (no SAS expander)
-10TB 4K native helium SAS drives (8 in RAIDZ2) (have tried 10TB Seagate Iron Wolf SATA)
-Dual Intel 750 400GB NVMe SLOG
-Chelsio T520 dual-port 10GbE (have tried Intel X520-DA2, X540-T2, and X710-DA2)
-11.0 U3/U4, and 11.1 have been tried
-Sync=standard
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Sorry to hear you are having trouble! Interesting that adding an SLOG seems to fix the issue. The interpretation is you are getting too many sync writes for the RAIDZ2 pool to handle without an SLOG.

It would be interesting to monitor the sync write volume and find out what rate you are actually seeing. Depending on the client/workload you could have a high sustained rate of sync writes, or a more bursty profile (which is what I would expect for an iSCSI VMware datastore, though even that might be "constant" but a lighter load).

It sounds like you are just running out of memory. I suspect the SLOG helps because you can flush to the ZIL fast enough not to require more memory buffering, whereas on the HDD pool, since it's slower, you need more RAM buffering. The system tries to grab more memory, cannot, and crashes.

I think the arc_reclaim thread (which, as I understand it, is responsible for shrinking the ARC back down to its target size when it's over) nominally runs about once per second. So in one particular second a large volume of sync writes comes in, the ARC is already at or over its maximum, and that triggers a request for more memory, leading to the crash.

I wonder if you could just lower the ARC max size by several GB (it seems like you have plenty of memory at 256GB; how much of that is ARC when it's maxed out?). You would then have memory left over for the ARC to expand for that one second as needed.

Just initial thoughts having read the thread. I'm by no means an expert here, but might be worth a shot if you are experimenting.
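
(In case it helps, the ARC ceiling on FreeNAS is normally capped with the vfs.zfs.arc_max loader tunable, set via System -> Tunables or in loader.conf; the 200 GiB figure below is purely illustrative and takes effect after a reboot:)

Code:
# loader tunable capping the ARC at ~200 GiB (illustrative value)
vfs.zfs.arc_max="214748364800"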
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
toadman said:
I wonder if you could just lower the ARC max size by several GB (it seems like you have plenty of memory at 256GB; how much of that is ARC when it's maxed out?). You would then have memory left over for the ARC to expand for that one second as needed.

Are you referring to the vfs.zfs.arc_max value that autotune put in, shown earlier in the thread? If so, should I make it lower? I also wonder: if that is a "max", will it just end up crashing anyway by hitting that limit before a reclaim can take place?

I am new to FreeNAS and ZFS, but it is surprising to me that a system wouldn't be able to keep up with 200-300MBps of sequential writes without crashing!

Thanks!
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Yeah, I was referring to vfs.zfs.arc_max. I somehow missed the pic of your current tunables, sorry about that! So I would think the current setting of vfs.zfs.arc_max would be fine.

Just to clarify, did you reboot after that arc_max tunable was set? I.e., is that the active value, or is it what autotune put in but not yet active because the system hasn't been rebooted?

Code:
sysctl -a | grep vfs.zfs.arc_max


If you put in the SLOG at the same time as you lowered arc_max, maybe it's the latter and not the former that "solved" the problem. I was just suggesting you try the lower arc_max after removing the SLOG, i.e. isolate the issue further: memory, SLOG, or maybe both. Given that it appears to be (at least in part) a memory starvation issue, I would think having extra free memory available might solve it on its own. Hence the experiment with a lower arc_max and no SLOG.

"it is surprising to me that a system wouldn't be able to keep up with 200-300MBps of sequential writes without crashing!" I completely agree with you on this statement. But we are probably looking at a bug here.
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
I did try the autotune settings (including arc_max) with a reboot and no SLOG. The system still crashed the moment the ARC filled up. I don't know whether I need to go even lower than the value shown above and see if that makes any difference.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Interesting. So an SLOG (fast SSD) must be in there to prevent the crash. Definitely sounds like a bug.

I suppose you could always dump "vmstat 1" into a file and try to initiate a crash, then see what memory was doing just prior to it. Do the same with the SLOG in there and compare. Maybe also run "top" in a console and see what it shows as the system gets closer to a crash. But short of the developers taking a look after reproducing this, I'm not sure I can suggest anything to really help. :)
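
(A minimal way to capture that so the data survives the panic is to log from a separate machine; the hostname and log path below are just examples:)

Code:
# run from another box so the log isn't lost when FreeNAS panics
ssh root@freenas "vmstat 1" | tee vmstat-precrash.log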
 

FlyingYeti

Cadet
Joined
Feb 20, 2018
Messages
5
Kurtc, did you ever find a fix other than using a SLOG? I have had the exact same experience as you (I haven't tried a SLOG and don't really want to). I'm also using Windows + iSCSI. I've tried two different 1Gb NICs (Intel/Broadcom), and now Fibre Channel (QLogic 2564 and 2562). My machines have 48GB of RAM. I have two zvols: one on spinners (RAIDZ, 7TB), the other on two striped 1TB SSDs. I can easily crash FreeNAS just by running a CrystalDiskMark test on either pool.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
I would definitely file a bug on this one if not done already. Please post the bug number here so folks can track progress.
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
FlyingYeti said:
Kurtc, did you ever find a fix other than using a SLOG?

Unfortunately, no, I did not. The only way I can keep it up is to have a SLOG, which is a problem considering this is a set of backup iSCSI targets: I am burning through write endurance quite quickly. I had to put the system into production, so I was not able to keep crashing it to gather logs.
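
(For keeping an eye on how fast that endurance is going, smartmontools reports the NVMe drive's own wear estimate; the device name below is an example:)

Code:
# "Percentage Used" is the drive's self-reported wear estimate
smartctl -a /dev/nvme0 | grep -i "percentage used"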
 

FlyingYeti

Cadet
Joined
Feb 20, 2018
Messages
5
I've been running solid for almost a month now. I was about to throw in the towel and then checked for an update. I installed 11.1-U2 (an upgrade from 11.1-U1), and ever since then everything has been rock solid. I haven't found any bug fixes that could have contributed to fixing the issue, and I wish I could find some proof that something changed between those versions that would have affected this. It makes me very nervous to upgrade from here.
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
That is great news!

I am too nervous to take out the SLOGs and "crash on purpose" to see if the bug is still there. I guess it will happen on its own if they wear out from writes. I need to figure out an effective way to test this without endangering my data or having to purchase another complete system.
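
(One lower-risk way to test, sketched on the assumption that there is spare capacity on the pool: carve out a throwaway zvol, point a separate iSCSI extent at it, and crash against that instead of the production targets; names and sizes are placeholders:)

Code:
# sparse throwaway 500G zvol on the existing pool (placeholder names/size)
zfs create -s -V 500G tank/crashtest
# expose it as its own iSCSI extent in the GUI, run the Windows-side load
# against it, then clean up afterwards:
zfs destroy tank/crashtest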
 

Razorblade

Dabbler
Joined
Apr 12, 2012
Messages
35
Hello @kurtc,
were you able to resolve the issue?
I am experiencing something similar: my FreeNAS system crashes completely under heavy iSCSI (write) load. Today I even lost access to the system: no SSH, no web UI, no local console. Fortunately iSCSI was still running, so my VMs stayed up.
I am running 11.1-U4, so maybe there really is a bug. I do not have a SLOG drive (yet :smile: )
 

or4n

Cadet
Joined
Feb 6, 2018
Messages
1
I had this same issue. I ended up getting an Optane drive for the SLOG and haven't seen a crash since.
 