FreeNAS 11.1 possible memory leak

Status
Not open for further replies.

cepa

Cadet
Joined
Dec 25, 2017
Messages
3
Hi All,

First of all, I would like to thank the iXsystems guys and the FreeBSD team for the amazing piece of software that FreeNAS is. I've been using it for more than 5 years and have always been happy with the stability and feature set it offers.

Recently I decided to restore my server lab, used for development and devops muscle training, and rebuilt the primary storage server that exposes NFS to four VM hypervisors. Details are listed below; to keep it short, it is an 8x HDD stripe-mirror with a single SSD divided into ZIL, L2ARC and swap, a Xeon, 16GB ECC, and networking over 4x 1GbE bonded with LACP.
Soon after setting it up, I noticed the system hangs unexpectedly without printing anything to the console or to any logs. Here's what happened:

1. Initial setup: 8x HDD stripe-mirror + Intel 520 SSD, 8GB ZIL, 196GB L2ARC, no swap, LACP + jumbo frames, default tunables. The system froze within a couple of hours; it initially still responded to ping, but it was impossible to log in over SSH, as if it were stuck on IO. Before it happened, I noticed relatively high CPU usage in System and IRQ.
2. Thought it might be the SSD, so I replaced it with an Intel 320 SSD: kept 8GB for ZIL, 128GB L2ARC, no swap, LACP + jumbo frames, default tunables. Again the system froze without crashing within a couple of hours.
3. Suspected jumbo frames to be the culprit, so I reverted to 1500 MTU. The system froze again, but after a longer time.
4. Had an L2ARC hunch, set it in tunables to 12GB and increased l2arc_write_max and l2arc_write_boost to 128MB. The system froze again, but this time it ran even longer, more than 24 hours, although I wasn't doing any load tests in the meantime.
5. Finally noticed there was no swap (lol), so I added a 16GB partition on the SSD and swapon'ed it. The system froze, but again only after more than a day; I noticed FreeNAS started swapping shortly before crashing.
6. Turned on autotune, got 13GB for L2ARC and started doing load tests. It froze again after around 12 hours or so; I noticed swapping again shortly before the freeze.
7. Bless you guys for the "Remote Graphite Server Hostname" option to upload carbon stats. Did that, decreased L2ARC to 10GB and started a load test to see what was happening. It froze again and rebooted itself around an hour later, early in the morning.
8. Retried the load tests and watched Grafana. Same result, it froze, but this time I got a pattern in the Grafana dashboard.
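
For anyone wanting to reproduce the tunables from step 4: sysctl-style tunables take sizes in bytes, so 128MB has to be converted first. Below is a minimal sketch of that conversion. It assumes "128MB" means 128 MiB and that the full sysctl names are vfs.zfs.l2arc_write_max and vfs.zfs.l2arc_write_boost (only the short names are given above); the helper names are made up for illustration.

```python
# Sketch only: convert sizes in MiB to the byte values a tunable expects.
# Assumption: the full sysctl names below; the post only gives the short names.
MIB = 1024 * 1024


def mib_to_bytes(mib):
    """Return the number of bytes in `mib` mebibytes."""
    return mib * MIB


def tunable_lines(settings_mib):
    """Format {name: size_in_MiB} as 'name=bytes' lines."""
    return [f"{name}={mib_to_bytes(size)}" for name, size in settings_mib.items()]


lines = tunable_lines({
    "vfs.zfs.l2arc_write_max": 128,
    "vfs.zfs.l2arc_write_boost": 128,
})
for line in lines:
    print(line)
```

This only prints the name=value pairs; entering them (System -> Tunables or sysctl) is still a manual step.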

So it seems that FreeNAS, either the kernel or some process, is running out of memory regardless of the L2ARC size. At some point it starts to swap heavily and soon after that, poof... frozen, without even saying oh f*** y** in any logs.
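
A rough way to flag this "swap keeps climbing" pattern automatically, before the box locks up, is to check whether the last few swap-usage samples are strictly increasing. The sketch below is just that check; the function name, the window size and the sample values are hypothetical and not taken from my dashboards.

```python
# Sketch only: detect sustained growth in a series of swap-usage samples.
# The sample values below are hypothetical, not measurements from this server.
def sustained_growth(samples, window):
    """Return True if the last `window` samples are strictly increasing."""
    if window < 2 or len(samples) < window:
        return False
    tail = samples[-window:]
    return all(earlier < later for earlier, later in zip(tail, tail[1:]))


swap_used_mb = [0, 0, 0, 12, 80, 350, 900]  # hypothetical samples, oldest first
if sustained_growth(swap_used_mb, window=4):
    print("swap usage has grown for 4 consecutive samples")
```

A strictly-increasing test is deliberately crude; a real alert would probably also want a minimum growth threshold so that small fluctuations don't trigger it.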

Snapshots of the Grafana dashboards:
- 3 hours before the freeze: https://snapshot.raintank.io/dashboard/snapshot/9xM3Z6ropc6eVBZIUqtC0737Yvw3Y9X0?orgId=2
- 24 hours before the freeze: https://snapshot.raintank.io/dashboard/snapshot/zIYgjKmangIGNhKV6XbeoqNrwW9GhuUp?orgId=2

Have a look at the ZFS Hit Ratio, Memory, Swap, and SSD IOPS panels.
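
In case it helps when reading the hit ratio panel: a hit ratio is hits divided by hits plus misses. Here is a small sketch of that calculation. I'm assuming the inputs are the cumulative ARC counters (on FreeBSD I believe these are kstat.zfs.misc.arcstats.hits and kstat.zfs.misc.arcstats.misses); the numbers in the example are made up.

```python
# Sketch only: cache hit ratio from cumulative hit/miss counters.
# Assumption: counters come from kstat.zfs.misc.arcstats.hits / .misses;
# the values used below are invented for illustration.
def hit_ratio(hits, misses):
    """Return hits / (hits + misses), or 0.0 when there were no lookups."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total


def interval_hit_ratio(prev, curr):
    """Hit ratio between two (hits, misses) counter snapshots."""
    return hit_ratio(curr[0] - prev[0], curr[1] - prev[1])


print(f"{hit_ratio(900, 100):.1%}")
print(f"{interval_hit_ratio((900, 100), (1000, 200)):.1%}")
```

The interval version matters because the counters are cumulative since boot: a ratio over the whole uptime can look healthy while the most recent interval is much worse.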

Hardware:
- Intel S3420GP mobo
- Xeon X3420
- 16GB DDR3 ECC
- 2x onboard Intel NICs used for management only
- 4x Intel 82575GB 1GbE, bonded in LACP, no jumbo frames currently
- 2x Marvell 88SE9235 controllers (two separate 4x SATA controllers), one dedicated to two HDD, one to SSD

Attached: lspci, ps aux, sysctl -a, vmstat and zfs-stat output, taken shortly before the freeze.

Hope that helps. In the meantime I'm going to downgrade to 9.10 (?) to see if the issue occurs again; after all, I need stable storage :)
 

Attachments

  • lspci.txt
    2.9 KB · Views: 425
  • psaux.txt
    9.1 KB · Views: 405
  • sysctl.txt
    163.1 KB · Views: 794
  • vmstat.txt
    9.4 KB · Views: 411
  • zfs-stat.txt
    8.8 KB · Views: 492
  • dmesg.txt
    13.8 KB · Views: 391

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You might want to file a bug report so the devs can take a look at this.
 

AndrX

Dabbler
Joined
Aug 8, 2017
Messages
21
After updating from 11.0-U4 to 11.1 I have had several problems: a memory leak, a lost swap file, etc.
The system reboots after working under load: when I run a Transmission download or a bhyve VM, memory usage starts trending upward.
After 3-5 hours under load the system reboots.

How can I downgrade the system to the stable version 11.0-U4?
 

cepa

Cadet
Joined
Dec 25, 2017
Messages
3
You can try doing a manual update (System -> Update -> Manual Update) with this package: https://download.freenas.org/11/11.0-U4/FreeNAS-11.0-U4-manual-update.tar
However, I tried that and got a couple of errors, mostly related to the underlying Django database schema. I guess updates are one-way only, so at the end of the day I simply reinstalled my FreeNAS box and imported the pool. The configuration import also worked only so-so, since some data was not compatible with the older version of FreeNAS, but it might work for you.
 

cancel

Cadet
Joined
Aug 14, 2017
Messages
5
Seeing the same behavior in system load and middlewared. Went back to 11.0-U4, which worked OK for 3 months, until this one is fixed.
 