SOLVED FreeNAS 11.1 Disappearing RAM?

Status
Not open for further replies.

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
First, I apologize for the lack of details. I'm away from the system right now and can't get all the system details. I'll post again later today with further details and results from any suggestions. Thanks.

Now then, I upgraded from 9.10 to 11.1 yesterday afternoon. I haven't experienced any issues with my existing jails, save for an iohyve VM that lost network access, which I've since destroyed and started rebuilding. Only then did I see the thread with instructions on potentially reusing the disk zvol. Oh well.

Anyway, everything had been fine until today, when I noticed that I lost around 2GB of RAM last night. The system info page reports 16GB, but the reporting page shows a drop of almost 3GB around 0100 EST. It then bumped back up by a bit more than 500 MB at 0120. Around 1030 it dropped another GB. I'm sitting just about exactly 3GB down from what I expect.

If I look at the FreeNAS web ui I see 16 GB reported. 'sysctl hw.physmem' shows 16 GB as well. The reporting page shows the drop in RAM.

If it was a bad stick I'd have expected a system halt or to see the total drop by 8GB. This looks more like something is taking up the RAM, but not being included in the numbers for active, inactive, wired, free, and cache. I do have the new VM running, which I gave about 3GB, but I'd expect that to be included in the reporting. When I get the chance I'll try killing the VM to see if the RAM comes back.

So, any suggestions on what I can do to identify what is using the missing RAM? Or could something actually be wrong with a stick, with individual RAM chips failing over time?

I'll be back as soon as possible to update this with my actual system config and any results from further testing.

Thanks for your help.

edited to add system configuration:
  • ASRock E3C226D2I
  • Intel Xeon E3-1220 v3 @ 3.10GHz
  • 16 GB ECC
  • RAIDZ2: 6x Seagate 4TB ST4000DM000
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I noticed that I lost around 2GB of RAM last night.
How, exactly, did you notice that you "lost around 2GB of RAM"?
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I send the collectd stats and some others to Librato (see attached image) but the charts on the Reporting tab (data from collectd) show the same drop.

The gap in the image is when I rebooted for the upgrade to 11.1.
 

Attachments

  • ram.png (26.3 KB)

Xelas

Explorer
Joined
Sep 10, 2013
Messages
97
If it's an older system, maybe it's getting senile? My total RAM is dropping with age as well, and the ECC functionality failed right about when my kids were born and I was sleeping for 3-4 hours per night. It never came back. I get memory errors all the time. I actually couldn't recall my boss's full name for a few seconds on a conference call earlier today!

Now, to get back on topic - have you tried restarting the collectd daemon? It could just be a glitch with the daemon.
What if you side-step collectd and query the system more thoroughly through sysctl? Here is a page with some scripts you may find helpful, including a fairly nifty shell script:
https://www.cyberciti.biz/faq/freebsd-command-to-get-ram-information/

Also, this thread was helpful - you can apparently query the health of ECC RAM from within FreeBSD (and I assume by extension FreeNAS) if you doubt the health of your RAM:
https://lists.freebsd.org/pipermail/freebsd-performance/2012-April/004585.html
The danger here is that nothing being reported could mean your RAM is fine, or it could be that ECC errors aren't being reported or logged correctly. If you have a known-bad RAM module you can test with, it might be useful to test the ECC functionality. Maybe one way to do this is to intentionally damage a single RAM package on a RAM module that is too small to be useful? Or maybe you have spares lying around you can use as a test specimen? I've never done that, so I have no idea if that will work or not!
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Code:
> sudo mcelog --ascii --file /var/log/messages
>

Looks good. Either my RAM is fine or ECC isn't detected properly.

Code:
> sysctl hw.physmem
hw.physmem: 17070518272

Yep, 16GB.

Code:
> sysctl hw | egrep 'hw.(phys|user|real)'
hw.physmem: 17070518272
hw.usermem: 6129745920
hw.realmem: 17985175552

Not sure how this works out exactly, but looks OK.
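For anyone wondering how those three numbers relate, here is a rough Python sketch that parses the sysctl output above and takes the differences. The interpretation in the comments is my understanding of FreeBSD rather than something confirmed in this thread: hw.usermem is taken to be physmem minus wired memory at the moment of the query, and hw.realmem is taken to include address-space holes and firmware-reserved regions. The helper name parse_sysctl is made up for illustration.

```python
# Figures copied from the `sysctl hw` output above.
SYSCTL_OUTPUT = """\
hw.physmem: 17070518272
hw.usermem: 6129745920
hw.realmem: 17985175552
"""


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict of integers (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        name, _, raw = line.partition(":")
        if raw.strip().isdigit():
            values[name.strip()] = int(raw)
    return values


hw = parse_sysctl(SYSCTL_OUTPUT)
GIB = 1024 ** 3

physmem_gib = hw["hw.physmem"] / GIB
# Assumption: usermem = physmem - wired memory when sysctl was run.
wired_estimate = hw["hw.physmem"] - hw["hw.usermem"]
# Assumption: the part of realmem that is not usable physical memory.
unmanaged = hw["hw.realmem"] - hw["hw.physmem"]

print(f"physmem:           {physmem_gib:.2f} GiB")
print(f"physmem - usermem: {wired_estimate} bytes")
print(f"realmem - physmem: {unmanaged} bytes")
```

If the assumption about hw.usermem holds, physmem - usermem should land close to the Wired figure that freebsd-memory.pl prints, with small differences because the two were sampled at different times.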

Code:
> grep memory /var/run/dmesg.boot
real memory  = 17985175552 (17152 MB)
avail memory = 16517664768 (15752 MB)

Looks reasonable. I'm not sure why available is so high yet still different from real, but it's not too far off.
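To put a number on the difference between the two boot-time lines, a small sketch along these lines pulls the byte counts out of the dmesg text and subtracts them. The explanation that the difference is memory claimed at boot by the kernel and firmware before "avail memory" is printed is an assumption on my part, and parse_boot_memory is a hypothetical helper name.

```python
import re

# Lines copied from /var/run/dmesg.boot as posted above.
DMESG_LINES = """\
real memory  = 17985175552 (17152 MB)
avail memory = 16517664768 (15752 MB)
"""


def parse_boot_memory(text):
    """Extract byte counts from the 'real memory' and 'avail memory' lines."""
    pattern = re.compile(r"^(real|avail) memory\s*=\s*(\d+)", re.MULTILINE)
    return {kind: int(value) for kind, value in pattern.findall(text)}


mem = parse_boot_memory(DMESG_LINES)
# Assumption: this is memory reserved before the "avail memory" line is printed.
reserved = mem["real"] - mem["avail"]
print(f"real - avail: {reserved} bytes ({reserved / 1024 ** 2:.0f} MB)")
```

The result agrees with the difference between the MB figures in parentheses (17152 MB versus 15752 MB), so the two lines are at least internally consistent.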

Code:
> /usr/local/bin/perl ./freebsd-memory.pl
SYSTEM MEMORY INFORMATION:
mem_wire:       10978406400 (  10469MB) [ 66%] Wired: disabled for paging out
mem_active:  +    737406976 (    703MB) [  4%] Active: recently referenced
mem_inactive:+   1292951552 (   1233MB) [  7%] Inactive: recently not referenced
mem_cache:   +            0 (      0MB) [  0%] Cached: almost avail. for allocation
mem_free:    +    306196480 (    292MB) [  1%] Free: fully available for allocation
mem_gap_vm:  +   3307835392 (   3154MB) [ 19%] Memory gap: UNKNOWN
-------------- ------------ ----------- ------
mem_all:     =  16622796800 (  15852MB) [100%] Total real memory managed
mem_gap_sys: +    447721472 (    426MB)        Memory gap: Kernel?!
-------------- ------------ -----------
mem_phys:    =  17070518272 (  16279MB)        Total real memory available
mem_gap_hw:  +    109350912 (    104MB)        Memory gap: Segment Mappings?!
-------------- ------------ -----------
mem_hw:      =  17179869184 (  16384MB)        Total real memory installed

SYSTEM MEMORY SUMMARY:
mem_used:       15580721152 (  14858MB) [ 90%] Logically used memory
mem_avail:   +   1599148032 (   1525MB) [  9%] Logically available memory
-------------- ------------ ----------- ------
mem_total:   =  17179869184 (  16384MB) [100%] Logically total memory

Ah, here we go. That mem_gap_vm line is what is missing from the memory reporting. It's odd that it's listed as UNKNOWN, while also identifying it as mem_gap_vm. Does the 'vm' not refer to Virtual Machine? Stopping the VM cut the mem_gap_vm line in half, but it didn't go away.

So I'm not sure if this is a reporting bug, or a memory leak.
I've created #27356 to track the issue.
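For reference, the mem_gap_vm figure is just whatever is left of mem_all after summing the page queues the script knows about. A minimal Python sketch of that arithmetic, using only the byte counts from the output above (the variable names are mine, not the script's):

```python
# Byte counts copied from the freebsd-memory.pl output above.
queues = {
    "wire": 10978406400,
    "active": 737406976,
    "inactive": 1292951552,
    "cache": 0,
    "free": 306196480,
}
mem_all = 16622796800  # "Total real memory managed"

accounted = sum(queues.values())
gap = mem_all - accounted  # memory in no queue the script knows about

print(f"accounted for: {accounted} bytes")
print(f"unaccounted:   {gap} bytes, {gap // 1024 ** 2} MB, "
      f"{100 * gap // mem_all}% of managed memory")
```

So the "UNKNOWN" label only means the memory sits in a category the script does not sum. It does not by itself say the memory is leaked or missing.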
 

Xelas

Explorer
Joined
Sep 10, 2013
Messages
97
I don't think that's RAM taken up by VMs. I think the "VM" is in reference to Virtual Memory. There could be some complex interplay with the page file? It wouldn't make sense for RAM allocated to virtual machines to show up in a RAM report at the kernel level.
What if you query swap file use at the same time? Just curious if there is a correlation.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
That's a good point, though it's interesting that the gap dropped by half when I shut down the VM.
I think I remember swap use being very low, but I don't have the numbers. I'll post those later today. I'll try starting the VM up again today and I'll post that impact as well.
 

Xelas

Explorer
Joined
Sep 10, 2013
Messages
97
The gap drop could be explained if stopping the virtual machine impacted how much memory was being swapped. That would also explain why there was a correlation, but not 1:1. Shutting down the virtual machine means less RAM used, so the system moves some stuff around but doesn't completely page in what it had paged out into the swap file. If "vm" = "virtual machine", then you would expect shutting down the virtual machine to free all of the RAM up.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Alexander Motin figured it out in the ticket. It looks like the Cache category has been replaced with Laundry. I modified the freebsd-memory.pl script to include a line for Laundry and the gap disappears.
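For anyone who wants to check the same thing without the Perl script: as far as I know, FreeBSD 11.1 reports a laundry page queue through vm.stats.vm.v_laundry_count, so a tool that only sums wired, active, inactive, cache and free leaves those pages in the gap. The Python sketch below shows the idea. The page counts are the byte figures from earlier in the thread divided by an assumed 4096-byte page, and the laundry count is hypothetical: it is set to the size of the gap purely to illustrate the gap closing, not a value measured on this system.

```python
PAGE_SIZE = 4096  # assumption; read hw.pagesize on a real system

# Queues to sum. Missing keys count as zero, so the same code works on
# releases that lack either the cache or the laundry counter.
QUEUE_OIDS = (
    "vm.stats.vm.v_wire_count",
    "vm.stats.vm.v_active_count",
    "vm.stats.vm.v_inactive_count",
    "vm.stats.vm.v_cache_count",
    "vm.stats.vm.v_laundry_count",
    "vm.stats.vm.v_free_count",
)


def unaccounted_bytes(page_counts, page_size=PAGE_SIZE):
    """Bytes of managed memory that fall in none of the known page queues."""
    total = page_counts["vm.stats.vm.v_page_count"]
    known = sum(page_counts.get(oid, 0) for oid in QUEUE_OIDS)
    return (total - known) * page_size


# Derived from the posted byte counts (bytes / 4096).
without_laundry = {
    "vm.stats.vm.v_page_count": 4058300,
    "vm.stats.vm.v_wire_count": 2680275,
    "vm.stats.vm.v_active_count": 180031,
    "vm.stats.vm.v_inactive_count": 315662,
    "vm.stats.vm.v_cache_count": 0,
    "vm.stats.vm.v_free_count": 74755,
}
# Hypothetical laundry count, chosen to equal the gap for illustration only.
with_laundry = dict(without_laundry, **{"vm.stats.vm.v_laundry_count": 807577})

gap_before = unaccounted_bytes(without_laundry)
gap_after = unaccounted_bytes(with_laundry)
print(f"gap without laundry: {gap_before} bytes")
print(f"gap with laundry:    {gap_after} bytes")
```

On a live system the counts would come from running sysctl on those OIDs rather than from hard-coded numbers.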

Thanks all.
 

Xelas

Explorer
Joined
Sep 10, 2013
Messages
97
Excellent - and thank you for wrapping up the thread with the solution. I HATE it when the OP fixes the issue and just disappears, or even worse, writes "Fixed it" and disappears.

Looks like they'll need to also modify the config for collectd so that gets reported properly as well.
 