Failing drive and swap shoots/kills FreeNAS processes

Status
Not open for further replies.

Lars Jensen

Explorer
Joined
Feb 5, 2013
Messages
63
Before putting more RAM into the FreeNAS box, I would like to ask whether the following behavior can be avoided, so the box stays online and can recover without a reboot.

This setting in System > Advanced

Swap size on each drive in GiB, affects new disks only. Setting this to 0 disables swap creation completely (STRONGLY DISCOURAGED).

Is currently set to 2.

Today one drive failed, and the FreeNAS box had Swap: 24G Total, 232M Used, 24G Free
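The 24G total follows directly from the per-drive setting. A quick arithmetic sketch, assuming 12 pool drives each carry a 2 GiB swap partition (the two six-disk raidz2 vdevs in the pool layout posted later in this thread) and that the 232M in use is spread evenly across them, which the kernel does not guarantee:

```python
# Sketch: reproduce "Swap: 24G Total" from the per-drive swap setting.
# Assumption: 12 pool drives carry a swap partition (two 6-disk raidz2 vdevs);
# this count is inferred from the pool layout, not queried from the system.

SWAP_PER_DRIVE_GIB = 2       # System > Advanced: "Swap size on each drive in GiB"
DRIVES_WITH_SWAP = 2 * 6     # two raidz2 vdevs of six disks each (assumption)


def total_swap_gib(per_drive_gib: int, drives: int) -> int:
    """Total swap when every pool drive gets an equal swap partition."""
    return per_drive_gib * drives


def avg_used_per_drive_mib(used_mib: float, drives: int) -> float:
    """Average swap in use per drive, assuming usage is spread evenly."""
    return used_mib / drives


if __name__ == "__main__":
    print(f"total swap: {total_swap_gib(SWAP_PER_DRIVE_GIB, DRIVES_WITH_SWAP)} GiB")
    print(f"avg in use per drive: {avg_used_per_drive_mib(232, DRIVES_WITH_SWAP):.1f} MiB")
```

Under these assumptions this gives 24 GiB in total and roughly 19 MiB of swapped pages on each drive, so any single drive failure could take some of them with it.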

That is 232M of swap in use. When the drive failed, several GUI processes such as nginx and django went down. Nginx restarted, but django could not be restarted.

I assume the swap on the failed disk caused the problem.

I'm not familiar with every process FreeNAS runs, and since I didn't know which ones had been hit by the failed disk, I rebooted the box. Everything is working fine again, and resilvering is progressing normally.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
No, that's not the problem.

If your swap usage rose above zero, that means you demanded more RAM than could be allocated from unused + reclaimable areas.

That means you probably don't have enough RAM. As per the forum rules, please tell us your hardware, including motherboard make and model, and of course, how much RAM you have.

When you say a "drive failed", did you replace it? Do you mean a boot-device "drive", or a pool drive?

About 5 or 6 things could be wrong here, and to know which it is, I will need to know the above answers.

Thanks.
 

Lars Jensen

Explorer
Joined
Feb 5, 2013
Messages
63

Sorry, see below. Putting more RAM into the box is OK, but what if that isn't enough later and failing swap takes the box down again?

FreeNAS-9.10.1 (d989edd)
12GB RAM
Supermicro X7DVL Motherboard

Pool layout: zpool status
  pool: zfsstor2
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Aug 29 12:23:10 2016
        5.57T scanned out of 15.9T at 940M/s, 3h12m to go
        76.7G resilvered, 35.00% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        zfsstor2                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/ca6f53de-7172-11e4-81d2-0030488e255c  ONLINE       0     0     0
            gptid/cace091d-7172-11e4-81d2-0030488e255c  ONLINE       0     0     0
            gptid/cb30a9c4-7172-11e4-81d2-0030488e255c  ONLINE       0     0     0
            gptid/cb945b8f-7172-11e4-81d2-0030488e255c  ONLINE       0     0     0
            gptid/cbf7c9f6-7172-11e4-81d2-0030488e255c  ONLINE       0     0     0
            gptid/cc5c5552-7172-11e4-81d2-0030488e255c  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/cedcbf93-f8d3-11e5-9a4a-0030488e255c  ONLINE       0     0     0
            gptid/cfb95d40-f8d3-11e5-9a4a-0030488e255c  ONLINE       0     0     0
            gptid/d0a8a267-f8d3-11e5-9a4a-0030488e255c  ONLINE       0     0     0
            gptid/8c379ff9-6dd2-11e6-8f5c-0030488e255c  ONLINE       0     0     0  (resilvering)
            gptid/d2874d40-f8d3-11e5-9a4a-0030488e255c  ONLINE       0     0     0
            gptid/d369bd97-f8d3-11e5-9a4a-0030488e255c  ONLINE       0     0     0
        logs
          gptid/f94f87f7-fd61-11e5-9a4a-0030488e255c    ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Wed Aug 17 03:46:02 2016
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/cc05cdd6-a0a5-11e4-80fd-0030488e255c  ONLINE       0     0     0

errors: No known data errors
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
This is interesting. If I understand correctly, FreeNAS uses part of one or more drives as swap space. Shouldn't it use space on the pool for swap, so that a disk failure doesn't cause the loss of swap space?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Swap space is striped across the raw disks (i.e. it's a partition, and the member in the vdev is another partition). If swap is in use on a disk that fails, there's a significant chance of the system failing.
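To illustrate why one failed disk can take down several unrelated processes, here is a toy model. It assumes swapped-out pages are handed out round-robin across the swap partitions of all pool drives; FreeBSD's real swap pager is more involved, and the page counts and the `collectd` process name are hypothetical:

```python
# Toy model, not FreeBSD's real swap pager: swapped-out pages are handed out
# round-robin across the swap partitions of all pool drives, so a single
# process's pages end up on many drives.

def allocate_round_robin(pages_per_process: dict[str, int], devices: int) -> dict[str, list[int]]:
    """Map each process to the device index that holds each of its swapped pages."""
    placement: dict[str, list[int]] = {}
    next_dev = 0
    for proc, pages in pages_per_process.items():
        placement[proc] = []
        for _ in range(pages):
            placement[proc].append(next_dev)
            next_dev = (next_dev + 1) % devices
    return placement


def processes_hit(placement: dict[str, list[int]], failed_dev: int) -> list[str]:
    """Processes with at least one swapped page on the failed device."""
    return sorted(proc for proc, devs in placement.items() if failed_dev in devs)


if __name__ == "__main__":
    # Hypothetical page counts; "collectd" is an illustrative name.
    swapped = {"nginx": 4, "django": 30, "collectd": 2}
    placement = allocate_round_robin(swapped, devices=12)
    for dev in (2, 5, 10):
        print(dev, processes_hit(placement, dev))
```

In this model, the process with the most swapped pages (`django` here) has pages on every drive, so whichever drive fails, it loses part of its memory; smaller processes are hit only if their few pages happen to sit on the failed drive.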

The solution is to balance things so swap is not used, preferably with help from the defaults, or to not have swap on the pool drives.

Two good reasons to have swap on the pool drives are:

1) It provides swap (which is sometimes needed) when you're booting from a device which is not really capable of having swap... say a USB drive.
2) It provides a 2GB buffer on your pool drives. Sometimes a replacement drive is slightly smaller than the drive it's replacing. Normally you'd be stuffed in this scenario, but with the swap you can shrink/remove the swap partition on the new drive, and thus still replace the slightly larger previous drive.
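The buffer in point 2 is simple arithmetic. A minimal sketch, assuming sizes in bytes, ignoring GPT and alignment overhead, and assuming the new ZFS partition must be at least as large as the one it replaces; the drive sizes are hypothetical:

```python
# Sketch under simplifying assumptions: sizes in bytes, GPT and alignment
# overhead ignored, and the new ZFS partition must be at least as large as the
# one it replaces. The drive sizes below are hypothetical.

KIB = 1024
GIB = 1024 ** 3


def needed_data_bytes(old_drive: int, swap: int) -> int:
    """Size of the ZFS data partition on the drive being replaced."""
    return old_drive - swap


def largest_swap_that_fits(new_drive: int, needed: int) -> int:
    """Largest swap partition the new drive can keep; negative means it cannot fit at all."""
    return new_drive - needed


if __name__ == "__main__":
    old_drive = 3 * 10**12            # a nominal "3 TB" drive (hypothetical)
    new_drive = old_drive - 10 * KIB  # replacement that is 10 KiB smaller
    swap = 2 * GIB                    # the default 2 GiB swap partition

    needed = needed_data_bytes(old_drive, swap)
    print("fits with default swap:", new_drive - swap >= needed)
    print("swap that still fits (bytes):", largest_swap_that_fits(new_drive, needed))
```

With the default 2 GiB swap the smaller drive does not fit, but shrinking its swap partition by 10 KiB is enough. A replacement more than 2 GiB smaller would not fit even with swap removed entirely.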

The problem is if swap is actually in use and a drive fails. It sort of defeats one of the purposes of RAID, i.e. availability in the face of drive failure.
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
I'll be honest, this feels like a really big Achilles' heel for FreeNAS. Does this mean we should be disabling swap?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Crashing when a swap disk fails is normal behavior if you are using swap. The idea is that you never use it. Basically, your system is using lots of memory, and when it asked for more, it couldn't free memory from the ARC fast enough to satisfy the request. This causes parts of the OS to be swapped out. How many jails are you running? Turn them off and see if you still use swap. Either way, you'll need more RAM.
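The "can't free it fast enough from the ARC" point can be illustrated with a toy model. It is not FreeBSD's VM or ARC code: the assumptions are that memory is counted in MiB per tick, free RAM is used first, the ARC can give back at most a fixed amount per tick, and any remaining shortfall is paged out to swap. All parameter names and values are hypothetical:

```python
# Toy model, NOT FreeBSD's VM or ARC code: memory is in MiB and time in ticks.
# Each tick an application asks for `demand` more memory. Free RAM is used
# first, the ARC can give back at most `arc_shrink_per_tick` (and never shrinks
# below `arc_min`), and any remaining shortfall is paged out to swap.

def swapped_out(demands: list[int], free: int, arc: int, arc_min: int,
                arc_shrink_per_tick: int) -> int:
    swapped = 0
    for demand in demands:
        from_free = min(demand, free)
        free -= from_free
        shortfall = demand - from_free
        reclaimed = min(shortfall, arc_shrink_per_tick, arc - arc_min)
        arc -= reclaimed
        swapped += shortfall - reclaimed
    return swapped


if __name__ == "__main__":
    params = dict(free=512, arc=8192, arc_min=1024, arc_shrink_per_tick=256)
    print("burst: ", swapped_out([200, 200, 1500, 200], **params))  # 2100 MiB in a burst
    print("steady:", swapped_out([200] * 11, **params))            # 2200 MiB spread out
```

In this model the burst pushes 1132 MiB to swap even though about 7 GiB of ARC is still reclaimable, while a slightly larger total demand spread evenly over time swaps nothing. That matches the intuition that swap usage reflects how quickly memory is demanded, not only how much RAM is installed.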

 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
9.10.0 seemed to have a propensity to use swap when an equivalent 9.3 system wouldn't, and anecdotally, I think this is resolved in 9.10.1.

Which complicates the simple advice of "you're using swap, you must not have enough RAM".

The issue is that when mounting a pool, in certain pathological scenarios, it might be necessary to have more ram than physically available. If your system runs out of ram, then you'd be hosed. Unless you have swap. With swap, the system can mount the pool, and resolve the situation which caused the problem.
 
Joined
Dec 2, 2015
Messages
730
My system, with 16GB RAM, running FreeNAS-9.10.1 (d989edd), currently has 5.1MB of swap used. I find it hard to believe that it can't find a way to get by without that 5.1 MB. I am tempted to buy another 16GB RAM, but I wouldn't be surprised to see it still use a few kB of swap.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
My script in this thread will temporarily fix it.

I think my actual version of this script uses 'swapinfo' to build the list of devices with swap.

The idea is to only cycle the ones which have a non-zero amount of swap in use.
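A minimal sketch of the "only cycle devices with non-zero swap" idea. This is not the script referenced above. It assumes `swapinfo` prints a header row, one row per device (Device, 1K-blocks, Used, Avail, Capacity) and, with several devices, a trailing "Total" row; the device names in the sample are hypothetical. It only prints the `swapoff`/`swapon` commands rather than running them, since `swapoff` can fail when there isn't enough free RAM to take the swapped pages back:

```python
# Sketch of the "only cycle devices with non-zero swap" idea (not the original
# script). Assumes `swapinfo` prints a header row, one row per device
# (Device, 1K-blocks, Used, Avail, Capacity) and, with several devices, a
# trailing "Total" row. Commands are printed, not executed.

import shutil
import subprocess


def devices_with_swap_in_use(swapinfo_output: str) -> list[str]:
    """Return the devices whose 'Used' column is greater than zero."""
    devices = []
    for line in swapinfo_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5 or fields[0] == "Total":
            continue
        if int(fields[2]) > 0:  # 'Used' column, in 1K blocks
            devices.append(fields[0])
    return devices


def cycle_commands(devices: list[str]) -> list[str]:
    """swapoff pulls the device's pages back into RAM; swapon re-enables it."""
    cmds = []
    for dev in devices:
        cmds.append(f"swapoff {dev}")
        cmds.append(f"swapon {dev}")
    return cmds


if __name__ == "__main__":
    if shutil.which("swapinfo"):
        out = subprocess.run(["swapinfo"], capture_output=True, text=True, check=True).stdout
        for cmd in cycle_commands(devices_with_swap_in_use(out)):
            print(cmd)
    else:
        print("swapinfo not found (this sketch targets FreeBSD/FreeNAS)")
```

Printing rather than executing keeps the sketch harmless to try; review the commands before running them on a live system.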
 
Joined
Dec 2, 2015
Messages
730
I used your script, and swap stayed at zero for about 8 hours, including through a scrub. But, it's back up to 4.8 MB now.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338

Lars Jensen

Explorer
Joined
Feb 5, 2013
Messages
63
FYI, I had a disk failure on a FreeNAS 9.2.1.5 box that was using 13 MB of swap (the box has 48 GB RAM), which also resulted in django being killed/crashing when the disk failed. In this case django was able to restart.

Looking into it more, it turns out FreeBSD swaps out idle memory by default, which fits the behavior of django crashing when a disk fails, since the GUI isn't used very often.
It would also explain why people report swap usage even with large amounts of RAM.

In Linux there's a setting, swappiness, that changes how eagerly memory is swapped; setting it to 0 means it only swaps when absolutely necessary. I don't think FreeBSD has a swappiness feature, but maybe another form of swap control exists?
 
Joined
Dec 2, 2015
Messages
730
Issue 11617 covers a disk failure causing FreeNAS to fail due to loss of swap. Apparently FreeNAS 10 will mirror the swap so that a single drive failure can be tolerated.
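A toy comparison of why mirroring helps, under stated assumptions: 12 swap-bearing drives, swapped pages present on every drive, and a hypothetical two-way mirror pairing (0,1), (2,3), and so on. This is not FreeNAS 10's actual swap layout, just an illustration of striped versus mirrored swap under drive failures:

```python
# Toy comparison under stated assumptions: 12 swap-bearing drives, swapped
# pages present on every drive, and a hypothetical two-way mirror pairing
# (0,1), (2,3), ... This is not FreeNAS 10's actual swap layout.

from itertools import combinations

DRIVES = range(12)
PAIRS = [(i, i + 1) for i in range(0, 12, 2)]


def striped_survives(failed: set[int]) -> bool:
    # Every drive holds some swapped pages, so any failure loses some of them.
    return not failed


def mirrored_survives(failed: set[int]) -> bool:
    # Pages are lost only if both members of some mirror pair have failed.
    return not any(a in failed and b in failed for a, b in PAIRS)


def survivals(k: int) -> tuple[int, int, int]:
    """(scenarios, striped survivals, mirrored survivals) over all k-drive failures."""
    scenarios = striped = mirrored = 0
    for failed in combinations(DRIVES, k):
        f = set(failed)
        scenarios += 1
        striped += striped_survives(f)
        mirrored += mirrored_survives(f)
    return scenarios, striped, mirrored


if __name__ == "__main__":
    for k in (1, 2):
        print(k, survivals(k))
```

Under these assumptions, striped swap loses pages in every single-drive failure, while the mirrored layout survives all 12 single-drive failures and 60 of the 66 possible two-drive failures (only the 6 cases where both halves of one pair fail lose data).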
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
This is the reason I set up a 4GB swap file on the FreeNAS boot SSD and disable creating swap partitions on the hard drives. Obviously, this might not work very well if you're booting from a slow, feeble USB stick.

Arrant heresy, I know. But it's a free country! And anyway, FreeNAS hardly ever swaps at all! :)
 
Joined
Dec 2, 2015
Messages
730
I've resisted getting a boot SSD, as mirrored USB sticks seemed reliable enough. But this swap silliness is the best argument yet for a boot SSD. I'll look for one.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
disable creating swap partitions on the hard drives
That's not a very good idea, since the swap is there, in part, to allow slightly smaller drives to be resilvered. Say 3TB-10KB instead of 3TB.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
I know... but I've checked current production drives from the usual sources (WD, HGST, Seagate) and they all have the same number of LBAs for a given size drive. So in my case, at least, this seems kinda like a solution in search of a problem... :smile:
 
Joined
Dec 2, 2015
Messages
730
Instead of using a swap partition to reserve that space, shouldn't it be possible to use a small empty partition?
 