L2 ARC Summary: (DEGRADED)

Status
Not open for further replies.

subzer011

Dabbler
Joined
Mar 11, 2014
Messages
35
hi!

i've spent a good amount of time trying to resolve this issue. i've searched for a solution and waited patch after patch to see if it got resolved.

my issue is that the L2ARC appears to go into a degraded state after a while, normally when it's close to 100% warmed up. i'm using 2 MLC SSDs, but the issue also occurred when i only had one SSD.

i've tried the following with no luck...
  • moving the pool to a fresh new FreeNAS install
  • export/importing the pool several times
  • manually selecting "no encryption" at the root level of the pool
  • wiping both SSD cache drives and re-adding them to the pool via web gui
  • disabling and enabling autotune
  • swapping the motherboard
  • running extended SMART tests on all disks - no issues found
  • running scrub - no issues found
  • manually setting a partition size for each SSD and adding those partitions as cache
  • tried different SSDs (Crucial, Samsung, OCZ)
My pool has 38TB of data striped across 4 RAIDZ1 vdevs. the system has 24GB of memory (the maximum the mobo allows) and a first-gen i7 processor. the hard disks are connected to 3 IBM ServeRAID M1015 controllers via the backplane of a Norco RPC-4224 case. the M1015s have been flashed with the LSI 9211 IT firmware. the two SSDs are 256GB each. I'm using FreeNAS-9.2.1.2-RELEASE-x64 (002022c), but the issue has occurred since at least 9.1, when i first noticed it.

the following 2 links appear to be related....
https://bugs.freenas.org/issues/3418
http://lists.freebsd.org/pipermail/freebsd-current/2013-October/045786.html

both links blame L2ARC compression. the L2ARC returns to a healthy state after a reboot.

here's the L2ARC output from arc_summary.py

L2 ARC Summary: (DEGRADED)
    Passed Headroom:                    169.88k
    Tried Lock Failures:                658.46m
    IO In Progress:                     6.26k
    Low Memory Aborts:                  132
    Free on Write:                      2.87m
    Writes While Full:                  1.24k
    R/W Clashes:                        357
    Bad Checksums:                      231.26k
    IO Errors:                          21.36k
    SPA Mismatch:                       0

L2 ARC Size: (Adaptive)                 631.50 GiB
    Header Size:                2.37%   14.97 GiB

L2 ARC Evicts:
    Lock Retries:                       4.86k
    Upon Reading:                       0

L2 ARC Breakdown:                       157.17m
    Hit Ratio:                  0.49%   768.89k
    Miss Ratio:                 99.51%  156.40m
    Feeds:                              42.11k

L2 ARC Buffer:
    Bytes Scanned:                      145.84 TiB
    Buffer Iterations:                  42.11k
    List Iterations:                    2.68m
    NULL List Iterations:               583.15k

L2 ARC Writes:
    Writes Sent:                100.00% 40.74k

any pointers? thanks in advance!
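
For anyone reading the counters in the output above, here is a minimal Python sketch that recomputes the hit ratio from the posted hit and miss counts and checks whether the checksum and I/O error counters are nonzero. The helper names (`parse_count`, `hit_ratio_percent`, `has_l2arc_errors`) are made up for illustration. Treating any nonzero error counter as a warning sign is an assumption of this sketch, not a confirmed description of how arc_summary.py decides on DEGRADED.

```python
# Suffixes as they appear in the pasted output (k = thousand, m = million).
SUFFIX = {"k": 1e3, "m": 1e6, "b": 1e9}

def parse_count(text):
    """Convert a string such as '768.89k' into a plain number."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIX:
        return float(text[:-1]) * SUFFIX[text[-1]]
    return float(text)

def hit_ratio_percent(hits, misses):
    """Hits as a percentage of all L2ARC lookups."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

def has_l2arc_errors(stats):
    # Assumption: any nonzero checksum or I/O error count deserves a closer look.
    return stats["bad_checksums"] > 0 or stats["io_errors"] > 0

# Values copied from the arc_summary.py output in this post.
stats = {
    "hits": parse_count("768.89k"),
    "misses": parse_count("156.40m"),
    "bad_checksums": parse_count("231.26k"),
    "io_errors": parse_count("21.36k"),
}

ratio = hit_ratio_percent(stats["hits"], stats["misses"])
print(f"L2ARC hit ratio: {ratio:.2f}%")
print("error counters nonzero:", has_l2arc_errors(stats))
```

With the posted numbers the recomputed ratio agrees with the 0.49% shown in the summary, and both error counters are well above zero.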
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, for starters your L2ARC should never exceed about 5x your RAM.. you've got about 4x that limit... As a general rule using L2ARCs with <64GB of RAM is just a waste of time because the total allocatable size is too small for most people.
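
The arithmetic behind that remark can be sketched in a few lines of Python. The 5x factor is the rule of thumb stated in this post, the 24GB of RAM and two 256GB SSDs come from the first post, and the function name `rule_of_thumb_limit_gb` is made up for illustration. The header share treats the 24GB of RAM as 24 GiB, which is an approximation.

```python
def rule_of_thumb_limit_gb(ram_gb, factor=5):
    """Largest L2ARC suggested by the 'about 5x your RAM' guideline."""
    return ram_gb * factor

ram_gb = 24              # RAM reported in the first post
l2arc_gb = 2 * 256       # two 256GB cache SSDs

limit_gb = rule_of_thumb_limit_gb(ram_gb)
times_over = l2arc_gb / limit_gb

# arc_summary.py reported 14.97 GiB of L2ARC headers, which live in RAM.
header_share = 100.0 * 14.97 / ram_gb

print(f"guideline limit: {limit_gb} GB")
print(f"configured L2ARC: {l2arc_gb} GB ({times_over:.2f}x the guideline)")
print(f"L2ARC headers: roughly {header_share:.1f}% of RAM")
```

Under these assumptions the guideline allows about 120 GB of L2ARC, the configured 512 GB is a bit over 4x that, and the reported header size alone would take up well over half of the installed RAM.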
 

subzer011

Dabbler
Joined
Mar 11, 2014
Messages
35
so i should just remove the SSDs from the pool until i can upgrade to another mobo with 64+ GB of RAM?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
subzer011 said: "so i should just remove the SSDs from the pool until i can upgrade to another mobo with 64+ GB of RAM?"

That's one thing I'd do.. but realize you're going to hurt performance since those L2ARCs are probably helping somewhat until they are kicked from the pool.

This just raises the question of "what else have you made mistakes with that is costing you performance?" And let me tell you, that list can be quite long. :(

And the reason for the disks being kicked from the pool is NOT insufficient RAM (at least, I've never heard of it causing that before..). So you actually have some kind of problem on top of the RAM problem.
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
Are you still having this problem after upgrading to 9.2.1.4 or later?
 

subzer011

Dabbler
Joined
Mar 11, 2014
Messages
35
i'm on FreeNAS-9.2.1.6-RC2 and yes, i'm still having this problem. it seems like the L2ARC goes into a degraded state once it gets full. tested the RAM and found no issues. i put in 2 more SSDs as L2ARC and it seems to delay the performance loss and degraded state by a few days. with 3 SSDs, i reboot FreeNAS once a week instead of once a day.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
eraser said: "Thanks for the update. I also noticed this problem. I have opened issue #5347 [ https://bugs.freenas.org/issues/5347 ] about this. I will link this thread to the bug. Feel free to update the bug with your own observations if you want to."

Yes, but it's MUCH more likely a hardware problem than anything else. Can you post the full output of smartctl -x /dev/XXX? Note that Crucial has typically not given any useful SMART data, so I'm somewhat expecting the output to not be particularly useful for proving the drive is bad.
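
If there are several cache devices to check, the `smartctl -x` call can be wrapped in a short Python 3 sketch like the one below. The function names and the example device paths are hypothetical and need to be replaced with the real devices. It assumes smartctl is installed and on the PATH.

```python
import subprocess

def smartctl_command(device):
    """Build the argument list for `smartctl -x <device>`."""
    return ["smartctl", "-x", device]

def collect_smart_reports(devices):
    """Run smartctl on each device and return {device: output text}."""
    reports = {}
    for dev in devices:
        try:
            # No check=True: smartctl uses nonzero exit codes to signal
            # drive warnings, and the report is still wanted in that case.
            result = subprocess.run(
                smartctl_command(dev), capture_output=True, text=True
            )
            reports[dev] = result.stdout
        except FileNotFoundError:
            reports[dev] = "smartctl not found on this system"
    return reports

def format_reports(reports):
    """Join the reports into one block of text suitable for pasting."""
    parts = []
    for dev, text in reports.items():
        parts.append(f"===== {dev} =====\n{text}")
    return "\n".join(parts)
```

Calling `print(format_reports(collect_smart_reports(["/dev/ada0", "/dev/ada1"])))` would print one labelled report per device; the two paths here are placeholders only.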
 

subzer011

Dabbler
Joined
Mar 11, 2014
Messages
35
degraded after a few days even with FreeNAS-9.2.1.6-RELEASE-x64. i'm using 2 Toshiba SSDs and 1 Crucial as L2ARC. none of the SSDs report any SMART errors. though since you mention it, i'm gonna try removing the Crucial SSD from the system entirely. i noted that eraser was also using a Crucial SSD in his setup, according to his notes in the bug report.
 

subzer011

Dabbler
Joined
Mar 11, 2014
Messages
35
eraser, do you have an alternate SSD at your disposal? my issue appears to have gone away. looks like it was the Crucial drive, despite no SMART errors coming from it. i've been running on a fully warmed up L2ARC for a few days now using 2 Toshiba SSDs. no IO errors and no degraded state have been reported yet, but i'll update this thread if anything changes.
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
Let me see if I can track down a spare SSD. I don't have one at home or work, but maybe I can find one somewhere else.
 

subzer011

Dabbler
Joined
Mar 11, 2014
Messages
35
still seeing this issue with the latest FreeNAS release, FreeNAS-9.3-STABLE-201512121950.

the SSD i'm using is a brand new Samsung PRO 512GB (2 days old). i've gone through 3 SSDs now from different brands, thinking it was a hardware issue. i've tried replacing the SATA cable, and i've gone through another motherboard replacement and a pool rebuild since my initial post. the system has 128GB of ECC memory with a Xeon E5-2650 v3 CPU. i've tried enabling/disabling autotune. after a few days, the L2ARC goes into a degraded state.

i've been using FreeNAS since version 8. the only thing i can think of that might be off is the config file that i've been using since 8.

Found these 2 bug reports but neither showed a solution...

https://bugs.freenas.org/issues/3418
https://bugs.freenas.org/issues/5347
 