System hangs for 5 minutes at boot with L2ARC SSD installed

Status
Not open for further replies.

Morpheus187

Explorer
Joined
Mar 11, 2016
Messages
61
Hi There

I just upgraded my onboard LSI3008 controller to P14 and the system failed to boot (that was my first assumption, which turned out to be wrong) with a dtrace error (see attached screenshot).

I then downgraded again, with the same problem. So I investigated a bit more, and the only other change was adding an L2ARC SSD (Intel 240 GB). I unplugged the SSD and the system booted up fine; the error message still appears, but the system continues to boot. After that I re-upgraded the LSI3008 firmware to the latest version (as this clearly isn't the problem).

The pool status was then as follows
Code:
[root@freenas] ~# zpool status
  pool: VOL1
 state: ONLINE
status: One or more devices could not be opened.  Sufficient replicas exist for
  the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
  see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 19h2m with 0 errors on Wed May 17 08:34:39 2017
config:

  NAME                                            STATE     READ WRITE CKSUM
  VOL1                                            ONLINE       0     0     0
    raidz2-0                                      ONLINE       0     0     0
      gptid/a672a7f0-e4b4-11e5-88a3-d05099c0ab52  ONLINE       0     0     0
      gptid/a7922fc8-e4b4-11e5-88a3-d05099c0ab52  ONLINE       0     0     0
      gptid/a8ae08fc-e4b4-11e5-88a3-d05099c0ab52  ONLINE       0     0     0
      gptid/a9d6aedf-e4b4-11e5-88a3-d05099c0ab52  ONLINE       0     0     0
      gptid/aaf0e30b-e4b4-11e5-88a3-d05099c0ab52  ONLINE       0     0     0
      gptid/ac10aef6-e4b4-11e5-88a3-d05099c0ab52  ONLINE       0     0     0
      gptid/ad2ed461-e4b4-11e5-88a3-d05099c0ab52  ONLINE       0     0     0
      gptid/ae52864d-e4b4-11e5-88a3-d05099c0ab52  ONLINE       0     0     0
  cache
    8245123297437606026                           UNAVAIL      0     0     0  was /dev/gptid/73d36011-4217-11e7-8c81-0cc47ac9aafe

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

  NAME          STATE     READ WRITE CKSUM
  freenas-boot  ONLINE       0     0     0
    da9p2       ONLINE       0     0     0

errors: No known data errors



This is as expected as I removed the SSD which caused the problem.

Strange observation: when I added the SSD 3 days ago, I enabled L2ARC and the L2ARC size grew to 980 GB, which puzzled me a bit because the SSD has only 240 GB of space available.
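
A possible explanation, stated as an assumption and not confirmed anywhere in this thread: the reported L2ARC size may count logical (uncompressed) bytes, while the SSD only has to hold the allocated bytes. On FreeBSD these two counters are exposed as the sysctls `kstat.zfs.misc.arcstats.l2_size` and `kstat.zfs.misc.arcstats.l2_asize`. A minimal Python sketch of the comparison follows. It uses the 980 GB figure from above and assumes, purely for illustration, that the whole 240 GB device is allocated. The function name `l2arc_ratio` is hypothetical.

```python
def l2arc_ratio(l2_size_bytes: int, l2_asize_bytes: int) -> float:
    """Return logical bytes cached per byte allocated on the cache device."""
    if l2_asize_bytes <= 0:
        raise ValueError("allocated size must be positive")
    return l2_size_bytes / l2_asize_bytes


# Illustrative inputs only: 980 GB is the size reported in the post,
# 240 GB is the nominal SSD capacity (assumed fully allocated here).
reported_bytes = 980 * 10**9
allocated_bytes = 240 * 10**9

ratio = l2arc_ratio(reported_bytes, allocated_bytes)
print(f"implied logical-to-allocated ratio: {ratio:.2f}x")
```

If the real `l2_asize` value stays at or below the size of the cache partition, the large reported number is an accounting detail and not data written past the end of the device.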

I then did the following
- I removed the cache in the FreeNAS GUI, shut down the system and reconnected the SSD -> System booted up again.
- I then re-added the SSD as a cache device via the GUI and rebooted -> System did not boot up.
- I decided to wait and give it 5 minutes and... it continued to boot up!


Conclusion so far:
- The system hangs for about 5 minutes at the position shown in the screenshot when the L2ARC SSD is installed AND added to the pool as a cache device.
- As I'm impatient as fu**, I first concluded the system did not boot at all. This conclusion was wrong; lesson learned -> give it a few more minutes next time.

Maybe someone has an idea what could cause this issue. It's not a deal-breaking issue, just a bit annoying.

Relevant system information:
FreeNAS-9.10.2-U4 (27ae72978)

Installed SSD
Code:
ada0 at ahcich2 bus 0 scbus3 target 0 lun 0
ada0: <INTEL SSDSC2CW240A3 400i> ACS-2 ATA SATA 3.x device
ada0: Serial Number CVCV240200A8240CGN
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 228936MB (468862128 512 byte sectors)
ada0: Previously was known as ad8


Partitions on the SSD
Code:
[root@freenas] ~# gpart show ada0
=>       34  468862061  ada0  GPT  (224G)
         34         94        - free -  (47K)
        128  468861960     1  freebsd-zfs  (224G)
  468862088          7        - free -  (3.5K)
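
The "240 GB" on the label and the "224G" shown by gpart describe the same capacity in different units. A short Python check of the arithmetic, using only the sector count and sector size from the `ada0` probe lines above:

```python
SECTOR_BYTES = 512
sectors = 468862128  # "468862128 512 byte sectors" from the ada0 probe output

total_bytes = sectors * SECTOR_BYTES
decimal_gb = total_bytes / 10**9   # vendor-style gigabytes (powers of 10)
binary_gib = total_bytes / 2**30   # gibibytes, what gpart labels "G"
binary_mib = total_bytes // 2**20  # mebibytes, what the kernel labels "MB"

print(f"{total_bytes} bytes")
print(f"{decimal_gb:.2f} GB (decimal)")
print(f"{binary_gib:.2f} GiB (binary)")
print(f"{binary_mib} MiB (binary)")
```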



noboot.png
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
What are the specs of your system?

Use of an L2ARC isn't needed in your standard set-up. Typically you only want to add one when you've maxed out the RAM on the motherboard and you have an extremely low ARC hit ratio. Adding an L2ARC uses RAM, because the pointers to the data cached on it are stored there.

I also noticed in the screenshot that you have a ton of sysctls being set. Did you enable autotune? If you did, just disable it, delete all the sysctls it created, and reboot. Autotune is only ever needed on very large systems.
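
To give a feel for the RAM cost mentioned above, here is a rough Python sketch. It is an estimate under stated assumptions, not a measurement: the per-block header size (180 bytes here) and the average cached block size are both placeholders that vary with the ZFS version and the workload, and the function name `l2arc_header_ram` is hypothetical.

```python
def l2arc_header_ram(device_bytes: int, avg_block_bytes: int, header_bytes: int) -> int:
    """Estimate RAM used by in-memory headers for a completely full L2ARC device."""
    if avg_block_bytes <= 0 or header_bytes < 0:
        raise ValueError("block size must be positive and header size non-negative")
    blocks = device_bytes // avg_block_bytes  # how many cached blocks fit on the device
    return blocks * header_bytes


DEVICE_BYTES = 240 * 10**9  # nominal size of the SSD in this thread
HEADER_BYTES = 180          # ASSUMPTION: placeholder header size per cached block

for block in (8 * 1024, 128 * 1024):  # ASSUMPTION: two example average block sizes
    ram = l2arc_header_ram(DEVICE_BYTES, block, HEADER_BYTES)
    print(f"avg block {block // 1024:>3} KiB -> about {ram / 2**20:.0f} MiB of RAM")
```

The point of the sketch is the trend: the smaller the average cached block, the more headers are needed, and the more RAM is taken away from the ARC itself.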
 

Morpheus187

Hi, thanks for the reply.

This is my actual system:
Case: 825tq-r720lpb
Board: X11SSH-CTF
CPU: Intel Xeon E3-1245 V5
RAM: 64GB ECC DDR4 UDIMM
HDD: 8x6TB WD RED
Cache: 1x240 GB old INTEL SSD
Network: 2x10Gbit RJ45 ( currently only connected at 1 GBit )

RAM is already maxed out and I've seen ARC hit rates of around 50%, so I decided to give it a try and install an L2ARC SSD. With that installed, the ARC hit rate stabilized at around 45%, with the L2ARC slowly catching up and reaching 50%. So I think it had a positive impact on the system.

Yes, I have autotune enabled, but I will disable it for further testing.
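
As a rough sketch of what those two percentages mean together: the L2ARC is only consulted after an ARC miss, so its hit ratio applies to the remaining requests. The Python snippet below assumes the 45% and 50% figures hold steadily and that the L2ARC ratio is measured against ARC misses only. The function name `combined_hit_ratio` is hypothetical.

```python
def combined_hit_ratio(arc_ratio: float, l2arc_ratio: float) -> float:
    """Fraction of requests served from ARC or L2ARC instead of the pool disks."""
    for value in (arc_ratio, l2arc_ratio):
        if not 0.0 <= value <= 1.0:
            raise ValueError("ratios must be between 0 and 1")
    # Requests that miss the ARC get a second chance in the L2ARC.
    return arc_ratio + (1.0 - arc_ratio) * l2arc_ratio


overall = combined_hit_ratio(0.45, 0.50)
print(f"served from cache overall: {overall:.1%}")
```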
 

m0nkey_

A hit ratio of 50% is actually very good. I don't see a reason to use L2ARC. If your hit ratio were way lower, around 10%, then you would probably need it.
 

Morpheus187

I understand that removing the L2ARC SSD solves the "problem", but it's still strange that adding an L2ARC SSD makes the system hang at boot for about 5 minutes.

I suspect it has something to do with the dtrace error. I googled it a bit and it seems to be related to UEFI boot, which is what I'm using. I will keep monitoring the issue and maybe I'll come up with a solution.
 

m0nkey_

You're making this more complicated than it needs to be. The problem is that you used Autotune and an L2ARC when neither is required. Removing both fixes the issue. You really don't need an L2ARC device.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
m0nkey_ said: "A hit ratio of 50% is actually very good."
Quoted for truth. @Morpheus187, if you're used to earlier versions of FreeNAS, you're probably used to seeing artificially large ARC hit ratios. A fix was implemented (in 9.10, IIRC) to give more realistic numbers. A ratio of 50% means that literally half of the data requested from your NAS is served out of cache. That isn't bad at all.
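
For reference, a hit ratio is just hits divided by total lookups. On FreeBSD the raw ARC counters are available as the sysctls `kstat.zfs.misc.arcstats.hits` and `kstat.zfs.misc.arcstats.misses`. A minimal Python sketch follows; the counter values are made up for illustration and the function name `hit_ratio` is hypothetical.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Return the fraction of lookups that were served from cache."""
    total = hits + misses
    if total == 0:
        return 0.0  # no lookups yet, so no meaningful ratio
    return hits / total


# Made-up counter values, NOT taken from the system in this thread.
example_hits = 1_500_000
example_misses = 1_500_000

print(f"ARC hit ratio: {hit_ratio(example_hits, example_misses):.0%}")
```

Note that these counters are cumulative since boot, so a ratio computed from them is a long-term average. To see current behaviour, sample the counters twice and apply the same formula to the differences.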
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Disable autotune and remove the tunables that it sets.
 

Morpheus187

I removed autotune and disabled all the tunables it set, but the 5-minute delay during boot is still there. See screenshot.


slow-boot.png
 

SweetAndLow

What is this dtrace thing? Did you turn it on? Turn it off if you did. Also try making a new boot USB and booting from that, just to see if it's the install that is broken.

 

Morpheus187

No, that strange dtrace error was there from the beginning; it has something to do with FreeBSD and UEFI boot, I guess. I will investigate further later on. Thanks for the help so far.
 