Changing L2ARC drives and SLOG drive


doesnotcompute

Dabbler
Joined
Jul 28, 2014
Messages
18
Hi,

We're running FreeNAS-9.2.1.7-RELEASE-x64.

What began as a pilot test has been working well, so, as often happens, more writers are now using this box (mostly NFS, one CIFS share, no iSCSI), and we'd like to change the original drives we used for some of the roles (SLOG, L2ARC).

Namely:

* SLOG is currently a 160 GB Intel S3500 SSD.
* L2ARC is striped across a few Crucial MLC SSDs.

I'd like to retire the multiple Crucials** and simply have one L2ARC device (the current SLOG drive, the 160 GB S3500), since it seems plenty big based on historical usage (this is a heavy-write filer with very few common reads; it's mostly just confirming writes).

I'd like to put in an Intel S3710 as our new SLOG.

Scheduled maintenance is fine. Ideally we'd stop NFS, dismount what needs dismounting, remove the current L2ARC drives, remove the SLOG, retask the old SLOG as the new L2ARC, then insert the new SLOG and assign it that role. Have you done this? Any steps/guides you can share?

** Also, our filer locked up, and on reboot I saw a warning fly by in the messages about these Crucial drives possibly/likely (?) locking up every 51xx hours. They're spare, low-end drives, and I'd rather remove them from this system, which has become more important to us.

Also, regarding updating the firmware of the S3500 after it's no longer the SLOG and before it becomes the new L2ARC: is that easy to do in FreeNAS, or should I just pull it from the chassis during this window and update it via a USB/SATA connector on my laptop while at the datacenter?

In case it matters:

The 'head' node is a Dell R710 server with the original PERC removed; an IT-firmware-flashed IBM M1015 HBA connects to the Dell server backplane, and all these 2.5" drives are in hot-swap trays. The OS drive is another 160 GB S3500. The actual ZFS drives are all in an external Supermicro JBOD, connected via a second HBA in the 'head', an LSI 9200-8e with dual SFF-8088 ports out the back, reflashed to IT mode with no boot BIOS. We have four 6-drive RAIDZ2 vdevs in our one volume.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778

doesnotcompute

Dabbler
Joined
Jul 28, 2014
Messages
18
Thank you for the link.

OK, so assuming I can stop my NFS/CIFS services, I shouldn't have any client data on my SLOG.

About the ZIL/SLOG not being redundant: I'm pretty sure I read that 9.2 is OK without SLOG redundancy, i.e. a loss of the SLOG wouldn't kill the whole volume, but I'm trying to verify, and this:

zpool upgrade -v | more (from the Shell)

isn't working in my shell (even without the | more).

It seems I'd want to (rough CLI sketch after the list):

* maintenance window, shut down NFS/CIFS
* insert the new SLOG disk
* drill down to the current SLOG disk, click replace, then choose the new disk (will it be OK that the new one is smaller than the old one?)
* same process/question for the L2ARC Crucial drives, except I want to remove three and replace one with the newly freed-up Intel S3500
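
In case it helps, my rough understanding of the CLI equivalent (pool name "tank" and the gptid/daX names below are placeholders; I'd still do it through the GUI so the FreeNAS middleware stays in sync):

zpool remove tank gptid/OLD-CRUCIAL-1 gptid/OLD-CRUCIAL-2 gptid/OLD-CRUCIAL-3   # drop the old L2ARC stripe
zpool remove tank gptid/OLD-S3500-SLOG                                          # drop the current log device
zpool add tank log daX       # the new S3710 as SLOG
zpool add tank cache daY     # the freed-up S3500 as L2ARC

(Cache and log devices are removed and added rather than "replaced", so the new SLOG being smaller than the old one shouldn't matter.)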
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You don't need to stop any services. You can attach and detach L2ARC and SLOG devices hot.

The only time there's no "client data" on your SLOG is when it's brand new.

It's been years since you needed redundant SLOG; that was fixed long ago. Failure of a SLOG device will simply result in ZFS using the in-pool ZIL.
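
For anyone who wants to double-check their own pool, log device removal came in around pool version 19, and any feature-flags pool is well past that. A quick check from the Shell (pool name "tank" is a placeholder):

zpool get version tank    # "-" here means a feature-flags pool, i.e. well past v19, the version that added log device removal

Worst case, a pool whose non-redundant log device has died can still be imported with zpool import -m.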
 

doesnotcompute

Dabbler
Joined
Jul 28, 2014
Messages
18
"Failure of a SLOG device will simply result in ZFS using the in-pool ZIL." - ok that matches what i read in the 9.2 docs as well.

thanks.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
It's been years since you needed redundant SLOG; that was fixed long ago. Failure of a SLOG device will simply result in ZFS using the in-pool ZIL.

@jgreco can you expand on that a bit? The last guidance I saw indicated redundant SLOGs were crucial, as a failure of a single drive would cause the loss of whatever hadn't been sent from the SLOG to disk.

I've got a pair of S3700s mirrored for SLOG now (for my pool providing NFS services to VMware). If it's safe to do so, I'd love to break that mirror and make one the SLOG and the other the L2ARC.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
@jgreco can you expand on that a bit? The last guidance I saw indicated redundant SLOGs were crucial

Sure. You are now given this guidance: "this is no longer crucial and for all recent versions of ZFS, the pool can survive loss of SLOG."

(telephone lady voice:) "Please make a note of it."

as a failure of a single drive would cause the loss of whatever hadn't been sent from the SLOG to disk.

During normal operations, nothing is EVER sent from the SLOG to disk (and never has been); this is a fundamental misunderstanding of what SLOG is.

https://forums.freenas.org/index.php?threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

The purpose of the SLOG is to provide a place for sync writes to be committed quickly. If a SLOG device fails, writes continue to be written to the in-pool ZIL, which is slow.

If your system panics at the same time as your non-redundant SLOG device fails, then you will lose some information from the transaction group that is sitting in memory and has not yet been committed to disk. This will probably happen if you, for example, begin pounding on your NAS with a hammer. This is ... unlikely ... but possible.
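
If you want to see the sync-write path in action, a couple of quick checks (a sketch; "tank" is a placeholder pool name):

zfs get -r sync tank    # standard / always / disabled per dataset, i.e. whether sync requests are honored, forced, or ignored
gstat -p                # under NFS sync-write load the SLOG device should show a steady stream of small writes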
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Sure. You are now given this guidance: "this is no longer crucial and for all recent versions of ZFS, the pool can survive loss of SLOG."

(telephone lady voice:) "Please make a note of it."
LOL, so noted.


If your system panics at the same time as your non-redundant SLOG device fails, then you will lose some information from the transaction group that is sitting in memory and has not yet been committed to disk. This will probably happen if you, for example, begin pounding on your NAS with a hammer. This is ... unlikely ... but possible.
Fortunately, my new home has a closet dedicated to all the server gear (about 1,100 watts in a 4x8 closet... that was... fun?... to climate control). Locks have been installed to keep the hammer-wielding gnomes away.

Once I get back home, I'll break that mirror and make one drive the SLOG and the other L2ARC. 200GB of L2ARC on top of 128GB of RAM should be nice.
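
For the record, my understanding of the CLI version of that (pool and gptid names are placeholders; the GUI is the supported route on FreeNAS):

zpool status tank                                   # note the two gptid/... members under "logs"
zpool detach tank gptid/ONE-LOG-MIRROR-MEMBER       # break the mirror, leaving a single log device
zpool add tank cache gptid/ONE-LOG-MIRROR-MEMBER    # re-add the freed drive as L2ARC (may need -f, since it still carries old labels)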
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
I *think* the ratio is more along the lines of 1 to 4... But don't quote me on it....
My understanding is that 4:1 L2ARC:ARC is a not-to-exceed value, due to pressure on the ARC metadata. I'd be at 1.67:1, so I should be good, yes? Being closer to 4:1 would give me more usable cache without impacting the metadata too heavily, but I don't have an 800GB drive sitting around.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
There are other variables, aside from RAM that play into it; however I am not qualified to really make suggestions. I will defer to Mr. Grinchy and take notes as well. ;)
I'm definitely no ZFS whisperer myself; this is my first production FreeNAS box, so I've got plenty to learn.

I'd like a handful of big NVMe drives for my VM pool, which would make all of this discussion silly.

Or I could just max this box out on RAM... a TB would make L2ARC unnecessary. So many toys, so little money!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
My understanding is that 4:1 L2ARC:ARC is a not-to-exceed value, due to pressure on the ARC metadata. I'd be at 1.67:1, so I should be good, yes? Being closer to 4:1 would give me more usable cache without impacting the metadata too heavily, but I don't have an 800GB drive sitting around.

This is all tied up in what your block sizes are, plus some other things. 4:1 is expected to be generally safe unless you're trying to do something stupid like small block sizes (512, 1K, 2K) on an ashift=9 pool. That starts eating up huge amounts of ARC for L2ARC headers because you've got so many entries. There isn't a bright line.

Running iSCSI with whatever the default block size was, I can tell you that the 128GB filer here was not having any problem with 768GB-1TB of L2ARC (8:1). Changes coming to ZFS that will compress the ARC should also reduce the L2ARC header size, which translates to lots more fitting in L2ARC. I'm not up to date on where exactly that's at.
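
To put rough numbers on "so many entries", using an assumed ballpark of ~180 bytes of ARC per L2ARC header (the exact figure varies by ZFS version, so treat this purely as an illustration):

echo $(( 200 * 1024 * 1024 / 128 ))                      # ~1.6M headers for 200GB of L2ARC at 128K records
echo $(( 200 * 1024 * 1024 / 128 * 180 / 1048576 )) MB   # ~280 MB of ARC spent on those headers
echo $(( 200 * 1024 * 1024 / 4 ))                        # ~52M headers for the same 200GB at 4K records
echo $(( 200 * 1024 * 1024 / 4 * 180 / 1048576 )) MB     # ~9000 MB of ARC, which is why small blocks hurt

Same size of L2ARC, wildly different ARC cost. Hence no bright line.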
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
I guess the next question would be: is there a size that is too small for L2ARC, at which it would degrade performance?
Like, is 1:1 bad/useless, 2:1 okay, 3:1 better, 4:1 the sweet spot, 5:1 too much...
Maybe that's over-simplifying?
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
I guess the next question would be: is there a size that is too small for L2ARC, at which it would degrade performance?
Like, is 1:1 bad/useless, 2:1 okay, 3:1 better, 4:1 the sweet spot, 5:1 too much...
Maybe that's over-simplifying?
Ha... simple. You so funny. :p
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
A small L2ARC shouldn't degrade performance relative to a large one; it would simply benefit from having more.
At least that's my understanding.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I guess the next question would be: is there a size that is too small for L2ARC, at which it would degrade performance?
Like, is 1:1 bad/useless, 2:1 okay, 3:1 better, 4:1 the sweet spot, 5:1 too much...
Maybe that's over-simplifying?

For the average pool, the benefits of L2ARC are much more closely tied to how *busy* the pool is and how much of the "working set" you can fit into it.

If you have a home user who is only making relatively light use of the NAS, L2ARC isn't usually of value because the slight additional latency involved in getting a block from the pool versus from L2ARC is difficult to notice. If you're on a departmental fileserver that's heavily accessed, however, with lots of heavy read and write activity, the L2ARC may accelerate both reads and writes: common reads are fulfilled from the L2ARC, which in turn increases the capacity of the pool to handle writes (and non-L2ARC reads).

Also, the ARC and L2ARC do not work well with totally random accesses. If you had 8TB of data in a pool and 1TB of L2ARC, and all data was being randomly accessed, you have maybe a 1 in 8 chance of "lucking out" and the system is also going to be working very hard to reorganize L2ARC more appropriately (which turns out to be a Sisyphean task). However, if you have 8TB of data and only 100GB of it is accessed on a frequent basis, you might find that 64GB of RAM plus a 64GB L2ARC works out really well.

However, the flip side here is that L2ARC isn't magic. You also need sufficient ARC in order to allow ZFS to identify that a block is being accessed frequently. So for the 100GB example I just gave, you can't just plop 16GB of RAM in a system and a 128GB L2ARC and say "oh now this will be SOOOOO much faster."

This is why the theme of adding as much ARC as you reasonably can *first* is important.

Adding L2ARC beyond the size of your working set isn't particularly useful, but the definition of "working set" is somewhat flexible. It could include data accessed as infrequently as once per hour or even once per day, though I'd suggest that would be more useful in a VM storage environment than in typical file storage use. There's a point at which you see the fill rate of the L2ARC start to level off, and that's probably the "big enough" point.
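
One way to watch both the fill rate and the header cost on a FreeNAS 9.x box is the FreeBSD kstat sysctls (a sketch; check the counters a few hours apart):

sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses           # ARC hit/miss counters
sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses     # L2ARC hit/miss counters
sysctl kstat.zfs.misc.arcstats.l2_size kstat.zfs.misc.arcstats.l2_hdr_size   # L2ARC fill level, and ARC spent on its headers

When l2_size stops growing between checks, you're probably at the "big enough" point.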
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
While we're discussing L2ARC/SLOG

Why is partitioning a single NVMe device into a small SLOG and a large L2ARC discouraged?
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
While we're discussing L2ARC/SLOG

Why is partitioning a single NVMe device into a small SLOG and a large L2ARC discouraged?
My understanding is that there are two reasons:
First, you want the SLOG to be uber-fast, but it doesn't need to be huge. L2ARC needs to be fast, but not as ridiculously fast as the SLOG. So you would be burning a large, uber-fast drive to make this work. Not cost-efficient. You would also be introducing some contention for the SLOG.
Second, SLOG is hard on an SSD. A small partition allows the internal firmware to do lots of wear leveling. You should also select a datacenter-grade drive that is rated for many writes.

If you have the budget for a top-end Intel NVMe SSD that is rated for lots of write activity (my S3700s are rated for 10 drive writes per day for 5 years), then I would think it would work.

Note that not all NVMe drives have the write endurance. The Samsung 950 is rated well below 1 drive write per day.
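
For completeness, if someone did want to try the split anyway, the raw FreeBSD commands would look roughly like this (nvd0, the sizes, and pool name "tank" are all placeholders, and the FreeNAS GUI won't know about hand-made partitions, so treat this strictly as a sketch):

gpart create -s gpt nvd0
gpart add -t freebsd-zfs -a 1m -s 16G -l slog0 nvd0      # small SLOG slice
gpart add -t freebsd-zfs -a 1m -s 200G -l cache0 nvd0    # L2ARC slice; the rest is left unpartitioned for wear leveling
zpool add tank log gpt/slog0
zpool add tank cache gpt/cache0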

Jgreco can tell me how much of this I got wrong :smile:
 