Intel Optane as SLOG for SSD ZFS Pool

Status
Not open for further replies.

alebeta

Dabbler
Joined
Oct 7, 2021
Messages
11
Hi friends,

I'm new to the community and to TrueNAS too.

At the moment I want to build a storage server. I've been reading some blog posts where others describe their experiences, but I would like to clarify something that may be very simple.

I have in mind to create a ZFS storage pool using 8 x 3.84TB SAS 12Gbps 2.5" solid state drives, and I would like to add an Intel Optane SSD 905P Series 280GB as a SLOG device, to make data writes safer and to increase performance.

Now my doubt is: if my ZFS pool is already all-SSD, would the Intel Optane really increase performance, or should I not think of it as a performance enhancer but just as protection for my data?

Thanks in advance
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
As @jgreco is fond of saying, an SLOG is not a write cache. You'll likely not see any performance gain.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
SLOG is not a performance enhancer.

https://www.truenas.com/community/threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

If you are doing something like databases or VMs, where it would be a problem for the filer to vanish and take some committed transactions with it while leaving a hypervisor running, then you should have sync writes enabled. Sync writes are notoriously slow on ZFS, so there is a mechanism to speed them up, which is SLOG. However, this is NEVER faster than just turning off sync writes and letting the pool do its thing normally.

ZFS async writes are *always* faster than SLOG writes because ZFS uses your system memory for its write cache, and it can allocate many gigabytes for the purpose. This means we are comparing fast writes into RAM (async) with slow writes to Optane (SLOG). And of course this is where someone chimes in with "but Optane is so fast", but it ain't nothin' compared to DRAM.

So if you need sync writes, the pool drives on their own MIGHT be fast enough to support ZIL-in-pool; you can always try it and see if it's fine for you. If not, you can always add a SLOG device later.
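
If you want to get a feel for the difference, here's a rough toy sketch in plain Python (nothing ZFS-specific; the file path, block size, and write count are made up) comparing buffered writes with writes followed by fsync(), which is roughly the discipline that sync semantics impose:

```python
# Toy comparison of buffered ("async-like") vs fsync'd ("sync-like") writes.
# Illustrates why sync semantics cost latency; it is NOT a benchmark of ZFS.
# The path, block size, and write count are arbitrary examples.
import os
import time

PATH = "synctest.bin"        # hypothetical test file on the pool of interest
BLOCK = b"\0" * 4096         # 4 KiB writes, similar to small VM I/O
COUNT = 2000

def run(sync: bool) -> float:
    start = time.monotonic()
    with open(PATH, "wb") as f:
        for _ in range(COUNT):
            f.write(BLOCK)
            if sync:
                f.flush()
                os.fsync(f.fileno())    # force the data to stable storage
    elapsed = time.monotonic() - start
    os.remove(PATH)
    return elapsed

print(f"buffered: {run(False):.3f}s   fsync'd: {run(True):.3f}s")
```

The fsync'd case is the one a SLOG is meant to accelerate; the buffered case is what sync=disabled gets you, and nothing beats it, because it's just RAM.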

But give careful consideration to whether you need any of this at all.
 

alebeta

Dabbler
Joined
Oct 7, 2021
Messages
11
Hi @Samuel Tai @jgreco

Thanks for the comments, and yes, I'm writing here because I want to understand whether a SLOG is needed at all in my use case.

This storage server will be serving as a datastore for virtual machine disks.

I guess what you said here:

If you are doing something like databases or VMs, where it would be a problem for the filer to vanish and take some committed transactions with it while leaving a hypervisor running, then you should have sync writes enabled. Sync writes are notoriously slow on ZFS, so there is a mechanism to speed them up, which is SLOG. However, this is NEVER faster than just turning off sync writes and letting the pool do its thing normally.

is my case, and so I should enable sync writes. Is that right?

thanks a lot
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Perhaps. If you are already using SSD for the pool and you set your disks up as mirrors (you almost certainly do NOT want RAIDZ), you may still get very acceptable -- not lightning-fast -- performance without a SLOG.

Make sure you review

https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/

before setting up a pool for VM storage.

I've been encouraging people to try without a SLOG in recent years, simply because people tend to get hung up on the issue and waste a lot of money. If the in-pool ZIL is too slow, then, yes, you've discovered you need a SLOG, and Optane is probably the way to go there.
 

alebeta

Dabbler
Joined
Oct 7, 2021
Messages
11
Thanks @jgreco

Thanks, that's what I wanted to hear regarding whether the SLOG device is a performance solution or not.
On the other hand, it seems to be a good option to protect data writes in case of a failure; would that make sense for VM storage?

Regarding RAIDZ: honestly, I was thinking of going with RAIDZ3 so I can have more space, since with mirroring I will end up with only about half of the raw capacity as usable space.
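
Just for my own reference, this is the rough raw-capacity math I'm weighing, assuming the 8 x 3.84TB drives and ignoring any ZFS overhead:

```python
# Rough usable-capacity comparison for 8 x 3.84 TB drives, before any
# ZFS metadata overhead, reservations, or the padding effects discussed
# later in the thread.
DRIVES = 8
SIZE_TB = 3.84

mirrors = (DRIVES // 2) * SIZE_TB      # four 2-way mirror vdevs
raidz3 = (DRIVES - 3) * SIZE_TB        # one 8-wide RAIDZ3 vdev

print(f"mirrors (4 x 2-way): {mirrors:.2f} TB usable")   # ~15.36 TB
print(f"raidz3 (8-wide):     {raidz3:.2f} TB usable")    # ~19.20 TB
```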

I will give the link you referred to a read.

thanks a lot
 

alebeta

Dabbler
Joined
Oct 7, 2021
Messages
11
And just to give a little more information: I have in mind to use iSCSI to connect the storage to the hypervisor cluster. But I'm just in the planning phase; it can change, of course, and I see there are many things to take into consideration.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
You definitely don't want RAIDZx as a backing store for VMs, as you won't get the IOPS you'll need. Mirrors are the appropriate architecture in this case.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You definitely don't want RAIDZx as a backing store for VMs, as you won't get the IOPS you'll need
Absolutely true for HDD pools... not so sure that's the case here with an all-flash pool.

If the IOPS of one SSD would be sufficient for the VMs (possibly true with an SSD offering 100K+ IOPS), then the capacity/cost argument would drive somebody toward RAIDZ (understanding, of course, that the tradeoff comes at the expense of the IOPS they could be getting from their hardware with mirrors, which give up some capacity).

The question really is, "what's the IOPS requirement for the VMs from the hypervisor?" and "how many vdevs do I need to provide that many IOPS?" If the answer is 1, then RAIDZ is possibly good.
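
To make that concrete, a back-of-the-envelope sketch (both IOPS figures below are placeholders; substitute the real hypervisor requirement and a realistic steady-state per-vdev number for your SSDs):

```python
# Back-of-the-envelope vdev count estimate. All numbers are placeholders --
# plug in the real worst-case requirement from the hypervisor and a
# realistic steady-state per-vdev figure for the drives in question.
import math

required_iops = 40_000        # what the VMs actually need, worst case
iops_per_vdev = 25_000        # conservative steady-state per vdev

vdevs_needed = math.ceil(required_iops / iops_per_vdev)
print(f"vdevs needed: {vdevs_needed}")   # if this is 1, RAIDZ may be fine
```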
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
@sretalla, you have to account for the worst-case scenario, where the SSD is constantly garbage collecting. A wide mirror pool will do better in this case.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
you have to account for the worst-case scenario, where the SSD is constantly garbage collecting. A wide mirror pool will do better in this case.
Fair point.

Maybe turning off TRIM could mitigate that, but it would bring other issues... anyway, just to say that the problems that apply to HDD pools may not apply to an all-SSD pool.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Absolutely true for HDD pools... not so sure that's the case here with an all-flash pool.

If the IOPS of one SSD would be sufficient for the VMs (possibly true with an SSD offering 100K+ IOPS), then the capacity/cost argument would drive somebody toward RAIDZ (understanding, of course, that the tradeoff comes at the expense of the IOPS they could be getting from their hardware with mirrors, which give up some capacity).

The question really is, "what's the IOPS requirement for the VMs from the hypervisor?" and "how many vdevs do I need to provide that many IOPS?" If the answer is 1, then RAIDZ is possibly good.

The problem with RAIDZ isn't just IOPS, it is space allocation. The manner in which space is allocated on RAIDZ is ... um, well, optimized for sequential file data. If you pick the wrong design, which is, as far as I can tell, MOST of them, you end up with space inefficiencies burning up lots of space.

This is especially hurty when you use a larger ashift (such as the 12 which IIRC is current default) because the recordsize or volblocksize tends not to fit efficiently into the allocations, and there are just some really weird interactions that make my head hurt.

Consider one of ZFS's classic examples of space amplification: with ashift=12 (4K sectors) and a volblocksize of 4K on RAIDZ3, you need three parity blocks to protect that one 4K "volblock", and you end up using 16K of space (one data, three parity).

But wait! It's even better. With RAIDZ, ZFS also pads each allocation up to a multiple of (parity + 1) sectors, so you can also create pathological conditions where space is wasted with a small record/volblocksize on RAIDZ1 and RAIDZ2 as well.

Now, you can INCREASE the record/volblocksize, and these problems are reduced, absolutely true. But this increases write amplification to an SSD, because you're writing unchanged stuff back out to the SSD.

Reducing to ashift=9 reduces the scope of the allocation inefficiency but creates other problems. Using mirrors makes it possible to use ashift=12 AND a volblocksize=4K and it all works optimally. Your only real chance with ashift=12 RAIDZ is to design your pool with a moderate volblocksize that is designed to cooperate with the RAIDZ structure to deliver optimal efficiency; I leave this as an exercise for the reader.
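
If you want to poke at the numbers yourself, here's a rough sketch of the allocation arithmetic as I understand it (data sectors, plus per-stripe parity, padded up to a multiple of parity+1; it ignores compression and other details, so treat it as an estimate, not gospel):

```python
# Rough RAIDZ allocation estimate for a single block, as I understand the
# allocator: data sectors, plus parity sectors per stripe, padded up to a
# multiple of (parity + 1). Ignores compression, gang blocks, and metadata.
import math

def raidz_alloc_sectors(block_bytes, sector_bytes, ndisks, nparity):
    data = math.ceil(block_bytes / sector_bytes)
    stripes = math.ceil(data / (ndisks - nparity))
    total = data + stripes * nparity
    pad_unit = nparity + 1
    return math.ceil(total / pad_unit) * pad_unit

SECTOR = 4096  # ashift=12

# The classic case above: 4K volblock on 8-wide RAIDZ3 -> 16K allocated.
print(raidz_alloc_sectors(4096, SECTOR, ndisks=8, nparity=3) * SECTOR)    # 16384

# Padding can bite RAIDZ2 too: an 8K volblock allocates 24K (2 data, 2 parity, 2 pad).
print(raidz_alloc_sectors(8192, SECTOR, ndisks=8, nparity=2) * SECTOR)    # 24576

# A larger volblocksize amortizes the overhead: 64K of data allocates 112K.
print(raidz_alloc_sectors(65536, SECTOR, ndisks=8, nparity=3) * SECTOR)   # 114688
```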

Basically the contortions here make my brain hurt so I don't like thinking of this and now my head hurts and now I hate you all. :smile:
 

alebeta

Dabbler
Joined
Oct 7, 2021
Messages
11
Hi @jgreco

Thanks for the information; it was beyond all expectations and eye-opening.

I realize that iSCSI may not be the smartest way to go, and that CIFS and NFS are better choices, thanks to https://www.ixsystems.com/community...quires-more-resources-for-the-same-result.41/

I read the answer to @sretalla's question about what the case is for SSDs, and I understand it may be possible to reach decent performance with RAIDZ2/3, but it would need tuning of the volblocksize and ashift parameters and a very specific disk array setup. Is that right?

Then the best option for the SSD pool would be to go with mirrors. But something you said leaves me with a doubt: "you may still get very acceptable -- not lightning-fast -- performance without a SLOG". Why is it not lightning-fast, and what would be lightning-fast?

Another thing I would like to clarify: if I went with HDDs instead of SSDs for the ZFS pool, and used an SSD or NVMe device for L2ARC, would that bring me to a speed near that of an SSD pool? It might be a solution with a better price-to-performance ratio. It may be a silly question, but I just want to ask to be sure.

thanks
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Most of this is covered in


If your VM environment is typical, i.e. you have a bunch of let's say 50GB VM's, but on average only 1-2GB of each one is frequently accessed, this means you have 45GB+ of "sleepy" data on each VM that is infrequently-to-never accessed. This is ideally stored on slow cheap HDD. If you can get your frequently accessed stuff to reside in ARC/L2ARC, you can reach the glorious state where most or all of your pool reads happen from ARC/L2ARC and NOT from the pool. At that point, your practical observed read performance should be SSD-or-better, even though if you go and try to read some random data off a disk that isn't in the ARC/L2ARC cache, it will have to go out to disk to get it.
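
As a purely made-up illustration of that working-set arithmetic (every number here is hypothetical):

```python
# Hypothetical working-set estimate; every number here is made up.
vm_count = 20
vm_size_gb = 50
hot_per_vm_gb = 2        # the "awake" portion of each VM

total_gb = vm_count * vm_size_gb       # 1000 GB parked on slow, cheap disk
hot_gb = vm_count * hot_per_vm_gb      # 40 GB you want living in ARC/L2ARC

print(f"stored: {total_gb} GB, hot working set: {hot_gb} GB")
# If ARC (RAM) plus L2ARC comfortably covers the ~40 GB hot set, most reads
# never touch the pool, and observed read performance looks like cache speed.
```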

Then the best option for the SSD pool would be to go with the mirror option. But something you said makes me more doubtful "you may still get very acceptable -- not lightning-fast -- performance without a SLOG", so why it is not lightning-fast ? and what would be lightning-fast?

If you have an SSD pool and you omit a SLOG, your sync writes will be committed to the in-pool ZIL. You might think "but that's SSD so it's fast" but it is going through the pool write stack, which isn't really designed for maximum speed for this sort of thing. It's designed for accessing the pool. By way of comparison, the SLOG write channel is optimized for the low latency SLOG task, and is much faster. Remember, small amounts of additional latency in processes can have an outsized effect on performance.
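
To see why a little latency matters so much, a toy queue-depth-1 calculation (the latency numbers are invented purely for illustration, not measurements of any particular device):

```python
# Toy illustration: at queue depth 1, each sync write must wait for the
# previous commit, so commit latency caps throughput directly.
# The latency numbers below are invented for illustration only.
for label, latency_us in [("in-pool ZIL on SSD", 120), ("dedicated SLOG", 15)]:
    iops = 1_000_000 / latency_us
    mb_s = iops * 4096 / 1_000_000          # 4K sync writes
    print(f"{label:>20}: ~{iops:,.0f} IOPS, ~{mb_s:.0f} MB/s at QD1")
```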
 

alebeta

Dabbler
Joined
Oct 7, 2021
Messages
11
Regarding the second part of your last post.

If I go with an SSD pool for a VM datastore, it is better to go with a SLOG device + sync writes enabled. As far as I understand, it is important to use sync writes when storing VMs, so the data is safely written to the pool. Enabling sync writes will hurt performance, but in that case the SLOG would help keep the performance hit smaller, is that right? And if one used ZFS's async writes, which are staged in RAM and are not recommended for VMs, a SLOG device wouldn't be of any use.

How bad would it be to use async writes for the VMs? Is there any data regarding this use case, or is it better not to ask?

I'm repeating the overall conversation a little, just to check whether I've understood the basics.

Thanks a lot for the advice
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Using async is fine - until something goes wrong.
If it's a test environment or a home lab, then it probably doesn't matter - but if it's production, then you could lose many seconds of written data that the hypervisor thinks is on disk but is in fact still in RAM. Just imagine what that can do to a VMDK.
 

alebeta

Dabbler
Joined
Oct 7, 2021
Messages
11
Thanks for all the great advice,

To complement this thread, I found this one from a couple of months ago; I think it helps me understand more about VM datastores with TrueNAS.

I could see some familiar faces from this thread.

 

alebeta

Dabbler
Joined
Oct 7, 2021
Messages
11
Would this be an OK HBA controller: UCSC-SAS12GHBA? Any comments about it? It would come with the UCS C-series server I'm planning to get.

thanks
 