Slog useful for SMB traffic?????

ragametal · Sep 10, 2021

I’m new to Truenas and I’m a bit confused as to whether i will benefit from a SLOG at all in my configuration.

My understanding is that data writes to the data pool can either be synchronous (slower but provides data integrity and protection) or asynchronous (faster but without data protection). I would like to use synchronous writes to get the higher level of data protection.

My intent is to use Truenas as my SAMBA server which by default use asynchronous writes. However, Truenas has settings to force all the writes to the data pool to be synchronous. Feel free to correct me on this one as I can’t remember where i got this info.

Only synchronous writes will benefit from a SLOG. So, in my case where I have SMB traffic which is asynchronous by default but I'm forcing it to be synchronous via a Truenas setting, would my system benefit from a SLOG? I will have no VMs or Databases running on the system, just SMB shares.

The system in question is a Truenas Mini X+ with two 6TB WD Reds in mirror mode.

The source of my information are the following links:
https://www.ixsystems.com/blog/zfs-zil-and-slog-demystified/
https://www.ixsystems.com/blog/why-zil-size-matters-or-doesnt/
https://www.truenas.com/docs/references/slog/#slog-for-asynchronous-writes
https://www.truenas.com/community/threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

c77dk · Sep 10, 2021

I would suggest trying without - but it depends on how much you write. If you think it's too slow then you can add a fast SLOG.

To test how fast your disks can ingress via SMB you can make a dataset without sync writes and test - the result here will be faster than with SLOG (how much depends on your hw), but will give you a hint of max speed. Try the same test with sync=always and compare. If results are miles apart a SLOG might be a good idea - but remember to get one with good endurance and PLP

Spearfoot · Sep 10, 2021

ragametal said:
I’m new to Truenas and I’m a bit confused as to whether i will benefit from a SLOG at all in my configuration.

My understanding is that data writes to the data pool can either be synchronous (slower but provides data integrity and protection) or asynchronous (faster but without data protection). I would like to use synchronous writes to get the higher level of data protection.

My intent is to use Truenas as my SAMBA server which by default use asynchronous writes. However, Truenas has settings to force all the writes to the data pool to be synchronous. Feel free to correct me on this one as I can’t remember where i got this info.

Only synchronous writes will benefit from a SLOG. So, in my case where I have SMB traffic which is asynchronous by default but I'm forcing it to be synchronous via a Truenas setting, would my system benefit from a SLOG? I will have no VMs or Databases running on the system, just SMB shares.

The system in question is a Truenas Mini X+ with two 6TB WD Reds in mirror mode.

The source of my information are the following links:
https://www.ixsystems.com/blog/zfs-zil-and-slog-demystified/
https://www.ixsystems.com/blog/why-zil-size-matters-or-doesnt/
https://www.truenas.com/docs/references/slog/#slog-for-asynchronous-writes
https://www.truenas.com/community/threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

I recommend you dispense with a SLOG device for SMB usage.

You won't benefit from a SLOG device in terms of performance. In fact, you will suffer instead. A SLOG device is not a cache; it doesn't improve write speeds.

Are you running virtual machines? Or handling critical, real-time transaction processing? If not, a SLOG probably isn't for you.

ragametal · Sep 12, 2021

@c77dk, Thanks for the advice. I will give it a shot once i get the server (i'm waiting for it to get shipped). I was honestly hoping for a less empirical solution but I'm more than try it.

@Spearfoot , I hear you and based on what i have read i tend to agree with you. I do not have any VMs or databases. However I'm still confused about my particular use.

As I said in my original post, SLOG is only beneficial for sync writes which SMB does not use (this is the part that i agree with you). But What would happen if i change the truenas settings for that dataset to be "sync=always"?

If i do that, would my SMB traffic use sync writes? if so, it appears that i would benefit from a SLOG. is my understanding wrong?

awasb · Sep 12, 2021

Well, SMB in fact uses (enforced) sync writes under certain circumstances and for special use cases.

MacOS-TimeMachine-Backup via SMB is such a use case, since TimeMachine issues a sync after every single file write to the server. That really slows things down if you got a lot of small files to backup. Over here a SLOG speeded TimeMachine-Backups up significantly (as it did for Linux-Backups via rsync to NFS).

Enforcing sync writes and using a fast (in terms of latency and iops) SLOG-vdev _could_ enhance speed compared to using sync writes and no SLOG (as long as the SLOG's latency/iops is considerably better than your pool's is). But I doubt there are (m)any use cases where SLOG + sync beats async. If they exist, I'd guess the pool is badly designed, the hardware is flaky, the server got not enough RAM etc. etc.

So ... the primary goal of using SLOG is to add that "sync-level" of data integrity (by telling the client it's all synced to the pool, while it is just synced to the SLOG-vdev) without sacrificing too much performance. That's why it's important to get a low latency, high iops persistent memory device [with PLP|without caches].

To sum it all up: If you want (or need by use case) that extra level of integrity, go for it. But (given your system is properly designed and working) your performance will suffer. A SLOG just mitigates the loss of performance.

If you don't need it, get a proper UPS.

Spearfoot · Sep 12, 2021

ragametal said:
@Spearfoot , I hear you and based on what i have read i tend to agree with you. I do not have any VMs or databases. However I'm still confused about my particular use.

As I said in my original post, SLOG is only beneficial for sync writes which SMB does not use (this is the part that i agree with you). But What would happen if i change the truenas settings for that dataset to be "sync=always"?

If i do that, would my SMB traffic use sync writes? if so, it appears that i would benefit from a SLOG. is my understanding wrong?

If you turn on synchronous writes for the dataset, I'm pretty sure your SMB traffic will use synchronous mode, and your write performance will be verrrrryyyyy slooooowwwwww. At least, this was my experience with a server at work where I'd inadvertently set up a dataset that way. A first-rate SLOG device would help -- something like an Optane, for example. But it may very well cost as much as the rest of your system put together.

Have you installed the maximum RAM supported by your motherboard? Because you should do that before even thinking about a SLOG device. You already have a SLOG -- in memory -- so it's a good idea to maximize that first.

I've never used a SLOG device for standard SMB shares, only for NFS & iSCSI datasets used for virtual machines. And I quit using them altogether because they slow performance and I don't really need that level of integrity for a home lab.

But this is your system, and you can do whatever you like; we learn best by experimenting, and Lord knows I've done plenty of that, too.

Whatever you do, have fun while you're doing it!

ragametal · Sep 12, 2021

Thank you all for your responses. It seems that the general consensus is to just try and experiment to see which option works best based on my work load and traffic type.

I will most likely do that once i get the server in my hands.
@Spearfoot , I haven't maxxed out the RAM of that system yet but i thought that Truenas did not use RAM for the sync writes. Instead it would save the writes to the ZIL which by default is located on the associated data pool. A SLOG is an alternative location for the ZIL to prevent writing the data twice on the same disks (the first on the ZIL and the second to the dataset).

Am i wrong on this?
If that is the case, then dding more RA should not help with sync writes.

Funny that you mentioned Optane becuse the 16GB M10 module is very affordable and was the main reason why I started this thread. I mean, my network connection is only 1Gbps and the ZIL saves up to 5s of data per transfer group. This means that my SLOG should only need to be sized 1Gb/S x 5s = 5Gb. And since there are 8 bits in one Byte 5Gb/8 = 0.625GB. If we wanted to add 2 or maybe 3 transfer groups on the same module for safety factor, then my SLOG should only be 0.625Gb x 3 = 1.875GB.

So, a 16GB optane module should be more than enough. Now, the Endurance of that module is not that high but at that pricepoint i can buy up to 10 modules for the price of a single optane ssd.

Thoughts?

awasb · Sep 12, 2021

You are perfectly right with your assumption, that the ZIL resides on the pool etc.

What is the ZFS ZIL SLOG and what makes a good one

We go into what the ZFS ZIL SLOG is. We also show a few common choices as ZFS ZIL SLOG devices, how they work and what makes a good one

www.servethehome.com

I‘m using the mentioned OptaneMemory 16GB m.2 module. Works good enough via 1GbE to feel/see/measure the difference.

flashdrive · Sep 12, 2021

Hello,

My goal for my homelab host / server is the same: SMB writes / ingest mostly.

So what would be the way to go to have a good write performance?

Adding as much RAM as possible - this helps with read speeds due to the ARC, correct?

For now I am around 400 Mbyte / s with 5 HDDs and 10 GB LAN.

I know that by writing to a RAM disk I can get near to the theoratical max of 10 GB LAN copper based.

Spearfoot · Sep 12, 2021

ragametal said:
Thank you all for your responses. It seems that the general consensus is to just try and experiment to see which option works best based on my work load and traffic type.

I will most likely do that once i get the server in my hands.
@Spearfoot , I haven't maxxed out the RAM of that system yet but i thought that Truenas did not use RAM for the sync writes. Instead it would save the writes to the ZIL which by default is located on the associated data pool. A SLOG is an alternative location for the ZIL to prevent writing the data twice on the same disks (the first on the ZIL and the second to the dataset).

Am i wrong on this?
If that is the case, then dding more RA should not help with sync writes.

Funny that you mentioned Optane becuse the 16GB M10 module is very affordable and was the main reason why I started this thread. I mean, my network connection is only 1Gbps and the ZIL saves up to 5s of data per transfer group. This means that my SLOG should only need to be sized 1Gb/S x 5s = 5Gb. And since there are 8 bits in one Byte 5Gb/8 = 0.625GB. If we wanted to add 2 or maybe 3 transfer groups on the same module for safety factor, then my SLOG should only be 0.625Gb x 3 = 1.875GB.

So, a 16GB optane module should be more than enough. Now, the Endurance of that module is not that high but at that pricepoint i can buy up to 10 modules for the price of a single optane ssd.

Thoughts?

Correct, the ZIL (ZFS Intent Log) resides on your pool; I misspoke.

awasb · Sep 12, 2021

flashdrive said:
[…]
I know that by writing to a RAM disk I can get near to the theoratical max of 10 GB LAN copper based.

Why should one jump all the hoops with ramdisk(s), when you can have your zfs datasets sync=disabeld? The integrity advantage is gone as soon as ZIL/SLOG is in RAM.

flashdrive · Sep 12, 2021

awasb said:
Why should one jump all the hoops with ramdisk(s), when you can have your zfs datasets sync=disabeld? The integrity advantage is gone as soon as ZIL/SLOG is in RAM.

A misunderstanding:

I was not referring to having the ZIL/SLOG in RAM.

The RAM disk was in a proof of concept build being used as the data pool.

awasb · Sep 12, 2021

Sorry. Missed that.

ragametal · Sep 13, 2021

Thank you all for the responses and input. It is reassuring to know that my understanding wasn't that far off from reality.

@awasb, do youhave "sync=always"? Do you saturate the 1Gbps with that optane module? how long have you had been using it? (I'm trying to measure weather or not i should be worried about the its endurance number).

ChrisRJ · Sep 13, 2021

At least for my personal taste there is too much talk about technical details, while the use-case is still not explored in sufficient detail. So what are the requirements and how valuable are the data? Also, what do the surrounding systems look like? A proper SLOG with power-loss protection will be of limited value, if the desktop from where the data comes is not connected to a UPS. In that light: Also, simply re-starting a copy over SMB in the rare case of power loss, is not a bad alternative to a rather complex system (if feasible from a process perspective).

awasb · Sep 13, 2021

Running main client, switch and NAS UPS protected over here.

@ragametal: nope. All pools are „standard“. But MacOS-TimeMachine enforces sync, as do my raspis (tftpd via NFS) and Linux machines (rsync backup via NFS).

My fast clients with „real“ 1GbE (not the raspis) saturate the line when copying larger files (117 MB/s read/write). No matter the sync option. The maximum write speed of the OptaneMemory 16 GB module seems just fast enough. No flush-waits.

Concerning durability I followed this „calculation“:

Examining 3D XPoint’s 1,000 Times Endurance Benefit – The Memory Guy

thememoryguy.com

If you take the TBW „endurance lifetime“ Intel „promises“ here

Intel® Optane™ Memory Series (16GB, M.2 80mm PCIe 3.0, 20nm, 3D Xpoint™) - Product Specifications | Intel

Intel® Optane™ Memory Series (16GB, M.2 80mm PCIe 3.0, 20nm, 3D Xpoint™) quick reference with specifications, features, and technologies.

www.intel.com

then you could write 100 GB a day for 5 years (=warranty). This heavily exceeds my use case. Initial backups are between 80 to 150 GB per machine for „real“ clients. The raspis take about 200 to 500 MB per machine/node initially. The differential backups of all those machines take 300 to 1200MB per day. (When I‘m doing some video editing, maybe some GBs add up.) But I am far from 100 GB a day.

The TimeMachine differential backups with a lot of small files (git/brew/LaTEX-projects), that used to take half an hour because every single file write within the sparsebundle gets synced, now take seconds to minutes.

The rsyncs got a remarkable boost by adding a L2ARC metadata only cache.

BUT: the baseline to all of this is „my pool is slow“. It consists of 8x 2.5 SATA 1TB drives in raidz2. Someone with a properly designed SSD only pool would laugh out aloud (and would possibly not even think about adding a SLOG, certainly not for 1GbE).

So the question is not „is it any good?“ but „what is it good for?“ or „which special use case benefits on what kind of hardware?“ … in my case: my (low budget built) NAS was slow enough.

ragametal · Sep 15, 2021

@ChrisRJ ,I agree that there is a lot of talk about the technical aspects of certain systems. But, if you don’t study and understand the technical basics, how else can you come up with a configuration that makes sense for your work environment.

To answer your questions, the data is extremely valuable as it is the central repository of project files for my small engineering firm. The NAS will serve about 5 local clients all connected via CAT6 Ethernet at 1Gbps. Each client will dynamically work on about 2 dozen files at any given time each ranging from 250kB up to 2GB each. Clients will not copy the files to their workstations, instead they will access these files directly from the NAS, using SMB shares, via CAD software on their workstations. Expect each file to be re-saved every 2 minutes (the 2GB ones saved every 20 minutes).

Each workstation and the NAS are protected by UPS.

Restarting a file copy over SMB, in case of power loss, is not an option as the system will not tell me which file was not written properly for me to manually correct a potential issue. This is an important point as a corruption of certain key files could compromise all the files of an entire project. This would be, of course, extremely rare and unique but possible nonetheless as there are multiple nested files.

@awasb ,Thank you so much for sharing your experiences with the Optane M10 module in a 1Gbps network. It amazes me that i found someone with the exact scenario I’m planning to have. I am particularly grateful about the explanation on how you evaluated if the rated endurance was suitable for your use. After seeing your results I decided to try the M10 as a SLOG for my system.

It should be adequate as I don’t see myself upgrading my network to 10Gbps anytime soon (and i don't expect to reach the rated 100GB/day wittes).

winnielinnie · Sep 17, 2021

I feel like I'm missing something here.

Is it true that...?

async = potential for corrupted data
sync = protection against corrupted data

I thought by using TCP, SMB (async) is essentially no more risky than downloading from FTP or any website.

Or is the protection against corruption only in the case of a power loss / system crash?

I've read about sync vs async, and ZiL, and SLOG devices, but only thing I could really discern is that it concerns power loss or system interruptions; no mention that using async is inherently risky.

awasb · Sep 17, 2021

async is not inherently insecure. But as long as data resides in volatile memory (longer than necessary, that is) there is the chance of losing that data due to powerloss/system crash before flush/commit or corrupting the ~~pool~~ data and/or losing partial data due to powerloss/system crash while flushing/doing commits.

synced writes reduce this risk. Synced writes with a fast SLOG reduce it to a higher degree.

Comparing with FTPing is not necessaryly „sufficient“, since it depends on the use case and the value of transferred data: when your version-backup- or even archive-NAS dies your data is (potentially) gone.

A good read:

Chris's Wiki :: blog/solaris/ZFSTXGsAndZILs

As is:

illumos-gate/usr/src/uts/common/fs/zfs/zil.c at master · illumos/illumos-gate

An open-source Unix operating system -- this is a read-only mirror of the official repository at https://code.illumos.org/plugins/gitiles/illumos-gate - illumos/illumos-gate

github.com

winnielinnie · Sep 17, 2021

awasb said:
there is the chance of losing that data due to powerloss/system crash before flush/commit or corrupting the pool and/or losing partial data due to powerloss/system crash while flushing/doing commits.

Losing the data due to a power loss or system crash I can understand. But how is it possible that a failed write (even in the middle of a commit) can corrupt the pool even though ZFS is copy-on-write? I can't imagine you lose your entire pool because of a single interrupted write/commit during a power loss.

Important Announcement for the TrueNAS Community.

Slog useful for SMB traffic?????

Contributor

Patron

He of the long foot

Contributor

Patron

He of the long foot

Contributor

Patron

Patron

He of the long foot

Patron

Patron

Patron

Contributor

Wizard

Patron

Contributor

MVP

Patron

MVP

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Slog useful for SMB traffic?????"

Similar threads