Removing cache drives from live pool

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
Then why bother with SLOG at all? It's slowing your writes down without providing a benefit. Just turn off sync writes and be done with it if "some lost data [...] isn't a concern or a problem."


I was under the impression that adding a fast, large write cache in front of slower spinning disks would improve speeds? The SLOG is a 1 TB M.2 SSD with about 2500 MB/s write speed.
As long as we don't fill the write cache drive, shouldn't writes to the pool be much faster, with the SLOG then feeding the spinning disks as fast as they can go?
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
I was under the impression that adding a fast, large write cache in front of slower spinning disks would improve speeds? The SLOG is a 1 TB M.2 SSD with about 2500 MB/s write speed.
As long as we don't fill the write cache drive, shouldn't writes to the pool be much faster, with the SLOG then feeding the spinning disks as fast as they can go?
SLOG is just a way to secure your sync writes - normally this device will only see writes, and no reads. If you need to read from it, then something brown has hit a fan.
What you seem to want could be implemented with another pool and some replication, or maybe an sVDEV (but be sure to get redundancy right).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I was under the impression that adding a fast, large write cache in front of slower spinning disks would improve speeds? The SLOG is a 1 TB M.2 SSD with about 2500 MB/s write speed.
As long as we don't fill the write cache drive, shouldn't writes to the pool be much faster, with the SLOG then feeding the spinning disks as fast as they can go?

Yes, I'm sure that adding a fast, large write cache in front of slower spinning disks would improve write speeds.

All this means is that when I suggested you go and read the "Some insights into SLOG" post, you didn't, because you still seem to think that the SLOG is some sort of write cache.

It is absolutely NO SORT of write cache at all.

Please do avail yourself of the resources that people point you at. Almost all of them are really good things to know about ZFS, even where they may not be immediately relevant to your specific situation.

Your system's main memory is your write cache. SLOG is a thing that guarantees written data still reaches the pool, even in the event of a power loss. Your fastest write mode is ALWAYS to disable sync writes; ZFS simply CANNOT write faster than the pool can accept flushed transaction groups.
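
For reference, sync behaviour is a per-dataset property in ZFS, which is also where you'd turn sync writes off. A minimal sketch from Python (the dataset name "tank/data" is just a placeholder; sync=disabled trades the last few seconds of acknowledged writes for speed):

# Rough sketch: inspect and change the per-dataset "sync" property.
# "tank/data" is a placeholder, not a pool from this thread.
import subprocess

def get_sync(dataset: str) -> str:
    # zfs get -H -o value sync <dataset>  ->  "standard", "always" or "disabled"
    out = subprocess.run(
        ["zfs", "get", "-H", "-o", "value", "sync", dataset],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

def disable_sync(dataset: str) -> None:
    # Fastest writes, but sync writes acknowledged in the last few seconds
    # can be lost if the box dies before the next transaction group commits.
    subprocess.run(["zfs", "set", "sync=disabled", dataset], check=True)

print(get_sync("tank/data"))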
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
As long as we don't fill the write cache drive
The problem there (other than incorrectly identifying the SLOG as a write cache) is that the amount of writes allowed to sit in that state is about five seconds' worth, so any kind of sustained IO basically throttles as soon as the pool can't absorb it. The only thing it helps with is flattening out short spikes in write IO.
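
If you want to see those limits on your own box, here's a rough sketch for Linux-based OpenZFS (assumption: the stock module parameter names; on FreeBSD/CORE the equivalent values are exposed under sysctl vfs.zfs.*):

# Read the knobs behind the "roughly 5 seconds of outstanding writes" behaviour.
from pathlib import Path

PARAMS = Path("/sys/module/zfs/parameters")

def read_param(name: str) -> int:
    return int((PARAMS / name).read_text().strip())

txg_timeout = read_param("zfs_txg_timeout")     # seconds between forced txg commits (default 5)
dirty_max = read_param("zfs_dirty_data_max")    # cap on dirty (uncommitted) data held in RAM, bytes

print(f"txg timeout: {txg_timeout}s, dirty data cap: {dirty_max / 2**30:.1f} GiB")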
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
All this means is that when I suggested you go and read the "Some insights into SLOG" post, you didn't, because you still seem to think that the SLOG is some sort of write cache.

It is absolutely NO SORT of write cache at all.

Please do avail yourself of the resources that people point you at. Almost all of them are really good things to know about ZFS, even where they may not be immediately relevant to your specific situation.

Your system's main memory is your write cache. SLOG is a thing that guarantees written data still reaches the pool, even in the event of a power loss. Your fastest write mode is ALWAYS to disable sync writes; ZFS simply CANNOT write faster than the pool can accept flushed transaction groups.


Then I suggest the official documentation that quite literally states that SLOG is a write cache be revisited:


All drives in an encrypted pool are encrypted, including L2ARC (read cache) and SLOG (write cache).


And the linked post also repeatedly mentions SLOG as a write cache. Probably because it is.


No idea why you're getting worked up, given that a SLOG most definitely is a write cache. Now, it may or may not speed up writes, but that doesn't make it any less of a write cache.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Then I suggest the official documentation that quite literally states that SLOG is a write cache be revisited:


Feel free to submit a bug report. I am not an iX employee and I don't manage their technical writing. You are making the suggestion to the wrong person.

And the linked post also repeatedly mentions SLOG as a write cache. Probably because it is.

It DOES NOT. I know, because I wrote the linked post. I specifically point you to the section that discusses the actual ZFS write cache: "The ZFS write cache. (No, it's not the ZIL.)"

No idea why you're getting worked up, given that a SLOG most definitely is a write cache. Now, it may or may not speed up writes, but that doesn't make it any less of a write cache.

In computer science, a write cache is a resource that data is written into and later read back out of, so it can be spooled off to disk.

The ZIL/SLOG is never read under normal operating conditions.

Its sole purpose is to act as an intent log, which is what the "I" and "L" in "ZIL" stand for, to guarantee that sync writes are actually written to something that can withstand a power loss, kernel crash, etc.

The ZIL/SLOG is *exclusively* written to, never read from, except at one time: it is read during pool import to make sure that the transactions in the log were actually written to the pool disks.

Therefore, it is most definitely not a write cache.
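
To make that distinction concrete, here's a toy sketch (made-up classes, nothing to do with actual ZFS code):

# A write cache is read back out in normal operation; an intent log is not.

class WriteBackCache:
    """Data is written here, then later READ BACK OUT of here and spooled to disk."""
    def __init__(self):
        self.buffer = []

    def write(self, block):
        self.buffer.append(block)

    def flush(self, disk):
        for block in self.buffer:   # the cache sits on the read path of every flush
            disk.append(block)
        self.buffer.clear()

class IntentLog:
    """Data is only logged here. The flush to disk comes from RAM, not from the log;
    the log is read exactly once, during crash recovery (pool import)."""
    def __init__(self):
        self.log = []

    def record(self, block):
        self.log.append(block)      # written and acknowledged, normally never read

    def replay_after_crash(self, disk):
        for block in self.log:      # the one time the log is ever read
            disk.append(block)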

Additionally, it does NOT generally speed up writes. ZIL/SLOG writes are guaranteed to be slower than async writes. ZIL/SLOG helps with synchronous writes because it gives them a fast, safe place to land, so they can be acknowledged before the transaction group is committed to the pool.

Please consider actually reading the "Some insights into SLOG" post this time. I put extensive time and effort into the things I post in an effort to educate people who are having trouble with the concepts, which are, admittedly, sometimes a bit difficult.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I thought this was an L2ARC thread, how are we talking about SLOG? ;)

Then I suggest the official documentation that quite literally states that SLOG is a write cache be revisited:


Both this and the newer page at https://www.truenas.com/docs/references/slog/ have some incorrect assumptions/interpretations of SLOG, specifically around sizing (it's more complex than "throughput over 5s") and the section on "SLOG for asynchronous writes", which is just wrong by definition: you can force all writes to be treated as sync, and therefore go to the ZIL/SLOG, but async writes will still happily go to RAM only, and they'll still be faster.

And the linked post also repeatedly mentions SLOG as a write cache. Probably because it is.

No idea why you're getting worked up, given that a SLOG most definitely is a write cache. Now, it may or may not speed up writes, but that doesn't make it any less of a write cache.

Much like the "RAIDZ/RAID0/stripe" terminology, it's important to understand that people aren't just being pedantic here, but trying to describe the underlying ZFS behaviour. Think of the ZIL/SLOG as the ZFS equivalent to "battery backup" for a RAID card. It's not in the primary data path, it's there to provide assurance for your most important data in case of failures. When a new write comes into your ZFS server, it goes into RAM. If it's marked as sync, it also goes to the ZIL or SLOG. But when it comes time to "batch up the transaction" for committing to the capacity/data vdevs, those reads come from RAM, not the SLOG.
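
In toy pseudocode (illustration only, not real ZFS code), that flow looks roughly like this:

# Every write lands in RAM; sync writes are additionally logged to the ZIL/SLOG;
# the pool is written from RAM when the transaction group commits.

def handle_write(block, is_sync, ram_dirty, slog):
    ram_dirty.append(block)          # all incoming writes go to RAM first
    if is_sync:
        slog.append(block)           # sync writes are ALSO logged to the ZIL/SLOG
    return "ack"                     # sync writes are only acked once safely logged

def commit_transaction_group(ram_dirty, data_vdevs, slog):
    data_vdevs.extend(ram_dirty)     # the data vdevs are written from RAM, not the SLOG
    ram_dirty.clear()
    slog.clear()                     # log entries for committed data are now stale

ram, slog, pool = [], [], []
handle_write("sync block", True, ram, slog)
handle_write("async block", False, ram, slog)
commit_transaction_group(ram, pool, slog)
print(pool)                          # both blocks reach the pool; the SLOG was never read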

It's not a scenario where you can say "well, I have a machine with 32GB of RAM, and a massive 512GB SSD - I want to be able to write 500GB at SSD-speeds and then have it spool to a bunch of very slow disks." Because the "write cache" lives in RAM, the maximum you'd ever be able to hold would be that 32GB (less system/service overhead) of outstanding dirty data, and in order to do that, you'd blow out your RAM-based read cache.

You can copy everything to a pool on said SSD, and then use a server-side rsync/snapshot/copy script to move it to the HDD-based storage, but that's a more manual undertaking. ZFS doesn't do "data tiering."
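
A rough sketch of what such a server-side script might look like (the pool/dataset names "fast/ingest" and "slow/archive" are placeholders, and this only covers the first full copy; repeated runs would use incremental sends):

import datetime
import subprocess

def tier_out(src: str = "fast/ingest", dst: str = "slow/archive") -> None:
    # Snapshot the SSD dataset, then replicate it to the HDD pool.
    # Assumes dst does not exist yet; later runs would use zfs send -i.
    snap = f"{src}@tier-{datetime.datetime.now():%Y%m%d-%H%M%S}"
    subprocess.run(["zfs", "snapshot", snap], check=True)
    send = subprocess.Popen(["zfs", "send", snap], stdout=subprocess.PIPE)
    subprocess.run(["zfs", "recv", dst], stdin=send.stdout, check=True)
    send.wait()

tier_out()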

Given your previous copy speeds (500-600MB/s on a four-drive RAIDZ1), I'll eat my hat if you aren't already running sync=standard, so this is as fast as that pool will get - and for a four-drive Z1 that's impressive for sequential speeds, as it implies 150-200MB/s of usable throughput per spindle, depending on how compressible your data is. If you want to go faster, I suggest more back-end vdevs, possibly in mirrors if any of that access trends towards "concurrent clients" or "random".
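
(The back-of-envelope math behind that per-spindle figure, assuming purely sequential IO and ignoring compression:)

# Four-drive RAIDZ1 stripes across three data disks plus one drive's worth of parity.
data_spindles = 4 - 1
for pool_mb_s in (500, 600):
    print(f"{pool_mb_s} MB/s pool ~= {pool_mb_s / data_spindles:.0f} MB/s per spindle")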
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
That seems curious. I would think that's an error/bug/problem, because that should be a nonobjectionable change. If you can duplicate that, it would be interesting to follow up on.


I managed to reproduce this again. SMB crashed outright and wouldn't respond at all.

Very interestingly, iSCSI stayed working this time - it had 2 separate disks connected to the same file server, which was fully functional. (Connected via Ethernet.)
ICMP also worked.

But this time SNMP, SSH, and the WebUI stopped responding, so I had to hard reboot the server.
 