SLOG too big?

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
Is it possible for your SLOG to be too large?

We are having trouble with serious write latency on our array (3 striped mirrors of 6 8TB disks with a 1TB NVMe SLOG). When we execute a large data move, the array goes into full write mode, with 2 large spikes every 5 minutes.

Is it possible that the SLOG is so large that it is filling up and then trying to flush to disk? When this happens, the entire TrueNAS box becomes unresponsive, including the web interface and even SSH. We just have to wait for it to stop before we can regain control. This causes our iSCSI-connected VMs to freeze since they can't access their disks.

We have stopped doing major data moves and the problem has gone away for now, but we are leery of doing any large data moves because the problem may recur.

I know there has been discussion on proper sizing of an SLOG, but I find bits and pieces when I search. Can someone link me to a definitive guide that shows the entire calculation (write speed/network speed/etc) for calculating the proper size?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
"Is it possible that the SLOG is so large that it is filling up and then trying to flush to disk?"
No. The SLOG is flushed every 5 seconds, which is why it's not usually practical to have a SLOG of over 30GB.

However:
"Is it possible for your SLOG to be too large?"
No, not really (other than being wasteful in terms of cost).
If you partition the SLOG drive correctly, you can have the system use the additional free/unallocated space in order to assist with wear leveling to extend the life of the drive.

"Can someone link me to a definitive guide that shows the entire calculation (write speed/network speed/etc) for calculating the proper size?"
There is a thread somewhere here discussing SLOG maximum sizes, but I can assure you that a SLOG able to absorb 30GB of writes in 5 seconds is more than enough to cover most hardware that you would expect to put in place.
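
As a rough sketch of the arithmetic behind that kind of figure (not a definitive sizing guide): if sync writes cannot arrive faster than the network link, the SLOG only needs to hold what the link can deliver over a few flush intervals. The 5-second interval comes from the reply above. The link speeds, the function name `slog_size_gb`, and the choice of 3 intervals as headroom are assumptions made for illustration.

```python
# Rough upper bound on useful SLOG capacity.
# Assumptions (not from the thread): sync writes are limited by the network
# link speed, and holding 3 flush intervals of data is enough headroom.

def slog_size_gb(link_gbit_per_s: float,
                 flush_interval_s: float = 5.0,
                 intervals_in_flight: int = 3) -> float:
    """Return an approximate upper bound on useful SLOG size in GB (decimal)."""
    ingest_gb_per_s = link_gbit_per_s / 8.0  # gigabits/s -> gigabytes/s
    return ingest_gb_per_s * flush_interval_s * intervals_in_flight


if __name__ == "__main__":
    # Hypothetical link speeds in Gbit/s; substitute your own.
    for link in (1, 10, 25, 40):
        print(f"{link:>3} Gbit/s -> about {slog_size_gb(link):.1f} GB")
```

Under these assumptions a 10 Gbit/s link comes out well below 30GB, so a 1TB device is far larger than the log itself can use.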

The intent of a SLOG is to smooth out peak IO for small writes, as is usually the case with VM disks in a block storage scenario. Perhaps reading this article would be of some help (point 10 is of particular note here):
 

2twisty

Hmm... Then I wonder what is causing these write spikes...

It's been a couple of weeks since it last happened because of our "hands-off" approach.

If I can make it happen again, what log data can I collect to help diagnose this?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Is this the HP DL385 Gen 10+ you had posted about previously with the "E208i" SAS controller? Was this card replaced or supplemented with a different (LSI) HBA?

Please post the full hardware specs, including the make/model of all drives including NVMe used for SLOG.
 

2twisty

I don't have that data at hand at the moment, but yes, it is the DL385 Gen 10+, and I have replaced that E208i with LSI cards. I'll gather the config info and attach the file to a future message.
 