Some insights into SLOG/ZIL with ZFS on FreeNAS

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Running VMware ESXi on ZFS without breaking the bank is not possible.

It's all relative. If you go look at what the cost for running enterprise-grade storage for multiple ESXi hosts is, FreeNAS and ZFS come in looking very attractive indeed.
 

wrath

Dabbler
Joined
Jun 23, 2015
Messages
26
The math for an HDD that does 150 MB/s: having 4 of them makes 600 MB/s, or 3 GB per 5 seconds. Considering 1/8 of RAM is used for the transaction group, having 16 GB of RAM would give the transaction group 2 GB. How does this scale up to, say, 6 disks at 150 MB/s in RAIDZ2? Would it simply be 900 MB/s, or 4.5 GB per 5 seconds? So 1/8 of 32 GB would be 4 GB. Would a 6-disk setup justify more RAM without sacrificing performance? If you use 16 GB of RAM for a 6-disk setup, you essentially cut the performance in half. Am I right?
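A back-of-the-envelope sketch of that rule of thumb (the 1/8-of-RAM write throttle is the old default; newer ZFS caps dirty data via vfs.zfs.dirty_data_max instead, so treat these as illustrative numbers only):
Code:
# Transaction group flush interval on FreeBSD/FreeNAS, default 5 seconds:
sysctl vfs.zfs.txg.timeout

# 4 x 150 MB/s = 600 MB/s -> 3.0 GB per 5 s window; 16 GB RAM / 8 = 2 GB cap
# 6 x 150 MB/s = 900 MB/s -> 4.5 GB per 5 s window (raw; RAIDZ2 parity will
#                            reduce usable throughput); 32 GB RAM / 8 = 4 GB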
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
You are mixing and matching concepts. You have the pool speed limits correct. RAM is primarily used for the ARC. Basically it boils down to this: the larger the ARC, the higher the chance you will be able to read from it at full memory speed, bypassing the pool. Transaction group size is not terribly relevant, except as a means to calculate the maximum volume written to the ZIL before we must flush to disk. Sync writes get limited by latency and IOPS, so we add a super-fast, low-latency device so we can return confirmation of the block write to disk and get on to the next one.
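A minimal sketch of the moving parts (pool, dataset, and device names here are hypothetical; the zfs/zpool commands themselves are standard):
Code:
# Force all writes on a dataset to be sync writes (worst case for latency):
zfs set sync=always tank/vmstore

# Attach a fast, low-latency mirrored SLOG so those sync writes land there
# instead of on the in-pool ZIL:
zpool add tank log mirror ada0p1 ada1p1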

L2ARC is the device that affects low-RAM systems, as it can displace critical ARC info and end up hurting more than helping.

A little simplified, as I'm on my phone.
 

sfcredfox

Patron
Joined
Aug 26, 2014
Messages
340
After reading this a few times, I believe I understand that each pool has its own ZIL. So in an example system having two pools on separate disk sets: if the first pool is built as a virtualization datastore and has a heavy SYNC workload, this pool is expected to benefit from having a SLOG. If the SAME system also has a second pool for everything else, built with the expectation that it will NOT likely have a heavy SYNC load, it also has its own ZIL, and the default in-pool ZIL could be fine (all things situationally workload-dependent)?

Bottom line, every pool has its own ZIL?

I expect if you had two pools with heavy SYNC loads, you would want TWO distinct SLOGs so the two pools don't contend with each other for SLOG access?

Lastly, how does the calculation of the transaction groups account for having multiple pools?

If your transaction group is too large for your system's hardware to write before the second transaction group needs to be written, how is this affected by having two pools?

Is there a transaction group for each pool? Or is there a single transaction group for all data intended for any pool?
Example: If one pool of disks is slower and has lots of writes being committed, would that lock up I/O intended for a second pool that is very fast and has no problem committing its transaction groups to disk?

How would you calculate the max transaction group size on a system with multiple pools?
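For what it's worth, log vdevs are attached per pool, so the two-SLOG setup described above would look something like this (pool and device names are hypothetical):
Code:
# Each sync-heavy pool gets its own dedicated log device, so the two
# pools never contend for the same SLOG:
zpool add vmpool log gpt/slog0
zpool add bulkpool log gpt/slog1

# "zpool status <pool>" then shows a separate "logs" section per pool.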
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Okay, so is there any correlation between SLOG and the swap size on each hard drive? Meaning, if one were to have a pair of SSDs (mirrored) allocated for a SLOG, would it be advisable to set the "Swap size on each drive in GiB, affects new disks only" to "0" (zero)? Or would FreeNAS simply ignore that swap and use it in the event that the mirrored SLOG went down?

Feel free to smack me if I am way off base here. ;)
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
would it be advisable to set the "Swap size on each drive in GiB, affects new disks only" to "0" (zero)?
I don't think it ever makes sense to change swap to 0, if only for the fact that it allows you to replace a failed disk with one that isn't quite the same size.

and use it in the event that the mirrored SLOG went down?
If the SLOG fails, it will directly use the pool instead (not swap).
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
I don't think it ever makes sense to change swap to 0, if only for the fact that it allows you to replace a failed disk with one that isn't quite the same size.
Thanks, I will leave that alone; I was leery about mucking with that.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
I don't think it ever makes sense to change swap to 0, if only for the fact that it allows you to replace a failed disk with one that isn't quite the same size.

That makes sense for the pool data drives, but it could possibly be argued that drives such as SLOG and L2ARC are a somewhat different class of drive. The ZFS can-never-remove-a-vdev thing means that once a data vdev is added, you MUST keep that drive in the pool. SLOG and L2ARC devices, however, are not bound that way, and have a less permanent, potentially ephemeral nature to them.
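For context, log and cache vdevs are the exception to the can-never-remove rule, which is exactly what gives them that ephemeral nature (a sketch, with hypothetical pool and partition names):
Code:
# Unlike data vdevs, SLOG and L2ARC devices can be detached at any time:
zpool remove tank ada0p2   # remove the log (SLOG) device
zpool remove tank ada1p2   # remove the cache (L2ARC) device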

For whatever it is worth, I think the FreeNAS swap strategy is vaguely flawed, because the swap isn't protected against drive failure, and perhaps isn't sized to scale.
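One way to address that particular flaw is to gmirror the swap partitions in pairs, which is roughly what later FreeNAS releases ended up doing (a sketch, with hypothetical partition names):
Code:
# Mirror two swap partitions so a single disk failure can't take out
# swap that is actively in use:
gmirror label -b prefer swap0 da0p1 da1p1
swapon /dev/mirror/swap0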

However, I cannot think of any significant value to be gained by playing games with the swap size on something like a SLOG device, unless perhaps you've got something like a small battery-backed RAM SSD where the swap would be taking up a considerable portion of the device.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
I didn't realize that SLOG and L2ARC devices were used for swap. My comment was only referring to data drives.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Or I become more confused every day....
ada0 is my SLOG and ada1 is my L2ARC. I don't see swap on them.
Code:
[root@freenas1] ~# gpart show
=>      34  31249933  ada0  GPT  (14G)
        34        94        - free -  (47k)
       128  31249832     1  freebsd-zfs  (14G)
  31249960         7        - free -  (3.5k)

=>       34  390721901  ada1  GPT  (186G)
         34         94        - free -  (47k)
        128  390721800     1  freebsd-zfs  (186G)
  390721928          7        - free -  (3.5k)

=>       34  234441581  ada2  GPT  (111G)
         34       1024     1  bios-boot  (512k)
       1058          6        - free -  (3.0k)
       1064  234440544     2  freebsd-zfs  (111G)
  234441608          7        - free -  (3.5k)

=>       34  234441581  ada3  GPT  (111G)
         34       1024     1  bios-boot  (512k)
       1058          6        - free -  (3.0k)
       1064  234440544     2  freebsd-zfs  (111G)
  234441608          7        - free -  (3.5k)

=>        34  7814037101  da0  GPT  (3.7T)
          34          94       - free -  (47k)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842696    2  freebsd-zfs  (3.7T)
  7814037128           7       - free -  (3.5k)

=>        34  7814037101  da1  GPT  (3.7T)
          34          94       - free -  (47k)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842696    2  freebsd-zfs  (3.7T)
  7814037128           7       - free -  (3.5k)

=>        34  7814037101  da2  GPT  (3.7T)
          34          94       - free -  (47k)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842696    2  freebsd-zfs  (3.7T)
  7814037128           7       - free -  (3.5k)

=>        34  7814037101  da3  GPT  (3.7T)
          34          94       - free -  (47k)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842696    2  freebsd-zfs  (3.7T)
  7814037128           7       - free -  (3.5k)
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
ada0 is my SLOG and ada1 is my L2ARC. I don't see swap on them.
Interestingly enough, on a current test machine the two drives dedicated to SLOG do show a swap. These two drives (da2 & da3) are mirrored, so maybe that has something to do with it.

Also, this is not a "standard setup"; it is a test system for FreeNAS running on ESXi 6.0.
The two SSDs (Intel 320 32GB) are actually being served to the FreeNAS VM as virtual drives attached to an LSI 9260-8i (each as a single-drive RAID0) with "write back w/BBU". ***Testing out a variant of having a SLOG with BBU... ;)

Call it my variant of:
An interesting but unorthodox alternative for SLOG is to use a RAID controller with battery backed write cache, along with conventional hard disks.

Code:
[root@ASC-FN01] ~# gpart show
=>      34  67108797  da0  GPT  (32G)
        34      1024    1  bios-boot  (512K)
      1058         6       - free -  (3.0K)
      1064  67107760    2  freebsd-zfs  (32G)
  67108824         7       - free -  (3.5K)

=>      34  67108797  da1  GPT  (32G)
        34      1024    1  bios-boot  (512K)
      1058         6       - free -  (3.0K)
      1064  67107760    2  freebsd-zfs  (32G)
  67108824         7       - free -  (3.5K)

=>      34  41942973  da2  GPT  (20G)
        34        94       - free -  (47K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  37748568    2  freebsd-zfs  (18G)
  41943000         7       - free -  (3.5K)

=>      34  41942973  da3  GPT  (20G)
        34        94       - free -  (47K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  37748568    2  freebsd-zfs  (18G)
  41943000         7       - free -  (3.5K)

=>        34  3907029101  da4  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834696    2  freebsd-zfs  (1.8T)
  3907029128           7       - free -  (3.5K)

=>        34  3907029101  da5  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834696    2  freebsd-zfs  (1.8T)
  3907029128           7       - free -  (3.5K)

=>        34  3907029101  da6  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834696    2  freebsd-zfs  (1.8T)
  3907029128           7       - free -  (3.5K)

=>        34  3907029101  da7  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834696    2  freebsd-zfs  (1.8T)
  3907029128           7       - free -  (3.5K)

=>        34  3907029101  da8  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834696    2  freebsd-zfs  (1.8T)
  3907029128           7       - free -  (3.5K)

=>        34  3907029101  da9  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834696    2  freebsd-zfs  (1.8T)
  3907029128           7       - free -  (3.5K)

=>        34  3907029101  da10  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834696     2  freebsd-zfs  (1.8T)
  3907029128           7        - free -  (3.5K)

=>        34  3907029101  da11  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834696     2  freebsd-zfs  (1.8T)
  3907029128           7        - free -  (3.5K)

=>        34  3907029101  da12  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834696     2  freebsd-zfs  (1.8T)
  3907029128           7        - free -  (3.5K)

=>        34  3907029101  da13  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834696     2  freebsd-zfs  (1.8T)
  3907029128           7        - free -  (3.5K)

=>        34  3907029101  da14  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834696     2  freebsd-zfs  (1.8T)
  3907029128           7        - free -  (3.5K)

=>        34  3907029101  da15  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834696     2  freebsd-zfs  (1.8T)
  3907029128           7        - free -  (3.5K)
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Hmm, that's strange. I installed this system on 9.3 back around ~Jan 2015 (not 9.3.1) and have been upgrading it since. I'm still on 9.3.1, though, not 9.10. Maybe that has something to do with it?
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Going to destroy and re-create the volume (doing some speed tests with mirrors vs. RAIDZ2), so I will just add one SLOG and see if it creates a swap or not. Also, you are correct, I am running 9.10.

*** Update: It seems that if you are not doing any mirroring for the SLOG, then the swap partition is not created on the disk:
Code:
=>      34  41942973  da2  GPT  (20G)
        34        94       - free -  (47K)
       128  41942872    1  freebsd-zfs  (20G)
  41943000         7       - free -  (3.5K)
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Out of curiosity, if all other components are equal and I was going to connect an Intel SSD DC S3500 (160 GB) as a SLOG, which would be preferable?

Directly to a SATA port (3 Gb/s) or an HBA (6 Gb/s)?

*** Note: Both the MB SATA ports and the HBA are being passed through to a FreeNAS VM running on ESXi 6.0 U2.

I am leaning towards the SATA port, simply to avoid all the extra traffic, but wanted to check my sanity...
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Out of curiosity, if all other components are equal and I was going to connect an Intel SSD DC S3500 (160 GB) as a SLOG, which would be preferable?

Directly to a SATA port (3 Gb/s) or an HBA (6 Gb/s)?

*** Note: Both the MB SATA ports and the HBA are being passed through to a FreeNAS VM running on ESXi 6.0 U2.

I am leaning towards the SATA port, simply to avoid all the extra traffic, but wanted to check my sanity...
I recommend attaching it to the HBA. You want a SLOG device to have really fast write speeds with low latency, so, all else being equal, it would probably be faster connected to a 6Gb/s port vs. a 3Gb/s port. Or so it seems to my little pea brain. :smile:

Here is an interesting review of SSD SLOG devices which shows that the S3500 is a pretty good contender. My understanding is that the S3700 SSD is a better choice because it's optimized for writes and has higher durability. I use S3700 SSDs as SLOG devices in my systems.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Yeah, I do have a couple of S3710s (200 GB) that I could use as well, but for now I will stick with the S3500 for testing.

While the HBA (PERC H200 cross-flashed) will provide 6 Gb/s, it will take up one of my hard drive bays. Not a big deal, but I would prefer otherwise. Also, connecting it to the HBA would (I assume) introduce a little latency...

Would I be incorrect in assuming that neither an S3500 nor an S3710 could really saturate a 3 Gb/s connection?
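Back-of-the-envelope, using assumed spec-sheet numbers (check your exact models):
Code:
# SATA 3 Gb/s after 8b/10b encoding   ~ 300 MB/s usable
# S3500 160 GB sequential write spec  ~ 175 MB/s -> well under the link limit
# S3710 200 GB sequential write spec  ~ 300 MB/s -> can bump into the limit
#
# For SLOG duty, though, low queue-depth sync-write latency matters far
# more than sequential bandwidth.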
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874