Some insights into SLOG/ZIL with ZFS on FreeNAS

Andrii Stesin

Dabbler
Dear FreeNAS gurus, would you mind sharing a bit of your wisdom, please? The question is, as of today (February 2019), are there any SAS or SATA-3 (not NVMe) SSD drives available on the market which have a capacitor-backed write cache, so they can lose power in the middle of a write without losing data? I have just read this old blog post but I can't find any of these drives available brand new here. What are the possible replacements, as of now? Thanks in advance!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Dear FreeNAS gurus, would you mind sharing a bit of your wisdom, please? The question is, as of today (February 2019), are there any SAS or SATA-3 (not NVMe) SSD drives available on the market which have a capacitor-backed write cache, so they can lose power in the middle of a write without losing data? I have just read this old blog post but I can't find any of these drives available brand new here. What are the possible replacements, as of now? Thanks in advance!
Hello Andrii,

The market has shifted to NVMe with good reason - there are significant performance improvements to be had by getting on the PCIe bus. That said, there are some scenarios that make it difficult to implement (dual-ported/HA solutions mostly, but you might want to be looking at official TrueNAS at that point).

Check the link in my signature to the SLOG benchmarking thread (1), where you can find some good options, but most SAS/SATA solutions are being deprecated rapidly in favor of the NVMe ones. The Samsung SM863A should still be available for commercial/new purchase if buying new is a requirement.

If you can create a new thread, we might be able to make some suggestions on the system architecture that would enable an NVMe SLOG. Feel free to @ tag me in there if you do.

1: https://forums.freenas.org/index.php?threads/slog-benchmarking-and-finding-the-best-slog.63521/
 

Andrii Stesin

Dabbler
Hello HoneyBadger, thank you for the suggestion. I'll look around for a Samsung SM863A; maybe there is still a chance to find one. I've seen some offers for Intel DC S3700s, but the prices are not friendly at all :(
Hello Andrii,

The market has shifted to NVMe with good reason ... most SAS/SATA solutions are being deprecated rapidly in favor of the NVMe ones.
Yes, sure, I understand this. Unfortunately, I volunteered to help a non-profit project where all I got is an old Supermicro chassis with an X8 motherboard inside (not even X10) and three LSI 2108 HBAs, all donated by someone, so NVMe is not an option. But there is an ESXi host with a bunch of VMs here, so NFS (or iSCSI) performance matters a lot.
If you can create a new thread, we might be able to make some suggestions on the system architecture that would enable an NVMe SLOG. Feel free to @ tag me in there if you do.
My warmest thanks for your invitation! I'll be back in a few days; right now I'm kind of busy assembling the beast from whatever I was given... Regards, Andrii
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
If used gear is an option (and considering the non-profit project and donated chassis, I would probably suggest it) - I'd suggest picking up a pair of DC S3700s from eBay and mirroring them. 200GB models should be around USD$50 and yield fairly solid, if entry-level, SLOG performance. The cheaper S3500 can also work, but doesn't have the same level of write endurance as the S3700s.

My warmest thanks for your invitation! I'll be back in a few days; right now I'm kind of busy assembling the beast from whatever I was given... Regards, Andrii

Looking forward to it!
 

jgreco

Resident Grinch
old Supermicro chassis with an X8 motherboard inside (not even X10) and three LSI 2108 HBAs, all donated by someone, so NVMe is not an option.

NVMe is still probably an option. I've been putting the Addonics AD3M2SPX4 in when required, which gives an NVMe slot and also a pair of powered SATA-to-M.2 converters (which you can then hook up to SATA ports or an HBA).

https://forums.freenas.org/index.ph...donics-ad3m2spx4-m-2-to-pcie-converter.39735/

This has turned out to be extremely handy for retrofitting older servers, something we do here in moderate volume.

In a Dell R510 12-bay FreeNAS conversion, I will pull the H700i RAID controller, replacing it with an H200 crossflashed to IT mode, and then I will install one of the Addonics cards with a pair of cheap M.2 SATA SSDs, connected up to an H310 crossflashed to IR (not IT) mode, which gives maximal boot flexibility.

For hypervisors, in older SC826 systems that predate the BA rear boot bays model, I've been dropping in one (or two) of the Addonics cards to provide a RAID1 SSD datastore (or two) and hook that up to a decent ESXi-compatible RAID controller.
 

Andrii Stesin

Dabbler
If used gear is an option (and considering the non-profit project and donated chassis, I would probably suggest it)
Yes, sure it is, and it's probably the only reasonable option considering the prices of enterprise-class SSDs.
I'd suggest picking up a pair of DC S3700s from eBay and mirroring them. 200GB models should be around USD$50 and yield fairly solid, if entry-level, SLOG performance.
This is a nice idea. I have one HBA left with its 8 ports completely free (they somehow got 4 new helium Seagate 10TB disks, 2,500,000-hour MTBF, for the array, so I will plug those into two HBAs and raidz them), so if Intel DC S3700s can be had for $50, I will take 4 of them, plug them into the spare HBA, and set them up as raid0+1. Should be fairly fast.

The question is, will raid0+1 vdev work as SLOG?

My underlying idea is: if I have an array which is capable of, say, 600 Mbytes/sec of writes, then the SLOG should be at least twice as fast as the array. The smaller the difference in speed between the SLOG and the main array, the less sense the SLOG makes. Am I correct?

The other question is: LZ4 makes the user-visible write speed much higher than the "physical" speed of the array. When we add a SLOG into the game, is the data written to the SLOG already compressed or not? This makes some difference.

p.s. BTW, I have been running FreeBSD since 1.0-EPSILON back in 1993 and am still using it. Isn't it amazing?
 

jgreco

Resident Grinch
The question is, will raid0+1 vdev work as SLOG?

My underlying idea is: if I have an array which is capable of, say, 600 Mbytes/sec of writes, then the SLOG should be at least twice as fast as the array. The smaller the difference in speed between the SLOG and the main array, the less sense the SLOG makes. Am I correct?

You are incorrect. SLOG is not a write cache. SLOG performance isn't really dependent on the write SPEED of the device. It's mostly about the latency. Take a single SSD and use it as your SLOG device. Keep a warm spare available in case the SLOG fails. Adding any more layers or complexity means that there is additional latency.
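Adding one is a one-liner; something like this, with the pool and device names below only as placeholders to adjust for your system:

    # attach a single SSD as a dedicated log (SLOG) device to an existing pool
    zpool add tank log /dev/da8
    # verify it shows up under the "logs" section
    zpool status tank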

p.s. BTW, I have been running FreeBSD since 1.0-EPSILON back in 1993 and am still using it. Isn't it amazing?

Only a little. It's a decent operating system. Some of us go back to 386BSD, and then other stuff before that even.
 

Andrii Stesin

Dabbler
You are incorrect. SLOG is not a write cache.
Hmm, did I ever say the word "cache"?

As far as I understand, the faster the (external) party that initiated the synchronous write receives the success confirmation from the NAS, the better. This is not a matter of "caching" but of the storage latency observed by the writing party (in my case, an ESXi VM over NFS).
SLOG performance isn't really dependent on the write SPEED of the device. It's mostly about the latency.
Yes, definitely.
Take a single SSD and use it as your SLOG device.
That's OK if there is only a single writing party, but I expect about 30 VMs writing simultaneously, maybe 100-200 after a year (who knows?).

So, suppose we have one serial (SATA is serial) interface with one SSD on it for the SLOG. All synchronous writes from all (numerous) parties will become forcibly serialized into the queue of this (serial) interface, so latency will increase. Where there is a queue, there is latency, too. But if I split SLOG access across 2 SATA wires, I obtain a bit of parallelism, correct? It will decrease latency by a factor of (almost) two, won't it?

OK, if I can split the long queue of synchronous writes to SLOG into 2 shorter queues, then why can't I split it into 4 even shorter queues?

The question is, will a raid0+1 vdev work for SLOG or not? I've never tried this setup before, so I am exploring the possibilities before starting the actual experimentation. Unfortunately, I can't justify decent hardware (like a pre-owned HP DL Gen9) and enterprise-class NVMe SSDs backed with (super)capacitors for this particular project... Next time, maybe.
Keep a warm spare available in case the SLOG fails. Adding any more layers or complexity means that there is additional latency.
The question is, which of the layers adds more latency - raid0+1 or just the SATA interface queue? I presume SATA is the bottleneck here, and combined with the limited write speed of the SLOG device, it is the main source of latency. Please correct me if I am wrong. CPU cycles are not a problem, and RAM bandwidth is not a problem either. The network is fast enough (a pair of 10GbE links in an LACP trunk between the ESXi host and the FreeNAS box; the trunk is shared among the whole herd of writing parties, so each VM may occasionally get 1 Gbyte/sec of bandwidth for its synchronous writes), while one single good SATA SSD may show about 0.5 Gbyte/sec if you are lucky.
Only a little. It's a decent operating system. Some of us go back to 386BSD, and then other stuff before that even.
386BSD refused to boot on my 386DX at the time, so I waited for the 1.0-EPSILON distribution (it came on 5" floppies) and it worked! :) My PDP-11 days were already behind me by then.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
That's OK if there is only a single writing party, but I expect about 30 VMs writing simultaneously, maybe 100-200 after a year (who knows?).

When you get to the point of that many VMs, you need to have a serious conversation with them about providing you with capital to build a SAN that can support the workload. And at that point, hopefully you're into the world of hot-swappable NVMe U.2 bays.

So, suppose we have one serial (SATA is serial) interface with one SSD on it for the SLOG. All synchronous writes from all (numerous) parties will become forcibly serialized into the queue of this (serial) interface, so latency will increase. Where there is a queue, there is latency, too. But if I split SLOG access across 2 SATA wires, I obtain a bit of parallelism, correct? It will decrease latency by a factor of (almost) two, won't it?

OK, if I can split the long queue of synchronous writes to SLOG into 2 shorter queues, then why can't I split it into 4 even shorter queues?

Here's a question for you: How quickly would you be able to connect to either the web UI or a console shell of the FreeNAS machine in case of an SLOG device failure?

If you have a single SLOG and a large sync write workload, and there is a failure, sync writes will grind to a halt and you're very likely (almost guaranteed) to have all VMs on the affected storage grind to a halt and become unavailable, unresponsive, and possibly crash entirely. This is what a mirrored SLOG is designed to protect against.
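If that does happen, the recovery path is at least short - roughly the following, with the pool and device names as placeholders only:

    # spot the faulted log device
    zpool status -v tank
    # log vdevs can be removed from a live pool...
    zpool remove tank /dev/da8
    # ...and the warm spare added back as the new SLOG
    zpool add tank log /dev/da9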

In a 2-drive SLOG mirror, ZFS will be waiting for the top-level log vdev to return "write complete" before it acknowledges back up the chain. And in order for that to return, you'll have to write the same data down two separate wires. So there's no parallelism to be gained there; in fact, you'll be limited by whichever of the two drives is slower. Yes, identical drives would in theory be just as fast - but if one of them is in the middle of garbage collection, TRIMming a block, or something else, you'll be waiting that extra few microseconds.

With a "striped" SLOG (or a 4-drive "stripe of mirrors") then it will still be waiting for the top-level log vdev - the difference being that there will be two vdevs underneath that can serve two separate sync writes. It does increase the effective latency, because you're now "serving two requests at once" but it's not as simple as being "2x the speed."

The question is, will a raid0+1 vdev work for SLOG or not?

Short answer: Yes, it will work. You may have to manually configure it through the command line though, because I believe that FreeNAS still configures multiple drives added at once as a log mirror vdev, and single drives will be a stripe each time. (I'd have to check to confirm though.)
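From the command line it would look something like this - the device names are placeholders, and you'd want to check the result with zpool status afterwards:

    # two-drive mirrored SLOG
    zpool add tank log mirror /dev/da4 /dev/da5
    # four-drive "raid0+1" SLOG - a stripe of two mirrors
    zpool add tank log mirror /dev/da4 /dev/da5 mirror /dev/da6 /dev/da7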

The question is, which of the layers adds more latency - raid0+1 or just the SATA interface queue? I presume SATA is the bottleneck here, and combined with the limited write speed of the SLOG device, it is the main source of latency. Please correct me if I am wrong. CPU cycles are not a problem, and RAM bandwidth is not a problem either. The network is fast enough (a pair of 10GbE links in an LACP trunk between the ESXi host and the FreeNAS box; the trunk is shared among the whole herd of writing parties, so each VM may occasionally get 1 Gbyte/sec of bandwidth for its synchronous writes), while one single good SATA SSD may show about 0.5 Gbyte/sec if you are lucky.

The main bottleneck is the write latency of the SLOG device, which is the major contributor.

Your next bottleneck is the protocol (assuming an equally fast device, NVMe would beat SATA, which would beat SAS).

And finally your last bottleneck would be any ZFS overhead from SLOG vdev layout.

In regards to your other thread's question, compression is applied to the data before it is written to SLOG, so you will gain the "effective write speed" benefits there as well.
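If you want to sanity-check that on your own dataset, something like this will do (the dataset name here is just an example):

    # confirm LZ4 is active and see how well the data is compressing
    zfs get compression,compressratio tank/vmstore
    # enable it if it isn't already
    zfs set compression=lz4 tank/vmstore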
 

Andrii Stesin

Dabbler
Thank you for your answer, my friend! Let's go through it point by point.
When you get to the point of that many VMs, you need to have a serious conversation with them about providing you with capital to build a SAN that can support the workload. And at that point, hopefully you're into the world of hot-swappable NVMe U.2 bays.
I agree. But right now I need to start with what is given to me.
Here's a question for you: How quickly would you be able to connect to either the web UI or a console shell of the FreeNAS machine in case of an SLOG device failure?
Anywhere from immediately to half an hour; it depends. But this is not a bank or a nuclear power station or something mission-critical. They will forgive me a few hours of downtime, though I don't want it to happen anyway.
If you have a single SLOG and a large sync write workload, and there is a failure, sync writes will grind to a halt and you're very likely (almost guaranteed) to have all VMs on the affected storage grind to a halt and become unavailable, unresponsive, and possibly crash entirely. This is what a mirrored SLOG is designed to protect against.
Thank you. I have done this kind of project before, in mission-critical environments too. I know what pool I'm diving into.
In a 2-drive SLOG mirror, ZFS will be waiting for the top-level log vdev to return "write complete" before it acknowledges back up the chain. And in order for that to return, you'll have to write the same data down two separate wires. So there's no parallelism to be gained there; in fact, you'll be limited by whichever of the two drives is slower.
Yes, that's why I am talking about raid0+1 - so the writes go down 2 parallel wires before the success code is returned to the party that initiated the synchronous write request.
Yes, identical drives would, in theory, be just as fast - but if one of them is in the middle of garbage collection, TRIMming a block, or something else, you'll be waiting that extra few microseconds.
Good point, I will think about it. But note, I'm talking about 4 (four) SSDs in raid0+1, and I know how to TRIM, so no garbage collection will happen during working hours.
With a "striped" SLOG (or a 4-drive "stripe of mirrors") then it will still be waiting for the top-level log vdev - the difference being that there will be two vdevs underneath that can serve two separate sync writes.
Yes, and I think it will already be a gain (though thanks - my idea of splitting the queue of sync writes into 4 queues was wrong; actually, it will be split into 2 queues).
It does increase the effective throughput, because you're now "serving two requests at once," but it's not as simple as being "2x the speed."
Hmm. Now I am curious. With a single SSD, I get a write queue of length N. With raid0+1, I get the write queue split into 2 queues of length N/2. Should the latency decrease, or not?
Short answer: Yes, it will work. You may have to manually configure it through the command line though
Thank you; for me, this is no problem at all - I have been familiar with the UNIX command line since about 2.9BSD.
The main bottleneck is the write latency of the SLOG device, which is the major contributor.
Yes, I acknowledge this, and this is exactly what I'm researching now.

Let me explain some technical details of the whole project. There are some people who are doing (and teaching, and learning) some heavy computer graphics. I neither know the exact details nor am I willing to share them. Right now I have about 30 users of this stuff waiting for it to start working. They decided to acquire a single shared resource, namely an AMD FirePro™ S9150 (or S9170?) mega-GPU, to put it inside the externally hosted ESXi server, and to use it from there. So the use case is:
  • the user is sitting elsewhere, with whatever device he has, I don't care,
  • the user starts his personal VM remotely inside ESXi,
  • the user uploads some heavy files to his VM (I decided to store all user data outside of his VM's vmdk, on the FreeNAS instead) - the first workload case, but a light one, because the user's Internet connection is 100Mbps or less,
  • the user launches some software inside his VM which deals with these graphics files, and I don't really care what that software would be,
  • the software inside the VM reads the data from the FreeNAS (this is easy), processes it inside the ESXi VM using the (shared) mega-GPU, and stores the processed heavy files back onto the FreeNAS - this is the hard part of the story, and I have no clue how well the user's data will compress (0% or 30% or 50% - no clue as of now),
  • multiply this use case by the 30+ users we already have, in parallel, with more to come.
The users want it to work fast, and they want their data to be safe.
Your next bottleneck is the protocol (assuming an equally fast device, NVMe would beat SATA, which would beat SAS).
Right now, I am limited to what was given to me - a Supermicro chassis with an old X8 motherboard, 2 Xeons, 48 GB of RAM, and 3 LSI 2108 HBAs. NVMe? Maybe next time.
And finally your last bottleneck would be any ZFS overhead from SLOG vdev layout.
Huh, finally, we got there. Will the raid0+1 SLOG layout decrease the latency by a factor of (almost) two, or not?
In regards to your other thread's question, compression is applied to the data before it is written to SLOG, so you will gain the "effective write speed" benefits there as well.
Wow that's great! Thank you!

Another (theoretical) case - SLOG on 4 striped raid0 SSDs. Will it be faster, or not?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Thank you for your answer, my friend! Let's go through it point by point.

Responding to your questions, with some context added in parentheses.

(How long until you can get to the array?) Anywhere from immediately to half an hour; it depends. But this is not a bank or a nuclear power station or something mission-critical. They will forgive me a few hours of downtime, though I don't want it to happen anyway.

Downtime would be acceptable, corruption wouldn't be. If you have room for four SAS/SATA devices it will be moot since you will be able to run "striped mirror" and receive both benefits.

(On mirrors of identical drives potentially having performance differences) Good point, I will think about it. But note, I'm talking about 4 (four) SSDs in raid0+1, and I know how to TRIM, so no garbage collection will happen during working hours.

ZFS will TRIM the drives on its own - you won't be sending the command equivalent to fstrim manually.

On this note, if you purchase the planned Intel DC S3700 drives I would strongly suggest that you use the Intel SSD Data Center Toolbox (isdct) - https://downloadcenter.intel.com/product/87278/Intel-SSD-Data-Center-Tool - to both change the logical sector size to 4K and to massively increase the overprovisioning on the drives - you'll really only need 8GB available on each, most likely. This will increase both performance and endurance.
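Off the top of my head, the over-provisioning step looks roughly like this - the exact isdct syntax varies between tool versions, so treat it as an illustration rather than a copy/paste recipe:

    # list the drives and their index numbers
    isdct show -intelssd
    # expose only a fraction of the capacity (the value here is just an example)
    isdct set -intelssd 0 MaximumLBA=10%
    # (the 4K logical sector change is also done through isdct - check the
    #  tool's documentation for the exact property name on your version)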

Yes, and I think it will already be a gain (though thanks - my idea of splitting the queue of sync writes into 4 queues was wrong; actually, it will be split into 2 queues).

Hmm. Now I am curious. With a single SSD, I get a write queue of length N. With raid0+1, I get the write queue split into 2 queues of length N/2. Should the latency decrease, or not?

It will decrease the latency, but you should expect some overhead - you won't simply get 200% of the performance by having two SLOG drives striped. It should be close to it though.

(You may need to use the command line to create the "complex SLOG") Thank you; for me, this is no problem at all - I have been familiar with the UNIX command line since about 2.9BSD.

Just so long as you know that you're in a very small group of other users now - most FreeNAS users don't have an SLOG at all, some have mirrored SLOG, but very few have a striped-mirror SLOG.

(latency derived from the "complex SLOG") Huh, finally, we got there. Will the raid0+1 SLOG layout decrease the latency by a factor of (almost) two, or not?

Short answer: Yes. How close that "almost" will be? Depends on the drives, your system, your HBA, the phase of the moon, etc. Testing will bear it out. With some older HGST SLC SAS drives I saw about 1.8x, I believe. I don't have them anymore to re-benchmark with.
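If you want to benchmark the candidates yourself before committing, the quick-and-dirty test from the SLOG thread is the sync-write latency mode of diskinfo. It is destructive, so only point it at an empty or spare device, and it needs a FreeNAS/FreeBSD build whose diskinfo supports -S; the device name below is just an example:

    diskinfo -wS /dev/da4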

Wow that's great! Thank you!

You're welcome. Please feel free to share the completed build with us in a new thread once it is done.
 

Andrii Stesin

Dabbler
Thank you again, my friend.
Downtime would be acceptable, corruption wouldn't be.
Yes, they will BBQ me if it happens.
If you have room for four SAS/SATA devices
I have the room. The chassis has 24x3.5" slots; as of today they are still deciding on the mechanical drives to purchase (5x10TB raidz or 8x8TB raidz2), but that's not my problem.
it will be moot since you will be able to run "striped mirror" and receive both benefits.
I hope so.
ZFS will TRIM the drives on its own - you won't be sending the command equivalent to fstrim manually.
Yes, and I'm OK with it.
On this note, if you purchase the planned Intel DC S3700 drives I would strongly suggest that you use the Intel SSD Data Center Toolbox (isdct) - https://downloadcenter.intel.com/product/87278/Intel-SSD-Data-Center-Tool - to both change the logical sector size to 4K and to massively increase the overprovisioning on the drives - you'll really only need 8GB available on each, most likely. This will increase both performance and endurance.
Thanks! This is a valuable suggestion.
It will decrease the latency, but you should expect some overhead - you won't simply get 200% of the performance by having two SLOG drives striped. It should be close to it though.
I'm fighting to decrease the latency of synchronous writes. Overhead? I don't care; the chassis has enough computing power.
Just so long as you know that you're in a very small group of other users now - most FreeNAS users don't have an SLOG at all, some have mirrored SLOG, but very few have a striped-mirror SLOG.
In my life, I have never looked for the easy way. BTW, once in my professional life, FreeNAS saved the entire private property register of a whole country.
Short answer: Yes. How close that "almost" will be? Depends on the drives, your system, your HBA, the phase of the moon, etc.
Yes, I know. Right now I am in the design phase, and real life will show many different use cases... later.
Testing will bear it out. With some older HGST SLC SAS drives I saw about 1.8x
1.8? Not that bad.
You're welcome. Please feel free to share the completed build with us in a new thread once it is done.
Sure I will. The mega-GPU is already ordered and will arrive before March 5 (they hope); all of a sudden the supply of these is very limited. And then we'll see whether I was correct or not. Thanks once again!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
I have the room. The chassis has 24x3.5" slots; as of today they are still deciding on the mechanical drives to purchase (5x10TB raidz or 8x8TB raidz2), but that's not my problem.

If you intend to host VMs on this you should immediately discard any idea of running RAIDZ and focus solely on mirror vdevs. To generalize, VM performance scales with "vdev count in pool" not "drive count in vdev." Propose 12x8TB as an alternative.
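As a rough illustration of what I mean (the disk names are placeholders, and in practice you'd build this through the FreeNAS UI rather than by hand):

    # 12 x 8TB as six 2-way mirrors = six vdevs to spread random I/O across,
    # versus the single vdev you'd get from one wide RAIDZ2
    zpool create tank \
        mirror /dev/da0 /dev/da1 \
        mirror /dev/da2 /dev/da3 \
        mirror /dev/da4 /dev/da5 \
        mirror /dev/da6 /dev/da7 \
        mirror /dev/da8 /dev/da9 \
        mirror /dev/da10 /dev/da11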

I'm fighting to decrease the latency of synchronous writes. Overhead? I don't care; the chassis has enough computing power.

1.8x is better than 1.0x certainly, I'm just advising that you shouldn't expect 2.0x. At some point you will hit the inherent limitations of the SAS/SATA bus, and to go any faster will require you to go to NVMe/PCIe. :)

In my life, I have never looked for the easy way.

Nothing good comes easy; and I expect you will see good things from this project.
 

Andrii Stesin

Dabbler
If you intend to host VMs on this you should immediately discard any idea of running RAIDZ and focus solely on mirror vdevs. To generalize, VM performance scales with "vdev count in pool" not "drive count in vdev." Propose 12x8TB as an alternative.
The VMs will be running on the separate ESXi server, the one which holds the mega-GPU. Their vmdks will live there, on the ESXi server's own SSD; FreeNAS will serve as backup storage for the vmdks and act as a real-time, safe datastore for user data.
1.8x is better than 1.0x certainly, I'm just advising that you shouldn't expect 2.0x.
Thank you, I understand that 2x is a very, very rough estimate ;)
At some point you will hit the inherent limitations of the SAS/SATA bus, and to go any faster will require you to go to NVMe/PCIe. :)
That decision would not be mine. If the guys enjoy this setup and want something more, I'm always in :) They are welcome (with the appropriate budget, of course). This project is just a PoC.
Nothing good comes easy; and I expect you will see good things from this project.
I hope so, too :)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
The VMs will be running on the separate ESXi server, the one which holds the mega-GPU. Their vmdks will live there, on the ESXi server's own SSD; FreeNAS will serve as backup storage for the vmdks and act as a real-time, safe datastore for user data.

Understood. I would still recommend mirrors in this case, as they are by far the best option for heavy concurrent I/O. 30 streams of "sequential writes" coming in at the same time will essentially be "random data" to the array.

I hope so, too :)

I wish you the best of luck for a successful PoC that generates budget for a bigger project in the future then. :)
 

Andrii Stesin

Dabbler
I would still recommend mirrors in this case, as they are by far the best option for heavy concurrent I/O. 30 streams of "sequential writes" coming in at the same time will essentially be "random data" to the array.
From my practice I recall that a raidz2 array of 10 SATA HDDs is capable of some incredible performance, like 1.6 gigabytes/sec or maybe even more (I didn't do thorough testing at the time; it was "good enough" for the purpose, so OK). Why do you think plain mirroring would be faster, I wonder? Would you mind explaining your reasoning, please? Thanks!
I wish you the best of luck for a successful PoC that generates budget for a bigger project in the future then. :)
I think we'll get this first part of the story closed around late March. Just wait for it :)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
From my practice I recall that a raidz2 array of 10 SATA HDDs is capable of some incredible performance, like 1.6 gigabytes/sec or maybe even more (I didn't do thorough testing at the time; it was "good enough" for the purpose, so OK). Why do you think plain mirroring would be faster, I wonder? Would you mind explaining your reasoning, please? Thanks!
RAIDZ will perform very well for single-user sequential access with large recordsizes on a clean pool. As soon as you start getting free-space fragmentation, simultaneous read/write access, or more than one read/write user, you start demanding random I/O from your disks, and random I/O scales with the number of vdevs in the pool, not the number of disks in the vdevs.
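You can see it for yourself with a quick test if fio is available on the box: run the same small-block sync-write job against a RAIDZ2-backed dataset and a mirror-backed one, and compare the IOPS and latency numbers. The path and job sizes below are just examples:

    fio --name=syncwrite --directory=/mnt/tank/test --ioengine=posixaio \
        --rw=randwrite --bs=16k --iodepth=8 --numjobs=4 --size=2G \
        --sync=1 --runtime=60 --time_based --group_reporting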
 

Andrii Stesin

Dabbler
UPD. The guys ended up purchasing two (not four) Samsung 860 EVO-Series 250GB 2.5" SATA III V-NAND TLC drives (MZ-76E250BW) for SLOG; these will arrive next week. (Someone found them at a bargain price and donated them.) This is considered a temporary solution for a year or two. OK, this is what is given. Now I am again thinking about how to set it all up to obtain the lowest possible synchronous write latency. A mirror, probably? A stripe is too risky, I think...

The other question is, is it worth the effort to overprovision the SSDs down to, say, 50GB of "visible" space using the "Host Protected Area" feature? Will the FreeNAS kernel (and ZFS) honor the HPA or not?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Not to sound unappreciative, but make sure you thank the donor for the new paperweights/desk ornaments; they're far better suited to that role than to pretending to be useful SLOGs.

You can certainly overprovision the drives via HPA, and FreeBSD/ZFS respects it, but that won't address the lack of power-loss protection circuitry on the drives themselves.

They'll make fine L2ARC devices, but quite bluntly they are not fit for purpose as SLOG.
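For completeness, since you asked about the HPA: on FreeBSD it's done with camcontrol, roughly like this. The device name and sector count are only examples (104857600 512-byte sectors is about 50GB visible), and the drive generally accepts the change only once per power-on:

    # show the current and native max sector count
    camcontrol hpa ada1
    # set a new, persistent maximum and confirm
    camcontrol hpa ada1 -s 104857600 -P -y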
 

Andrii Stesin

Dabbler
Dear HoneyBadger, you are perfectly right. But... They think that having two power supplies in the data center, on two different power lines with a different UPS on each, is good enough protection, and they (all of a sudden) care more about "speed" (latency, to be exact) than about the (small) possible data loss. What else can I say?
 