Anyone found an awesome SLOG device?


rwhitlock

Apparently not obtainable solutions

Marvell WAM
Marvell Dragonfly
Netapp 16GB PCIe PISCES Accelerator


Hopes and dreams in order of awesomeness

Intel 3D XPoint DIMM (Mysterious - only able to find sales BS = the most awesomest)
NVvault DDR3 NVDIMM
PMC Flashtec NV1608
EXPRESSvault PCIe (EV3)
Mark 1 RAMDisk
ULLtraDIMM


Over 1000 bucks / too damn expensive / not wife friendly

ZeusRAM
Cloud Disk Drive 101 RAMDisk


Just might work!

RAID controller w/ large cache, e.g. ARC-1883IX-12-8GB w/ ARC-1883-CAP


Once was good… maybe

Curtis HyperHD / ACARD ANS-9010
Gigabyte i-RAM
ddrdrive X1
OCZ AEON Drive


Would make a good L2ARC (flash-based)

LSI Nytro WarpDrive
Fusion IO
 

cyberjock

I have several of those ANS-9010s around. They do work very well. The key is "how much throughput are you expecting?"

If you've got a single 1Gb network link, you definitely can't do more than about 130MB/sec no matter what, so even something like a Gigabyte i-RAM should work well despite being limited to 1.5Gb/sec. Even an Intel enterprise-grade SSD will be more than enough.
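The quick math behind those numbers, for anyone curious. This is just raw line rates, so treat them as rough ceilings; the link speeds in the sketch are only the ones mentioned above:

```python
# Rough line-rate ceilings, ignoring protocol overhead. Purely illustrative.
def ceiling_mb_s(gigabits_per_s):
    """Convert a raw link rate in Gb/s to MB/s (1 Gb/s = 125 MB/s)."""
    return gigabits_per_s * 1000 / 8

for name, gbps in [("1GbE NIC", 1.0), ("i-RAM SATA link", 1.5), ("10GbE NIC", 10.0)]:
    print(f"{name:>16}: ~{ceiling_mb_s(gbps):.0f} MB/s theoretical max")
```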

Do keep in mind that if something like an i-RAM loses power and the battery fails, you'll have some problems to deal with. ;)
 

rwhitlock

The nerd in me wants to gather feedback on SLOG devices and provide the community with a conclusive list of NV, BBU, and FBWC solutions. That being said, I value your opinion, Cyberjock, so: my initial configuration will use dual 4Gb/s FC with MPIO, which should yield ~1.6GB/s according to the specifications. I am expecting about half that. I will eventually upgrade to 8Gb FC or 10Gb iSCSI.
 

cyberjock

So you expect to have 800MB/sec of sync writes every second?
 

Nick2253


cyberjock

At this time we use:

L2ARC: SanDisk SDSSDHII-120G

ZIL: SanDisk SD6SB1M-064G-1022I

Note that we can change that at any time. ;)
 

rwhitlock

Cyberjock: So you expect to have 800MB/sec of sync writes every second?

Cyberjock, your loaded question sounds like a trap my wife would set. LOL. It is my understanding that the SLOG needs to outperform the lowest common denominator. If your working set is larger than your ARC and you request un-cached data, your disks will be your bottleneck, unless your storage interface is slower than your pool, in which case you upgrade your NIC, FC, or what-have-you.

I plan to have mirrored vdevs. It is my understanding that ZFS writes to each vdev in parallel and that the write is complete when all disks have finished, so the pool's write performance will be the performance of the crappiest drive per vdev. All my drives are the same, so since I have 4 vdevs, I should be able to write to the pool at the speed of 4 drives. I plan to add 4 more disks for a total of 6 mirrored vdevs. Each disk yields ~150MB/s, so the write speed to the pool should be around 900MB/s (150 * 6) max.

My working set is rather small, so I expect a high cache hit ratio. If this is correct, my average throughput could be 1GB/s+ on a good day. Now, will my VMs actually require this performance? Are you familiar with Billy Idol's song "Rebel Yell"? Sync=always will be set, so... yes, I do expect 800MB/s of sync writes.
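Here is the back-of-the-envelope math I'm working from, written out so it's easy to poke at. The per-disk figure and link speed are my own assumptions, and it ignores ZFS overhead entirely:

```python
# Naive pool-write estimate: with mirrored vdevs, writes stripe across vdevs,
# so the pool roughly writes at (vdev count) x (slowest disk in a vdev),
# capped by whatever the FC/NIC link can actually carry.
# Deliberately ignores ZFS overhead, sync-write latency, fragmentation, etc.
PER_DISK_MB_S = 150    # assumed sequential write speed of one drive
VDEVS         = 6      # planned number of mirrored vdevs
LINK_MB_S     = 800    # roughly what I expect out of dual 4Gb FC with MPIO

paper_pool_write = PER_DISK_MB_S * VDEVS          # 900 MB/s on paper
expected         = min(paper_pool_write, LINK_MB_S)
print(f"pool on paper: {paper_pool_write} MB/s, capped by the link: {expected} MB/s")
```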
 

cyberjock

Well, if you're expecting that kind of performance I really think you should call iXsystems. Not because I want you to pay more, but you're going to need MASSIVE tweaking to the system if you think that you're going to get those kinds of speeds with 6 vdevs. I've seen people with 15 vdevs not get 800MB/sec. under real-world conditions.

In fact, I've only seen one system do 800MB/sec. They had 40Gb LAN and were an all-SSD system with 10 vdevs, 4 L2ARCs, and something like 384GB of RAM.

I'm really not sure why you are mentioning things like working set, ARC, arc hit cache ratio, etc. Even at 10Gb LAN, you are talking just barely over 1GB/sec, assuming you are using RAM exclusively and never have to go to the zpool for anything.

I think your expectations for what your hardware is capable of are woefully overestimated, by an order of magnitude at least. Granted, you didn't include any hardware specs. But unless you're about to tell me you have 512GB of RAM, I can pretty much guarantee you that you aren't going to get 1GB/sec throughput and be able to keep that speed for more than a second or two, and only for the purposes of a benchmark and not real-world performance.
 

rwhitlock

Ok, I want my expectations to be in line with reality. It also was not my intention to come off as defensive, cocky, or all-knowing. You are a wealth of information, and I believe I need to ask the right questions, with enough supporting information, to get an adequate response. I have read about ZFS, and more specifically the FreeNAS forums, for thousands of hours over many years. What is apparent to me now is that ZFS must have overhead that is not well defined in the FreeNAS manual, Oracle's white papers, or the FreeBSD knowledge base.

It is my fault for not including my use-case and specs in my previous reply. I’m working on my signature. I honestly did not want this thread to turn into my specific problem/situation and wanted to keep it focused on SLOG devices.

I am very interested in the inner workings of ZFS so that I can accurately hypothesize the hardware required for a desired result.

I'm really not sure why you are mentioning things like working set, ARC, arc hit cache ratio, etc.


Your quote above had the most striking impact on my fundamental understanding of ZFS. It is hard for me to see how things like working set, ARC, ARC cache hit ratio, etc. are not relevant. In other words, I think they are the most relevant aspect.

But unless you're about to tell me you have 512GB of RAM, I can pretty much guarantee you that you aren't going to get 1GB/sec throughput and be able to keep that speed for more than a second or two, and only for the purposes of a benchmark and not real-world performance.


This conveys to me that the speed at which data is served is directly attributable to RAM size, regardless of working set.

I've seen people with 15 vdevs not get 800MB/sec. under real-world conditions. In fact, I've only seen one system do 800MB/sec. They had 40Gb LAN and were an all-SSD system with 10 vdevs, 4 L2ARCs, and something like 384GB of RAM.


Why does a real world experience have this limitation? What is this elusive overhead? How much stuff was stored? What percentage of the pool was being utilized? Why would this system not have throughput around 3.5 GByte/sec?



My brief understanding: Similar to the OLTP world, ZFS utilizes a comparable system called adaptive replacement cache (ARC). This methodology utilizes an ingenious algorithm to cache recently used and frequently used pages. The pages consist of two groups of data, metadata and working set. The metadata portion is similar to an allocation table. This is commonly calculated at 1GB for every 1TB of data. The default maximum metadata utilization is 1/4th of the ARC. The other 75% of the ARC holds your working set which is the actual cached reads from the pool. These values of 75% and 25% are minimums and maximums respectively and are dynamic. I.e. if you don’t have many files then the metadata portion of the ARC won’t utilize the maximum of 1/4th of ARC.

If you are not getting enough page hits (too many direct reads from spinning rust), you're probably ARC/RAM deficient. This shows up as a poor cache hit ratio. You could possibly add L2ARC; however, it requires metadata as well, which eats a portion of your ARC. Not all of your RAM is dedicated to ARC: FreeNAS has the OS, jails, and other stuff that require resources. My guess is the OS overhead is typically around 1 to 4GB.
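Putting rough numbers on that, every figure below is a rule-of-thumb assumption on my part, not something read out of ZFS:

```python
# Rough ARC budgeting sketch. All values are rule-of-thumb assumptions.
TOTAL_RAM_GB    = 64
OS_OVERHEAD_GB  = 4            # OS, jails, misc (my 1-4GB guess)
POOL_DATA_TB    = 1
META_GB_PER_TB  = 1            # the "1GB of metadata per 1TB of data" rule of thumb
L2ARC_GB        = 120
AVG_RECORD      = 128 * 1024   # bytes; assumed average record size
L2_HDR_BYTES    = 180          # assumed ARC cost per L2ARC record header

arc_gb       = TOTAL_RAM_GB - OS_OVERHEAD_GB
meta_cap_gb  = arc_gb / 4                      # default ~1/4-of-ARC metadata ceiling
meta_need_gb = POOL_DATA_TB * META_GB_PER_TB
l2_hdrs_gb   = (L2ARC_GB * 1024**3 / AVG_RECORD) * L2_HDR_BYTES / 1024**3

print(f"ARC ~{arc_gb} GB | metadata cap ~{meta_cap_gb:.0f} GB | "
      f"metadata needed ~{meta_need_gb} GB | L2ARC headers ~{l2_hdrs_gb:.2f} GB of ARC")
```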

If the above is true, and you have a sufficient processor, then why if a file is cached in ARC is the bottleneck not your network/FC interface? I mean, what else is holding the system back?



To finish my point and give you and my fellow forum members something to poke holes in, let's create a fake use-case. We have a bare-metal server running MSSQL locally. The OS is installed on local drives. The data drive on this server is hosted on a FreeNAS box over 10Gb iSCSI. This SQL server has one set of 4 databases, totaling 8GB in size, that reside on the FreeNAS box. The FreeNAS box is not hosting anything else. It has an adequate SLOG device to keep up. It has 2 mirrored vdevs using standard 10k SAS drives. The drives are 146GB each, which yields ~230GB usable disk space with overhead. Storage usage is about 3.5%, so poor performance from iSCSI fragmentation should not be an issue. If we use the methodology above, with some slight fudging, would the equation below not make sense?

4 (OS overhead) + 1 (metadata) + 11 (working set + some) + 32 (iSCSI overhead) + 16 (why not) = 64GB of RAM on FreeNAS

Would this not yield kick-ass database performance without 512GB of RAM? If the performance degradation lies with iSCSI, screw it, use NFS! Why would you not expect to see over 1GB/s of throughput here? I believe that in this situation the ghost lists won't have any hits, even with iSCSI's flaws.
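And the same fudged RAM budget written out, so the total is easy to check (the line items are exactly the guesses in the equation above):

```python
# Same fudged RAM budget as above, just spelled out so it's easy to poke holes in.
ram_budget_gb = {
    "OS overhead":          4,
    "metadata":             1,
    "working set + some":   11,
    "iSCSI overhead":       32,
    "headroom (why not)":   16,
}
print(sum(ram_budget_gb.values()), "GB of RAM on FreeNAS")   # -> 64 GB
```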



Cyberjock, you answer me so fast. I have a busy life with 3 kids and must administer the infrastructure at my employer’s company. My company unfortunately does not use ZFS. Please take a few days or a week to get back to me. No rush! How is your book going? I would like to buy it.

Thanks,

Ryan Whitlock
 

cyberjock

Ok, I want my expectations to be in line with reality. It also was not my intention to come off as defensive, cocky, or all-knowing. You are a wealth of information, and I believe I need to ask the right questions, with enough supporting information, to get an adequate response. I have read about ZFS, and more specifically the FreeNAS forums, for thousands of hours over many years. What is apparent to me now is that ZFS must have overhead that is not well defined in the FreeNAS manual, Oracle's white papers, or the FreeBSD knowledge base.

ZFS does have overhead. I generally tell people "ZFS protects your data at virtually any and all costs". Those costs translate into needing more power than just some old Pentium 4 with 512MB of RAM. The more performance you want, the more hardware you'll need to buy. It is linear to a point, then it becomes exponentially more expensive.

It is my fault for not including my use-case and specs in my previous reply. I’m working on my signature. I honestly did not want this thread to turn into my specific problem/situation and wanted to keep it focused on SLOG devices.

Unfortunately, use case is extremely important. In fact, it may be the most important thing in determining whether you can even use an slog. If you plan to use CIFS exclusively, adding an slog will be pointless: the hardware won't even be accessed... ever. So it is very, very, very important. Its importance cannot be overstated.


I am very interested in the inner workings of ZFS so that I can accurately hypothesize the hardware required for a desired result.

I unfortunately will disappoint you. This forum really isn't a good place to get that info. It is *a* place to get the info, but that's not something I'm about to jump into on this forum except in small quantities. I'm not into writing books and posting them on the forum. ;) Sorry.

Your quote above had the most striking impact on my fundamental understanding of ZFS. It is hard for me to see how things like working set, ARC, ARC cache hit ratio, etc. are not relevant. In other words, I think they are the most relevant aspect.

They aren't. And as you learn more about ZFS you'll figure out what stuff is important and what's not. Nothing wrong with being wrong (and I'm wrong sometimes because of stuff constantly being added and/or changed in OpenZFS).

This conveys to me that the speed at which data is served is directly attributable to RAM size, regardless of working set.

Totally incorrect. If your working set (regardless of the size) can fit in the ARC (regardless of the size of the ARC) you will see maximum performance. Why? Because all of the regularly accessed data is in RAM. How much faster do you want than grabbing data from RAM? But, if the working set is even 1% bigger than the ARC, there is a significant performance hit. It's also an exponential cliff.
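A toy model of why that cliff shows up. The latencies below are made up; they're only there to show the shape:

```python
# Toy model: average time to serve one record, given an ARC hit ratio.
# The two latencies are made-up placeholders; the point is how quickly the
# average gets dominated by disk seeks once you start missing the ARC.
T_ARC_MS  = 0.01   # assumed: record served from RAM
T_DISK_MS = 8.0    # assumed: seek + read from spinning rust

for hit in (1.00, 0.99, 0.95, 0.80):
    avg_ms = hit * T_ARC_MS + (1 - hit) * T_DISK_MS
    print(f"hit ratio {hit:4.0%}: ~{avg_ms:5.3f} ms/record "
          f"({avg_ms / T_ARC_MS:6.0f}x slower than all-ARC)")
```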

Why does a real world experience have this limitation? What is this elusive overhead? How much stuff was stored? What percentage of the pool was being utilized? Why would this system not have throughput around 3.5 GByte/sec?

For that you'd have to understand all of the underpinnings of ZFS, etc. Far beyond the scope of a forum post.

My brief understanding:
Similar to the OLTP world, ZFS utilizes a comparable system called adaptive replacement cache (ARC). This methodology utilizes an ingenious algorithm to cache recently used and frequently used pages. The pages consist of two groups of data, metadata and working set. The metadata portion is similar to an allocation table. This is commonly calculated at 1GB for every 1TB of data. The default maximum metadata utilization is 1/4th of the ARC. The other 75% of the ARC holds your working set which is the actual cached reads from the pool. These values of 75% and 25% are minimums and maximums respectively and are dynamic. I.e. if you don’t have many files then the metadata portion of the ARC won’t utilize the maximum of 1/4th of ARC.

Not entirely correct. It is possible for the metadata to exceed that 1/4 value, either because you change the tunable or because the system is stressed and ZFS has decided to go bigger.


If you are not getting enough page hits (too many direct reads from spinning rust), you're probably ARC/RAM deficient. This shows up as a poor cache hit ratio. You could possibly add L2ARC; however, it requires metadata as well, which eats a portion of your ARC. Not all of your RAM is dedicated to ARC: FreeNAS has the OS, jails, and other stuff that require resources. My guess is the OS overhead is typically around 1 to 4GB.

Basically, this is correct. There are caveats and exceptions, but basically that is correct.

If the above is true, and you have a sufficient processor, then why if a file is cached in ARC is the bottleneck not your network/FC interface? I mean, what else is holding the system back?

You are limited to your weakest link. What if you have 20ms of latency between the server and the system? Your file server is directly affected by the performance of other externalities. You've also oversimplified things, because things like sync writes will directly interrupt any other work in progress on the zpool to update the slog/zil. If you have no slog, you just interrupted whatever other work was in progress to handle a sync write. If you work at a desk and every 10 minutes you get a phone call that interrupts you, even if it's just for a minute or two, doesn't that slow you down more than just the time of the call? ZFS works the same way.

To finish my point and give you and my fellow forum members something to poke holes in, let's create a fake use-case. We have a bare-metal server running MSSQL locally. The OS is installed on local drives. The data drive on this server is hosted on a FreeNAS box over 10Gb iSCSI. This SQL server has one set of 4 databases, totaling 8GB in size, that reside on the FreeNAS box. The FreeNAS box is not hosting anything else. It has an adequate SLOG device to keep up. It has 2 mirrored vdevs using standard 10k SAS drives. The drives are 146GB each, which yields ~230GB usable disk space with overhead. Storage usage is about 3.5%, so poor performance from iSCSI fragmentation should not be an issue. If we use the methodology above, with some slight fudging, would the equation below not make sense?

4 (OS overhead) + 1 (metadata) + 11 (working set + some) + 32 (iSCSI overhead) + 16 (why not) = 64GB of RAM on FreeNAS

Would this not yield kick-ass database performance without 512GB of RAM? If the performance degradation lies with iSCSI, screw it, use NFS! Why would you not expect to see over 1GB/s of throughput here? I believe that in this situation the ghost lists won't have any hits, even with iSCSI's flaws.

In your hypothetical example, you have to read the data from the zpool for it to be in the ARC, and it has to not have been expired from the ARC. Have both of those conditions been met? If not, how do you meet them? I can tell you that nobody has a system that magically has the SQL databases loaded into ARC from the get-go. So someone has to take that penalty and do the reading and/or writing.

So let's assume you need to read the entire database into the ARC, but the seek times of those 10kRPM drives will be your limiting factor. Solution: go all SSD. Now things are going to be much faster, because from the time your SQL machine makes the query for data from the tables, to the time it goes over iSCSI, NFS, CIFS, etc, to the time it takes the zpool to read the data from the zpool's spinning rust (or the ARC if you are lucky) plus the time to respond back through the protocol, network stack, network infrastructure, you've just lost precious ms that the SQL database spent sitting around waiting for data.
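If it helps, here's the same point as a made-up latency budget for a single read request. None of these numbers are measured; they're just there to show where the time goes:

```python
# Made-up per-request latency budget for one SQL read against the FreeNAS box.
# The exact numbers don't matter; what matters is that the spinning-rust read
# dwarfs everything else, and an ARC hit removes exactly that term.
chain_ms = {
    "client network stack":  0.05,
    "switch / wire":         0.10,
    "iSCSI/NFS processing":  0.20,
    "zpool read (ARC hit)":  0.01,   # vs. roughly 8 ms if it has to seek on 10k disks
    "reply path":            0.35,
}
arc_hit_ms   = sum(chain_ms.values())
disk_read_ms = arc_hit_ms - chain_ms["zpool read (ARC hit)"] + 8.0
print(f"ARC hit: ~{arc_hit_ms:.2f} ms/request   disk read: ~{disk_read_ms:.2f} ms/request")
```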

Cyberjock, you answer me so fast. I have a busy life with 3 kids and must administer the infrastructure at my employer’s company. My company unfortunately does not use ZFS. Please take a few days or a week to get back to me. No rush! How is your book going? I would like to buy it.

Yeah, the book is on hold. Life is just too busy for me. Heck, I'm having to force myself to make time for the forums lately. :P

HTH!
 

rwhitlock

Cyberjock’s sorry for taking so long to get back. My kids and I have been sick.

I was driving to work today and a light bulb went off. (Damn, now I have to stop by AutoZone.) Just wanted to validate my light bulb.

In the above posts we concluded there are many potential bottlenecks. Here is the testing methodology I thought of.

If you were to set sync=disabled temporarily as a test, what you would be measuring is the performance of:

data from the tables, to the time it goes over iSCSI, NFS, CIFS, etc, to the time it takes the zpool to read the data from the zpool's spinning rust (or the ARC if you are lucky) plus the time to respond back through the protocol, network stack, network infrastructure

This would give you a baseline of X. If you expect your performance to be greater than X, find the bottleneck and fix it. Rinse and repeat until satisfied. Once you have achieved a value for X you are satisfied with, purchase an appropriately performing SLOG device to match the performance of X.
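For anyone who wants a quick-and-dirty client-side version of the same idea, here is a sketch that just compares buffered writes against fsync'd writes to a file sitting on the share. It is not the same thing as flipping sync=disabled on the dataset, and the path and sizes are placeholders, but it shows the kind of gap a SLOG has to close:

```python
# Crude client-side comparison: buffered writes vs. fsync-per-write to a file
# on the share. Not equivalent to toggling sync=disabled on the dataset; just
# a rough way to see the gap a SLOG needs to close. Path/sizes are placeholders.
import os
import time

PATH   = "/mnt/tank/slog-test.bin"   # hypothetical file on the FreeNAS share
BLOCK  = b"\0" * (128 * 1024)        # 128 KB per write
BLOCKS = 512                         # 64 MB total

def throughput_mb_s(force_sync):
    start = time.time()
    with open(PATH, "wb") as f:
        for _ in range(BLOCKS):
            f.write(BLOCK)
            if force_sync:
                f.flush()
                os.fsync(f.fileno())   # force the write down to stable storage
    elapsed = time.time() - start
    os.remove(PATH)
    return (BLOCKS * len(BLOCK) / 1024**2) / elapsed

print(f"buffered: ~{throughput_mb_s(False):.0f} MB/s   "
      f"fsync per write: ~{throughput_mb_s(True):.0f} MB/s")
```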


For people who have achieved their expected X value, only to see it plummet to Y when adding a SLOG, I have attached a spreadsheet to help them decide.
 

Attachments

  • slog.zip
    13.4 KB · Views: 372

cyberjock

The quote you made was in reference to reads, which are NOT affected by the slog directly. But for writes, the basic premise is there.
 