Should I bother upgrading my ZIL SSD?

Status
Not open for further replies.

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
Yet another ZFS performance thread... (Sorry!)

I am building an iSCSI storage server for use in a failover-clustered Hyper-V environment. Bug #4003 currently prevents FreeNAS from being a suitable host for Server 2012+ Hyper-V servers (something I wish I had realized sooner), so I am trying to iron out any performance issues and tweaks in the hope that the bug will be fixed in a few months.

Server:
4x 1TB Seagate Constellation ES.3 SAS 6 Gb/s, 128MB cache, in a ZFS striped-mirror ("RAID 10") layout
LSI 9211-8i 6 Gb/s HBA, flashed to IT mode
Intel Xeon E5-2609V2
2x Crucial 16GB DDR3-1600 ECC RAM
SuperMicro SYS-5027R-WRF 2U chassis w/ redundant PSU + X9SRW-F LGA 2011 motherboard
Intel E1G44HTBLK quad-port gigabit NIC

Dedicated Storage Switch:
Netgear GS116 16-port unmanaged gigabit switch (all attached devices set up for jumbo frames)

I had a spare 128GB OCZ Vertex 450 lying around, so I partitioned off 8GB of it as a SLOG for the pool. I do NOT use the SSD for L2ARC or anything else. I use lz4 compression on the entire pool and have dedicated a 1.5TB zvol (128K block size) for VM storage and a 1GB zvol (8K block size) as a quorum disk. I am using under 85% of the available space on the pool.
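For anyone following along, roughly how that looks from the shell - a minimal sketch only, with the SSD device name (ada1) and pool name (tank) as placeholders; the FreeNAS GUI normally handles this:

    gpart create -s gpt ada1                      # new GPT table on the spare SSD
    gpart add -t freebsd-zfs -s 8G -l slog0 ada1  # carve out an 8GB partition labeled slog0
    zpool add tank log gpt/slog0                  # attach it to the pool as a dedicated log (SLOG) device
    zpool status tank                             # confirm the "logs" vdev appears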

Using IOMeter against the multipath-enabled iSCSI device, I see about ~500 IOPS with sync=always on my current build. With sync=standard, that shoots up to ~5000 IOPS. I've already tested my network and benchmarked my zpool, and I'm fairly certain the drop in performance is due to the sync=always setting (and after lurking on the forums for months [years] I see that is the recommended setting for data integrity).
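For reference, the sync behavior being compared here is just a per-dataset ZFS property; a minimal sketch, with tank/vmstorage as a placeholder zvol name:

    zfs get sync tank/vmstorage           # show the current setting: standard, always, or disabled
    zfs set sync=always tank/vmstorage    # treat every write as synchronous
    zfs set sync=standard tank/vmstorage  # default: honor only the syncs the client requests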

So this brings me to my question: If the problem is with my SLOG device, should I bother upgrading it to another "consumer" grade SSD?

I always see people hating on OCZ SSDs, so I was thinking of picking up a Samsung 840 Pro (256GB) and partitioning off 8GB as a SLOG for the pool. Or maybe an Intel SSD (any recommendations?). The specs on paper aren't much better than those of the OCZ SSD I have now, and I don't have the budget for something like a ZeusRAM or a STEC SSD.

If anyone has any other tweaks or suggestions for increasing performance, I'm all ears! I know I should expect performance to drop with sync=always, but a 10x drop in IOPS seems like too much?

Thank you and I appreciate any and all constructive criticism and feedback!

EDIT:

Apologies for the misuse of ZIL vs SLOG in the thread title
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
After thinking this through more thoroughly, I believe the drop in IOPS is expected; however, I'd still like to see more throughput from my system, which I think it is capable of. Attached are graphs of IOPS, response time, and throughput with and without the SLOG SSD.

[Attachment: Without SSD.png - IOPS, response time, and throughput without the SLOG SSD]


[Attachment: With SSD.png - IOPS, response time, and throughput with the SLOG SSD]



Can anybody lend insight as to whether the graphs are representative of the build I mentioned? I gather a SLOG SSD should have high sequential write throughput, since that's how writes are placed on the SLOG (sequentially, until it reaches the end of the device and wraps around to the beginning).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
OCZ is a horrible choice for an SLOG, even more so if you aren't mirroring it. OCZ drives are well known for eating data, and since the SLOG is the sole copy of your in-flight sync writes in the event of a power loss, improper shutdown, or kernel panic, you shouldn't be using drives from a manufacturer with such a reputation for low quality.

Based on the experience of users in these forums, you really shouldn't consider SLOGs or L2ARCs until you have 64GB of RAM minimum; 96GB is more reasonable. So while adding the SLOG may help in some ways, you're almost certainly hurting yourself in others because of how little RAM you have.
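A rough way to see how much ARC you actually have to work with is the standard FreeBSD arcstats sysctls - shown here as a sketch, not a tuning recommendation:

    sysctl kstat.zfs.misc.arcstats.size    # current ARC size in bytes
    sysctl kstat.zfs.misc.arcstats.c_max   # maximum size the ARC is allowed to grow to
    sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses   # hit/miss counters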

I've not seen any complaints about the Samsung 840 Pro here personally, but I'm a big advocate of Intel SSDs. There's an article from December or January where Intels were simply the best all around at protecting data on a loss of power. Every other model/brand tested (Samsung wasn't tested) failed with data loss. OCZ failed the basic tests without any power-loss testing at all. :P
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
Thank you for the reply, cyberjock. From various sources and tests I've performed myself, adding a SLOG is very beneficial for my purpose of increasing throughput and lowering average latency for sync writes while using FreeNAS as an iSCSI host. I think the graphs I provided show that, but I get the feeling they are not representative of what my system can handle as a whole, which is my main concern.

I did read a few articles about Intel SSDs withstanding over 6,500 forced power cycles with no data loss. I guess my choice of SSD should have a supercap or equivalent for power failures caused by component issues, which rules out the Samsung 840 Pro from what I've read. As far as Intels go, I see three models in the range of specs I think would work, but the cost is somewhat prohibitive (S3700). The Seagate 600 Pro seems promising, but I haven't heard much chatter about it, and it comes in over $100 cheaper than the Intel counterpart.

Another interesting choice is Samsung's SM843T, which from what I gather is Samsung's counterpart to the 840 Pro with Intel S3700-style features (well, at least a supercap). I see it for ~$200 less than the Intel, and although I can't speak to its forced-power-cycle endurance, it looks good on paper. Feature-for-feature I might get more out of the Intel, but I can pretty much buy two of these for the cost of one Intel.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Thank you for the reply, cyberjock. From various sources and tests I've performed myself, adding a SLOG is very beneficial for my purpose of increasing throughput and lowering average latency for sync writes while using FreeNAS as an iSCSI host. I think the graphs I provided show that, but I get the feeling they are not representative of what my system can handle as a whole, which is my main concern.

Yep. The problem is that testing is not a good representation of reality. This has been fought over between ZFS pros and newbies, and the newbies have yet to come up with tests that actually reflect reality. There really is no substitute for reality. The best way to handle ZFS with your type of workload is to make some guesstimates based on ZFS experience (unfortunately you don't have much of that), and when your system starts getting slow, look at what the limitation is and go from there. I can tell you, from reading almost every thread and post for 2+ years, that you're not going to see much in the way of 'great results' without 64GB of RAM minimum. A ZIL/SLOG and/or L2ARC stresses your ARC. Without a respectable amount of RAM you stress the ARC further by adding an SLOG or L2ARC, and then you're shocked when you see performance drop. This isn't like Windows, where more hardware is always better. It's about adding the right kind of hardware for your particular limitation.

I did read a few articles about Intel SSDs withstanding over 6,500 forced power cycles with no data loss. I guess my choice of SSD should have a supercap or equivalent for power failures caused by component issues, which rules out the Samsung 840 Pro from what I've read. As far as Intels go, I see three models in the range of specs I think would work, but the cost is somewhat prohibitive (S3700). The Seagate 600 Pro seems promising, but I haven't heard much chatter about it, and it comes in over $100 cheaper than the Intel counterpart.

Another interesting choice is Samsung's SM843T, which from what I gather is Samsung's counterpart to the 840 Pro with Intel S3700-style features (well, at least a supercap). I see it for ~$200 less than the Intel, and although I can't speak to its forced-power-cycle endurance, it looks good on paper. Feature-for-feature I might get more out of the Intel, but I can pretty much buy two of these for the cost of one Intel.

Yep, and that's where you decide how much your data is worth. Either you get spendy and do it right, or you cut corners and hope it doesn't bite you in the butt later.

ZFS won't forgive you if you decide to be "stupid". ZFS expects you to be a pro and do it right. Do it right and you'll have years of happy ZFSing. Do it wrong and everything will appear to work great until that one day when something unexpected happens and you're scratching your head and posting things in here like "my zpool has been a dream for 6 months until last night...".
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
Well I don't want anyone thinking I didn't do my research:

  • I know adding more RAM will benefit my system, but with the amount I have I thought the common wisdom was that you should always use a SLOG to improve sync=always performance on your zpool. The reasons are many, but mainly it acts as a much faster log location than your spinning disks would, and it helps offload log writes from the spinning disks to increase their availability for reads (and I think something about being able to optimize writes?). There is talk of this in the forums as well as in this blog post (by a Nexenta developer, I know, but he's talking about ZFS, and from what I gather all distributions should share the same implementation behavior, aside from Oracle or features layered on top of ZFS such as FreeNAS encryption).
  • Using an enterprise SSD should satisfy the SLOG requirement from what I gather, although it's true that only Intel withstood a power-cycle stress test (a very unrealistic test, but indicative of stability nonetheless). I think the SM843T or Seagate 600 Pro should satisfy the requirement on paper, with the cons being that they haven't been put through the same kind of stress testing as the Intel and there isn't much chatter on them. Also, members such as jgreco have said that v28 of ZFS eliminated the need for a mirrored SLOG (though, given what a SLOG is, I'm not entirely sure how the few seconds of write data would still make their way to the pool if the SLOG went missing - is there another copy kept in the ARC? See the mirrored-SLOG sketch after this list.). To mitigate power issues, I have a system with redundant PSUs attached to separate UPSes, which in turn sit on a power grid with a backup gas-powered generator - my hope is that enough has been invested in that infrastructure over the years, along with a supercap SSD, to minimize the risk of power problems. Maybe I can upgrade the SLOG at a later point to improve performance.
  • Although I realize I cannot possibly benchmark my system against real-world workloads, I can at least try to establish and optimize a baseline. That said, I am quite happy with the read IOPS/throughput of my system; the writes are really where I'm concerned. As the graphs show, I never break 20 MB/s, which is well under what the SSD I'm using is capable of. I understand there is latency in other areas, but I'm sure my servers can saturate the gigabit NICs they have. I also see in the thread above (see this post) that another user, with half as much RAM and a slower SSD, was able to hit around 30 MB/s on writes using NFS with sync=always, and I'm curious why I can't hit those levels. The first and obvious suspect was the SSD, but since I could saturate my network links in a read test, I'm fairly confident pushing writes over the network isn't the bottleneck. Could it be a problem with the iSCSI implementation? Probably not, as sync=standard gives good numbers and the sync=always setting is on the underlying pool, not in the iSCSI host software. I have also seen mention of other high-performance systems doing just fine with the iSCSI target that ships with FreeNAS.
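On the mirrored-SLOG question raised above, the sketch I had in mind is something like this (GPT labels and pool name are placeholders):

    zpool add tank log mirror gpt/slog0 gpt/slog1   # add two SSD partitions as a mirrored log vdev
    zpool attach tank gpt/slog0 gpt/slog1           # or convert an existing single log device into a mirror
    zpool status tank                               # verify the log mirror shows up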
I think the purpose of this thread is satisfied (although I would appreciate any and all opinions) - I need to switch to an enterprise-level SSD, not only for performance but for reliability as well. Intel is already known to be rock solid, but can the same be said of the Samsung or Seagate enterprise SSDs? Also, do I need a mirrored SLOG? If I can save enough on hardware, I'm thinking of building a second identical box for HA purposes.
I can definitely appreciate the "you get what you pay for" and "how much is your data worth to you" mentalities, but I'm hoping I can get the features I want without going through a storage vendor and spending 4-10x more than a DIY option such as FreeNAS.
 

xcom

Contributor
Joined
Mar 14, 2014
Messages
125
someone1,

Please PM me. I have some good information for you.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
lol... I love how this whole "PM me" thing works... frankly, if you aren't willing to write it in the forum, you shouldn't be posting. The whole point of forums is the benefit of all.

Few comments:

- The SLOG doesn't offload writes at all. It simply acts as temporary non-volatile storage for sync writes that still need to be committed to the pool.
- That test you read about, which Intel passed and which you called "a very unrealistic test": that's *exactly* the kind of real-world scenario you face when using an SSD as an SLOG.
- Do not confuse "unnecessary" with "not recommended". In v15, if you lost your SLOG you lost your pool. PERIOD. It was totally gone. In later versions that was fixed and you could recover the pool, but you'd lose the data from the transactions that were in your SLOG. The reason for mirroring SLOGs is to ensure you still have a copy of that data if you lose one of the SLOG drives. So while it's no longer "really really damn important", it's still very much recommended.
- All that power stuff is fine and dandy, but it does nothing to protect you from accidentally unplugging power cords, kernel panics, etc. So do not assume all that power-related equipment protects you. Plenty of people in this forum have seen all that stuff fail them because they made a mistake.
- For the same reasons that benchmarking a zpool doesn't work, trying to compare your server to someone else's is a fool's errand. Even if you and I each run one VM and it's an email server, I should never expect my server to match yours. You might serve 100 users and I might serve 100,000. So don't be fooled by what others do. And quite often those people run benchmarks (which are total BS) and then post *those* numbers, which are unrealistic in one way or another.

ZFS is extremely difficult to master. I read everything I can on it in my spare time, and 2 years later I'm still finding stuff I didn't know. And I've been unemployed for the last 2 years and able to read about ZFS as much as I want. So do you think a few days or weeks of research is going to give you thorough knowledge? No offense, but ZFS is far too complex to understand in just a few weeks. The one thing everyone seems to do is underappreciate how complex ZFS really is.
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
I'd just like to clarify (as this is a public forum): a SLOG does offload ZIL writes from the main pool, thus decreasing overall latency for both reads and writes and increasing overall throughput in a realistic scenario. There are tons of articles stating and demonstrating this, one of which is an especially detailed blog post from Oracle, and I have been able to see it in action in my limited benchmarking. Yes, the SLOG is only temporary storage for writes that eventually get written to the pool, but now ZFS is writing the ZIL to a separate device and can batch the writes into transaction groups. My benchmarking is indicative of the expected performance increase from a SLOG, so while I do think you understand these benefits, cyberjock, your last post sounded misleading.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
How full is the pool?

In a virtualized environment it's suggested to keep the usage under 60%.

I am using under 85% of available space on the pool.
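For reference, a quick way to check actual pool occupancy (pool name tank is a placeholder):

    zpool list -o name,size,alloc,free,cap tank   # pool-level allocation and percent full
    zfs list -o name,used,avail,refer -r tank     # per-dataset/zvol usage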
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
My benchmarking is indicative of the expected performance increase from a SLOG, so while I do think you understand these benefits, cyberjock, your last post sounded misleading.

Yeah... benchmarking ZFS can be damned as far as I'm concerned. If we talked on the phone I could probably discredit your numbers in at least 5 different ways within 10 minutes. That's the whole problem: people don't understand ZFS's complexity, run benchmarks, things look better, so they assume it must in fact *be* better. That's not true, and I see it regularly here.

Just as benchmarking with and without a ZIL can give the impression that things are faster, an L2ARC can actually show a performance decrease in a benchmark. It has to do with how each of those technologies works. The reality is that your benchmarking isn't a good reflection of reality, so saying "it's better and I have benchmarks to prove it" is misleading; you don't realize the benchmarks are lying. Just find the benchmark thread where someone ran benchmarks on 6 drives and got results showing something like 5GB/sec. Clearly we all know that 5GB/sec from platter-based 4TB disks is physically impossible.

So yes, I know it "sounds" misleading. But the reality is that the ZIL is actually misleading you. Since you don't appreciate the complexity, you add a ZIL assuming it should make things better, run a benchmark, see that the numbers are better, and immediately assume the correlation must be true.

Also, you said:
ZFS is writing the ZIL to a separate device and can batch the writes into transaction groups
and that's not true. That's another marvelous complexity with ZFS. ;) The ZIL has no impact on the transaction groups at all.
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
In a virtualized environment it's suggested to keep the usage under 60%.

I'm using under 100GB and my benchmarks were done with a 20GB file.

and that's not true. That's another marvelous complexity with ZFS. ;) The ZIL has no impact on the transaction groups at all.

I'm all for learning more, so I'd appreciate some sources. When one of the developers of ZFS puts up a blog post stating the opposite of what you said, I'm compelled to disagree with your statement. But as you said, you've had 2+ unemployed years to learn about ZFS, so if you don't mind shedding some light on this, I'd greatly appreciate it.

From the blog post I previously linked:
The ZIL handles synchronous writes by immediately writing their data and information to stable storage, an "intent log", so that ZFS can claim that the write completed. The written data hasn't reached its final destination on the ZFS filesystem yet, that will happen sometime later when the transaction group is written

Additionally, cyberjock, I can appreciate that some benchmarks can be misleading; this is true of anything you benchmark, maybe even more so with ZFS. But it's not without merit. I don't expect my benchmarks to be indicative of system performance under a real workload all the time. However, as I said, I am trying to establish a baseline, and I do think some level of testing of anything new you introduce into your production environment should be done, to provide some understanding of what it is capable of. My main concern, from the beginning, was that my write performance was not where it should be given the ideal conditions it was given to perform under.

All I wanted was some advice on whether these numbers look okay and, if not, things to try to bring them up. Again, let's think of this as optimizing a baseline under ideal conditions. Actual performance may vary greatly due to the various complexities of the system as a whole, ZFS especially, but I can cross that bridge when I get there. If you strongly believe I'm already starving the ARC and need more RAM while the system is still in testing and doesn't have much on it, then I appreciate the recommendation and would value a second or third opinion.
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
For the edification of anyone who stumbles upon this thread: I got a Samsung SM843T SSD, and the numbers look healthier, but still not at the level I think the system is capable of.

[Attachment: With Samsung SSD.png - IOMeter results with the Samsung SM843T as the SLOG]
 

xcom

Contributor
Joined
Mar 14, 2014
Messages
125
someone1,

Looking good, though your transfer rates seem awfully slow.
 

jpaetzel

Guest
I have some insight for you.

1) If you have a dedicated ZIL device, it is merely a persistent mirror of the real ZIL, which is in RAM. In normal operation a sync write goes to the ZIL in RAM, from there it is written to the dedicated device, the client is ACKed, and then the write is played from RAM to the pool in a txg. If you watch gstat you'll see the ZIL device under write load, but no reads from it.

2) iSCSI respects SYNCHRONIZE CACHE and other related commands that cause sync writes. If you watch zilstat you'll see these commands causing ZIL traffic. I see no reason to force sync=always. If the client requires a sync it will get a sync, otherwise filesystems these days are async by default!
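As a rough sketch of what that looks like in practice (the log device name da4p1 is a placeholder):

    gstat -f da4p1   # watch just the dedicated log device: steady writes, no reads
    zilstat 1        # per-second summary of ZIL traffic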

I do not recommend mirroring ZIL devices with modern ZFS. Go ahead and concat them.

I do not recommend sync=always with iSCSI. I say that with my "I've supported hundreds of people using ZFS iSCSI as a virtualization backend for 4 years, sync=standard for all of them, and if there were a problem with that I'd have seen it by now" hat on.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
jpaetzel,

While I respect your opinion and insight, the users here are not buying the same kind of high-end hardware you see in TrueNAS boxes. They also don't install them in nice air-conditioned server rooms with clean power, UPSes, etc. We've had many users lose data because of a lack of mirrored ZILs. We know this because on bootup the ZIL would decide it was time to stop working, forcing the pool to be imported from the CLI with -F (which rolled back some transactions), and several people have reported unbootable VMs as a result. :(

The sync=always versus sync=standard question is a long, drawn-out conversation, and if you want to know what I've seen, you are welcome to call me about it sometime. ;)
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
Thank you for the insight, jpaetzel. Some of the things you mentioned go against the guides and "common wisdom" spread across this forum. My only concern is with not using sync=always: it was my understanding that istgt did NOT respect sync writes, and that sync=always was therefore needed to ensure nothing is lost. Is the new ctld iSCSI target better behaved in this respect?
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
So I performed the same tests with the new experimental iSCSI target (with multithreading enabled) and WOW, the results are definitely where I think they should be given the system build. Overall write throughput still isn't fantastic, but it's much better than before.

Build: FreeNAS-9.2.1.6-RC-71b05dd-x64 (sync=always)

[Attachment: With Samsung SSD - Experimental iSCSI.png - results with the Samsung SSD and the experimental iSCSI target]
 