Understanding SYNC Performance


acook8103

Dabbler
Joined
Mar 6, 2012
Messages
12
I'm trying to understand the performance of my server, and I'm totally baffled by my results.
I built this server years ago for the experience, and recently thought about upgrading due to poor performance, so I wanted to see if there was anything I could do to stretch its life.


The SATA ports were not working on my mobo, so I was using a PCI card. Upgrading the BIOS on my motherboard fixed that, and moving the drives to the onboard SATA ports drastically improved performance. Since then I’ve been running tests on every combination I can think of.
My setup is 3x 1TB 7200RPM Samsung drives in RAIDZ. After upgrading my desktop I was able to pull an SSD out of it, so I made a 2GB partition on it and added it as the SLOG. I plan to use the server for ESXi, so SYNC writes will be common; the SLOG probably isn’t strictly necessary for my setup, but I already have the drive. This system, as old as it is, is a dual-core AMD64 w/8GB of RAM.


The combinations I tested:

SYNC
- DISABLED
- ALWAYS

PROTOCOLS
- LOCAL
- NFS
- CIFS
- iSCSI

The file sharing protocols perform more or less the same, so I’ll lump them together. For these tests I’m just providing ‘dd’ results, since they’re representative of all the iozone tests I’ve run, and I can confirm the numbers by watching ‘zpool iostat -v’. (Example commands are sketched below the numbers.)

Locally
ASYNC 180 MB/s
SYNC 58 MB/s
File Sharing
ASYNC 34 MB/s
SYNC 14 MB/s
iSCSI
ASYNC 39 MB/s
SYNC 10 MB/s

If I run ‘iperf’, I basically hit the practical max: 938 Mbps.
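For reference, the tests were roughly of this form (the pool name, paths, and sizes here are illustrative, not necessarily exactly what I ran):

  # write test against a dataset; sync behaviour is controlled per dataset with 'zfs set sync=...'
  dd if=/dev/zero of=/mnt/tank/test/ddfile bs=1M count=4096

  # raw network throughput check from the client (192.168.1.10 stands in for the server)
  iperf -c 192.168.1.10 -t 30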

I know this is not a fast pool, but it’s significantly faster than what I was getting on the PCI card. Based on the local results, I feel like I should be capable of much better network results. These numbers are tolerable for me for the foreseeable future, until a proper upgrade is possible.

I’ve run the tests (those I could) from a Mac and from a VM guest (the guest is installed to local storage; I created two extra virtual disks, ‘async’ and ‘sync’, and ran dd directly against them as sdb/sdc for the iSCSI tests). I’ve made sure to use Intel NICs on both machines, but that made no noticeable difference.

My question, though, is: why are the SYNC writes over the network so terrible? From everything I’ve read, nothing indicates that SYNC writes should require that much more overhead. Again, I’m testing with a SLOG, and watching ‘zpool iostat -v’ I can see the ‘log’ device being written at ~10MB/s. The system is clearly capable of more, given the local results, so even with the memory constraints I think it should do better.

So, please explain to me: a) what am I doing wrong (if anything)? b) what am I missing about FreeNAS and/or ZFS that would result in such bad networked SYNC writes?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Awesome, you did a lot of homework and characterization there.

Your tests actually show what the problem is and it isn't really sync writes. Look at the difference between "locally" and "iscsi" and you see async also drops horrifyingly.

ZFS, iSCSI, and Ethernet each add latency, and it compounds at every level, partly due to your older platform. Now I know you hate to hear that, so let me start out by saying that I had some nice Opteron 240EE storage servers that were capable of gigabit joy under FreeBSD 6. Under FreeNAS, total suckage... and it is largely due to the additional layering and resource consumption, combined with a low-MHz CPU. Yours is actually doing better than ours did.

So basically I don't think you're doing anything wrong. I have frequently noted that ZFS is a resource pig, and failing to give it sufficient resources adds latency, which in turn becomes poor performance, especially with sync writes, where you basically lose a lot of buffering capability along the way. This is frequently not understood by people who aren't used to the concept.
 

acook8103

Dabbler
Joined
Mar 6, 2012
Messages
12
I was willing to accept the drop over the network as "overhead". I was thinking that if the limitation was the network, ASYNC and SYNC would be more or less equivalent, since the SYNC attribute should be invisible to the client: FreeNAS would receive the data, handle it, and that would be the end of it.

What do you mean by "lose a lot of the buffering capabilities"? Reordering writes to be sequential? That kind of thing? Watching 'zpool iostat -v 1', I can see the SLOG writing @ 10MB/s, and every few seconds I'll get a burst to the RAIDZ as RAM is flushed.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
With NFS at least, the sync attribute starts at the ESXi host NFS implementation. For iSCSI, setting sync=always on ZFS is local to the filer.

But in general, sync implies that a client is going to write a block of data, and then the response to the client is not going to happen until the block has been committed to stable storage in some way. This is going to be destructive to throughput because you are limited by the client-makes-request → traverse-net → iSCSI-daemon → syscall-to-write-block → ZFS → hardware-push-to-SSD → SSD-write → SSD-acknowledge-to-hardware → ZFS → syscall-returns → iSCSI-daemon-responds → traverse-net → client-satisfied cycle, which basically happens for each block. I'm kind of talking generally here, so I've taken mild liberties to make the point. But basically you can expect that there's some minor latency involved in pushing a sync write from ZFS to a SATA SSD, and the SATA SSD is probably not super zippy about it.
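To put rough numbers on it (the block size and latency below are made up for illustration, not measured from your system), the ceiling is just block size divided by the per-block round trip:

  # e.g. 128 KiB blocks at ~3 ms per committed block:
  awk 'BEGIN { printf "%.1f MB/s\n", (128 / 1024) / 0.003 }'
  # ~41.7 MB/s; stretch the round trip to ~12 ms per block and you're down around the 10 MB/s you're seeing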

I've been leaning heavily towards using a hardware RAID controller with BBU and conventional spinny rust for SLOG because:

1) The latency to write is immensely reduced (driver acknowledges once data is put into the controller's write cache, which happens over PCI-e)

2) The spinny rust has virtually unlimited endurance compared to SSD.
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
I've done some testing and found that for a cheap but well-performing SLOG (ZIL device), Intel S3700 SSDs really do the job. I was able to beat out my RAID-and-rust setup in my test environment, though that didn't seem right; I thought the RAID controller should have done better, and I probably should have spent more time poking at its config options, but I wasn't really motivated because I have better things to do with those drives (I'm going to move them into my pool and rebuild it at a higher RAIDZ level, since they're the same drives I already have in the pool).
 

acook8103

Dabbler
Joined
Mar 6, 2012
Messages
12
I've sprung for a server upgrade. I got an X10SL7-F-O with a Pentium G3430 3.3GHz and 16GB of ECC RAM. I'm running the 9.2 RC. Same 3 drives and same SLOG.

I made 3 datasets and set sync=disabled/always/standard, one per dataset.
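That looked roughly like this ('tank' stands in for my actual pool name, so the dataset names are just illustrative):

  zfs create tank/sync-disabled && zfs set sync=disabled tank/sync-disabled
  zfs create tank/sync-always   && zfs set sync=always   tank/sync-always
  zfs create tank/sync-standard && zfs set sync=standard tank/sync-standard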

Here are a few updated stats:
Locally
ASYNC 260 MB/s
SYNC 54 MB/s

NFS
ASYNC 84 MB/s
SYNC 8 MB/s
STANDARD 39 MB/s

If I watch 'zpool iostat -v 1', it appears that the STANDARD test is writing to the SLOG at ~39MB/s, and then every few seconds there's a 100+ MB/s burst as the TXG gets dumped to disk (at least that's how I understand these things to work). From what I've been able to find, NFS only does SYNC writes for METADATA, so in theory I should be getting the same 84 MB/s minus a small penalty for a few small SYNC writes, not a 50% drop.
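If I want to confirm the interval of those bursts, the TXG commit interval is a tunable; I believe it defaults to 5 seconds on FreeNAS 9.x, though that's worth double-checking:

  # transaction group commit interval, in seconds
  sysctl vfs.zfs.txg.timeout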

What am I missing about NFS and DEFAULT?

I haven't tried to test iSCSI due to the ZVOL Extent UI bug.

Where the server sits right now, the network cable is 4x longer than it will be in the "rack", which is added latency compared to my last set of tests.

Also, I thought a BBU was well outside my budget, but I came across this today: http://www.neweggbusiness.com/product/product.aspx?item=9b-16-118-158 + http://www.neweggbusiness.com/product/product.aspx?item=9b-16-118-119 , which would get you a BBU controller for ~$210. Any thoughts on how well it might work? i.e., any glaring deficiencies that I'm just not aware of?

Thanks
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
iSCSI doesn't do sync writes. It's either not part of the FreeBSD implementation or not part of the iSCSI spec (I forget which).
 

acook8103

Dabbler
Joined
Mar 6, 2012
Messages
12
I'm doing 'zfs set sync=always' for the dataset, as I plan to use this system for ESXi.

I've read the rants on the requirements for ESXi, and my needs are not that great, so I'm hoping this is a "good enough" setup; I'm going into this upgrade knowing that.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So you plan to use iSCSI with sync=always? You're likely to need the same beefy hardware as NFS with sync=standard.
 

acook8103

Dabbler
Joined
Mar 6, 2012
Messages
12
I understand that to be best practice, so that was the plan if I opted for iSCSI.

I'm not understanding why NFS (plain ol' NFS, not specifically ESXi) has such different write speeds depending on whether 'zfs set sync' is 'always' or 'standard'. I understand that ESXi is going to tell NFS to always do a sync write whether or not I configure the dataset as such.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No.

Your pool/dataset sync setting overrides ALL writes if you choose disabled or always. It makes every single write a sync write or every single write an async write.

NFS "supports" the sync write flag. So any write that is specified as a sync write will be a sync write if you have it set to standard. So what writes are sync writes normally? That depends on what you are copying and what program you are using to do the copying. Some programs support and do sync writes, some don't. Your benchmark for the always/standard/disabled is only valid for the exact client software you used to copy the data and only in the method you used. Other than that, it goes out the window.

Some other protocols have no sync write flag (iSCSI for one). So standard and disabled should be the same for those protocols. sync=always will of course make your non-sync protocol behave kind of like a sync protocol, with all of the performance-killing properties associated with it.

ESXi is an application that flags EVERY SINGLE WRITE as sync. That completely kills zpool performance, but has the advantage of protecting your data, at pretty much any performance cost.

You can make every protocol appear to have sync writes for everything by setting sync=always. That will also kill zpool performance in a similar fashion. That's why I'm saying sync=always with iSCSI is no different than NFS with sync=standard.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
... when talking about ESXi. (damn the need to qualify every statement!)
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
I wouldn't call iSCSI vs NFS for ESXi a best practice; I'd call it more a pain in the side. Folks started saying that because iSCSI doesn't do sync writes by default, and they felt safer getting better performance without changing any defaults. In reality, iSCSI at its default is not much different than NFS with sync=disabled on your dataset, and iSCSI with sync=always is roughly equal to NFS with its defaults. At this point it seems like a mostly academic discussion as to which method is more dangerous, iSCSI at its default or NFS with sync=disabled; both are dangerous and shouldn't really be used unless you really understand the risks. Also, with iSCSI you lose a lot of flexibility compared to NFS. Finally, NFS is what VMware uses for their virtual SAN, so I'm guessing it's the protocol VMware has spent more time on.
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
I just found this by dumb luck; here's an excellent link that covers log devices and sync writes. It also talks about iSCSI, NFS, and virtual machines. The article is pretty deep, but I think most can glean some good info from it.
 

acook8103

Dabbler
Joined
Mar 6, 2012
Messages
12
I understand the ZIL and SLOGs, and NFS on ESXi, and iSCSI w/ 'set sync=always'.

I asked why 'set sync=standard' had the performance of neither 'always' nor 'disabled' when running from a normal NFS client (i.e., NOT ESXi).

I also asked if the hardware controller might be useful as a ZIL/SLOG. The 128MB of cache concerns me slightly, but per jgreco's post here, NFS Performance with VMWare - mega-bad?, it looks like there will be burst speed until the cache fills, and then you get the SEQUENTIAL WRITE speed of the underlying drive. Even if I get half of that, it would be 5x what I'm getting currently.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
acook8103 said:
I understand the ZIL and SLOGs, and NFS on ESXi, and iSCSI w/ 'set sync=always'.

I asked why 'set sync=standard' had the performance of neither 'always' nor 'disabled' when running from a normal NFS client (i.e., NOT ESXi).

I also asked if the hardware controller might be useful as a ZIL/SLOG. The 128MB of cache concerns me slightly, but per jgreco's post here, NFS Performance with VMWare - mega-bad?, it looks like there will be burst speed until the cache fills, and then you get the SEQUENTIAL WRITE speed of the underlying drive. Even if I get half of that, it would be 5x what I'm getting currently.

No, you don't understand this sync thing. If you are using something besides ESXi (that's the common problem child), then you need to look at what program you are using, how it does its writes, and which writes it forces to be sync and which it doesn't. THAT is what makes the difference between standard, always, and disabled. My guess is your program uses some combination of sync AND async writes at the same time, hence the different performance.

There isn't really such a thing as a sequential write in ZFS. ZFS does its own allocation using metaslabs. You can expect that if you write a bunch of data at the same time, there's a good chance it will end up physically close together, which also means a dd test is likely to land in mostly contiguous space. But there is no real guarantee of sequential writes with ZFS: it does what it wants based on factors like how full the pool is, how full your vdevs are, and how full the vdevs are relative to each other. There is never any guarantee you will get a "sequential write".


Sync writes will always suck because you will always have the latency from your network and the storage pool between writes. That can take a pool to single digit MB/sec just like you are seeing.

As it stands, you have 8GB of RAM. You can't expect ANY kind of miracle with that little RAM; that's the minimum recommended for ZFS. If I were building an ESXi NFS datastore server, I'd start with nothing less than 64GB of RAM, a SLOG, and a smallish L2ARC. L2ARC requires RAM, so by definition you need a lot of RAM to have a big L2ARC, and if you don't have enough RAM for your L2ARC you can actually hurt pool performance. This isn't simple, and your hardware, as old as it is, is never going to perform quickly.

The manual sums up this WHOLE thing with one paragraph:

The best way to get the most out of your FreeNAS® system is to install as much RAM as possible. If your RAM is limited, consider using UFS until you can afford better hardware. FreeNAS® with ZFS typically requires a minimum of 8 GB of RAM in order to provide good performance and stability. The more RAM, the better the performance, and the FreeNAS® Forums provide anecdotal evidence from users on how much performance is gained by adding more RAM. For systems with large disk capacity (greater than 8 TB), a general rule of thumb is 1 GB of RAM for every 1 TB of storage. This post describes how RAM is used by ZFS.


To be quite honest, I've probably spent far more time on this thread than I should have. 8GB of RAM is NEVER going to give good performance with sync writes. Your only option is to upgrade in a big way before evaluating why performance sucks.
 

acook8103

Dabbler
Joined
Mar 6, 2012
Messages
12
cyberjock said:
As it stands, you have 8GB of RAM.

It would help to read the updated post: http://forums.freenas.org/threads/understanding-sync-performance.16414/#post-87471

This is the Performance sub-forum. I'm trying to characterize my system, so I set up a series of tests to find out what it was capable of. That involved forcing ZFS to act in certain ways in the name of consistency. It's called the scientific method.

Sync writes are important for VMs, so whether I use NFS w/ ESXi (ESXi dictates SYNC) or iSCSI w/ ESXi (I set sync=always on the ZVOL), I know I will need SYNC write performance. Hence I'm asking whether any of the options I've come up with will get me closer to ACCEPTABLE performance. On the original build I was using a PCI card and had no SLOG; looking back, I don't know how I ever lived with it. If I had run the same sort of tests then, I'm sure I would have gotten speeds you could quote in KB/s. I think the only reason it was usable is that I didn't know any better when I chose iSCSI years ago because it was faster than NFS.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You are using the scientific method, but you don't have the requisite knowledge to apply it and then properly interpret the results. That's all I was trying to say in the previous post. You were asking why NFS gives different results between sync=always, standard, and disabled; the fact that you didn't know the answer means you should stop, go back to the drawing board, and figure out WHY. If you don't know why, the information is pointless, because you cannot properly interpret the data and come to a meaningful answer.

Technically, sync writes are not important for VMs; they are important for your data, be it a VM or a movie file. The whole reason someone would choose to use sync writes is to protect data (whatever that may be).
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
OK, let's just make this simple. Just buy an Intel S3500 or S3700 (depending on your budget) and drop it in as your log device. That is the most effective way to get sync writes up to speed without burning a bunch of money.
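Attaching it is one command (the pool and device names here are placeholders for whatever yours actually are; use a partition or the whole SSD as you prefer):

  # add the SSD as a dedicated log vdev
  zpool add tank log gpt/slog0
  # it can be pulled back out later with 'zpool remove tank gpt/slog0' if needed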

The other option, if you don't mind the risk, is to run sync=disabled on the dataset that hosts your ESXi data files, but you had better know what the risk is and how to mitigate against disaster.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
pbucher said:
OK, let's just make this simple. Just buy an Intel S3500 or S3700 (depending on your budget) and drop it in as your log device. That is the most effective way to get sync writes up to speed without burning a bunch of money.

The other option, if you don't mind the risk, is to run sync=disabled on the dataset that hosts your ESXi data files, but you had better know what the risk is and how to mitigate against disaster.

Actually, it's not that simple, as I've tried to explain dozens of times on the forum. You can't throw out generic answers like that. In fact, about the only thing you could easily say is:

If you want a SLOG(or L2ARC) the S3500/S3700 is an excellent choice.*

The * would then have about 50 bullets under it covering when to use a SLOG and/or L2ARC, and a whole list of if/then/whys. It's far from simple and can't be simplified down. This is the whole reason so many newbies on the forum get road rash when they try to use ESXi with NFS or iSCSI.
 