QoS for synchronous writes

Status
Not open for further replies.

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
I was watching this presentation on ZFS in SmartOS that raised an interesting question about QoS and making sure other users don't suffer when someone is doing synchronous writes. See 15:50-19:50 in the video.
http://www.youtube.com/watch?v=6csFi0D5eGY

Does FreeNAS take this into consideration at all?
It seems like a worthwhile thing to implement if it is not already.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm not even sure what to say. I see no link between how they did their build and why you'd even want to think about throttling sync writes. They are using shared hardware and "selling" a service to various customers. They are doing what they are doing to ensure that each customer gets their fair share of I/O.

FreeNAS servers are usually used by a single customer (i.e. the company). So trying to "sell" service to different customers is pointless because you don't have different customers. You have one customer: the company.

Granted, I didn't watch the whole video. As soon as I saw "QoS" and "sync writes" in the same sentence I dismissed the whole concept immediately, because there's no basis for this "feature" to be useful for FreeNAS users. The common problem is that sync writes hurt zpool performance so much that people can't function. So when a pool is too slow to keep up, the solution is to... add even more latency?

That's like being upset because the grocery store checkout lines are too long, so the store manager says "I know the solution" and immediately closes half of the checkout lanes. Does that even make sense? Hell no.
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
Watch the four minutes of the video I posted. He explains it and how they solved it.

I don't see why this feature wouldn't be useful, for the reasons they explain.

See it as letting one-item customers slip in between huge carts of groceries instead of being queued up behind all of them for hours. By the time they finally get out, the ice cream has already melted.
The sync writers don't even care if it takes a few seconds longer to write a few gigabytes of data, so why should they get priority over a few quick reads that won't really hurt performance?

If you watched the video you would see how little it would take to let the other users work as normal.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
OK, let me ask this question.

What situations in particular use sync writes extensively, if at all? (Yes, this is a baited question, but I'm trying to prove my point...)
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
The common problem is that sync writes hurt performance of the zpool so much that people can't function.

If it's not common, then it's not really a problem, is it?

So FreeNAS is never used as an ESXi datastore in companies with no users other than ESXi?
And companies aren't interested in keeping performance acceptable for all their users, even with sync writers raining on the party?

Ok forget I posted at all.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No, it's like this... two scenarios: one with ESXi exclusively, and one with ESXi plus normal users.

ESXi exclusive:

So if you have 10 VMs all doing sync writes with ESXi, the last thing you want to do is delay them. That accomplishes only one thing: it makes those VMs go idle while waiting for their turn to write data. So obviously this is not too useful. The obvious answer is to upsize your server, add a SLOG for the sync writes, etc. Trying to artificially choke VMs because one VM is particularly busy doesn't solve your problem; it simply creates a new one (VM performance is now in the crapper on purpose).

ESXi with file sharing for normal users:

So you have 10 VMs doing sync writes and the normal users using some number of shares. Here's the catch: normally, NFS shares don't generate a lot of sync writes. Those users aren't throwing load at the server that requires a sync write, so the server can simply write their data with its next transaction group. So the only thing you'd really be doing by throttling sync writes is forcing VMs to go idle waiting for their sync writes to complete, and they're being throttled for no reason, because your other users don't have sync writes. You're basically back to the "ESXi exclusive" category, with some extra writes that aren't even sync writes. Non-sync writes don't appreciably affect pool performance except in VERY large numbers, and if your numbers are that big, you should have built a bigger server to begin with. So who cares?

And you can even add a third category: users on any sharing service other than NFS (since those don't support sync writes). They're just like the above paragraph, except that instead of saying "they don't have a lot of sync writes" you know with 100% certainty that they have zero sync writes. Their data can always wait for the next transaction group without anything stalling.
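For what it's worth, the sync/async distinction these posts keep coming back to is easy to see from userland. Here's a minimal Python sketch (write counts and sizes are arbitrary, and this measures OS-level behavior, not ZFS specifically): a buffered write returns as soon as the data is in the OS cache, while fsync() blocks until the data reaches stable storage, which is essentially the cost a VM pays for each sync write.

```python
# Compare buffered writes (return after hitting the OS cache) against
# writes followed by fsync() (block until data is on stable storage).
import os
import tempfile
import time

def timed_writes(n, data, sync):
    """Write `data` n times; optionally fsync() after each write."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.write(fd, data)
            if sync:
                os.fsync(fd)  # halt, like a VM's sync write, until durable
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)

buffered = timed_writes(200, b"x" * 4096, sync=False)
synced = timed_writes(200, b"x" * 4096, sync=True)
print(f"buffered: {buffered:.4f}s  synced: {synced:.4f}s")
```

On spinning disks the synced run is typically orders of magnitude slower; on SSDs or tmpfs the gap shrinks, which is exactly why a SLOG on fast flash helps sync-heavy pools.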

So I'll ask again: how is this supposed to benefit anyone if you are either deliberately adding latency to a VM "just in case" another VM needs to do a sync write, or deliberately adding latency to a VM so that another user can write their non-sync data at the next regularly scheduled transaction anyway? Sounds like a no-win situation either way.

It looks to me like you're throttling back VMs on purpose, and for no benefit. On top of that, there is a technical solution to the problem: rightsize your server.

In their example it sounds like they have a single large pool that's divided up among different customers (presumably under contract or something) who don't even know who they're sharing resources with, or that their server is shared with other users at all. The whole purpose of the QoS system they're discussing is that if you have 10 different customers on the server, one client can't make the server so busy that the other nine can't use it. So you deliberately throttle one user so that the others get a chance at server cycles. This makes sense in that customers with expected loading can pay for their contract and enjoy the server resources, while the guy demanding more resources gets throttled for excessive loading (and perhaps might want to purchase a larger contract, maybe even a dedicated server all to himself).
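The fair-share scheme described here can be sketched as a per-tenant token bucket, one common way such QoS systems are built (the rates, tenant names, and workload below are invented for illustration, not taken from the video):

```python
# Per-tenant token buckets: each tenant's I/O is admitted only while that
# tenant has tokens, so one busy tenant cannot starve the others.
from collections import defaultdict

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens replenished per tick
        self.capacity = capacity  # burst limit
        self.tokens = capacity

    def tick(self):
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def try_consume(self, n=1):
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

buckets = defaultdict(lambda: TokenBucket(rate=10, capacity=20))
admitted = defaultdict(int)

# Tenant "a" floods 100 requests per tick; tenant "b" issues a trickle of 5.
for tick in range(10):
    for name, n_requests in [("a", 100), ("b", 5)]:
        bucket = buckets[name]
        bucket.tick()
        for _ in range(n_requests):
            if bucket.try_consume():
                admitted[name] += 1

print(dict(admitted))  # tenant "a" is clipped; tenant "b" gets all 50 through
```

Tenant "a" submits 1,000 requests but only gets what its bucket allows (110 here), while tenant "b"'s 50 requests are all admitted; that's the "fair share of I/O" being sold.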

In short, what they're doing and what FreeNAS does both use ZFS, but that's where the similarities end. The client side of the house is totally different (and that's where their QoS system lets them be more profitable). Other than ZFS, they don't share much in common.

So, I'll go back to what I said in the first post unless you can explain where I am in error...

I'm not even sure what to say. I see no link between how they did their build and why you'd even want to think about throttling sync writes. They are using shared hardware and "selling" a service to various customers. They are doing what they are doing to ensure that each customer gets their fair share of I/O.

FreeNAS servers are usually used by a single customer (i.e. the company). So trying to "sell" service to different customers is pointless because you don't have different customers. You have one customer: the company.

That's like being upset because the grocery store checkout lines are too long, so the store manager says "I know the solution" and immediately closes half of the checkout lanes. Does that even make sense? Hell no.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I think that you are missing 1 key point in the whole "sync write battle" with ZFS.

When a VM makes a sync write to disk, that VM is literally halted until the write returns. That's the whole point of sync writes: stop further processing until the data is guaranteed to be on the pool. Each sync write to spinning disks already costs on the order of 20 ms, so a single VM issuing serialized sync writes is capped around 50 writes per second, and every millisecond of QoS delay you add pushes that cap even lower. You'd want to jump off the nearest bridge if you were limited to just 50 writes per second. Heck, some servers create that many log entries per second just performing their designed function. Can you imagine a server that can't do its designed function strictly because of logs? I don't want to be that admin; he'll be unemployed very quickly. ;)
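The arithmetic behind that cap is just the reciprocal of per-write latency, assuming the VM issues one sync write at a time (the latency figures below are illustrative, not measurements):

```python
# Upper bound on write rate when each sync write must complete before the
# next one is issued: rate = 1 / per-write latency.

def max_serialized_iops(latency_ms):
    """Maximum writes/second for fully serialized sync writes."""
    return 1000.0 / latency_ms

print(max_serialized_iops(20))      # ~20 ms per write on a spinning disk
print(max_serialized_iops(20 + 2))  # the same disk with 2 ms of added QoS delay
```

At 20 ms per write that's 50 writes/second; even 2 ms of artificial QoS delay drops it below 46, which is why adding latency to an already sync-bound VM only makes things worse.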

So deliberately adding latency to those VMs is not exactly useful. Everyone already complains about poor VM performance when their server isn't "rightsized", and QoS isn't going to do much to solve that problem. In fact, you're going to artificially make it worse for the busy VMs (potentially seriously impacting performance). How many people here have complained that they can't run two VMs at the same time without five-minute bootups? Adding latency with QoS might make that bootup faster, but the other VM is going to suffer tremendously. In fact, many people have complained that a VM that was slowed down while a second VM was booting could no longer perform its designed function, because it was too slow to handle its real-time load.
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
I think I initially mixed up sync writes with sequential writes, where anyone with a constant stream of data ruins performance for the other users. In that case the implementation would be worthwhile.
So I get your point about it not being useful here. My bad.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ahh.. ok. That's a common mistake.
 

Ef57uiKnN6

Dabbler
Joined
Mar 25, 2012
Messages
28
Unfortunately, almost all container-based virtualization technologies lack proper I/O control.
I believe this video makes it a bit easier to understand:

Anyway, implementing this (efficiently), preferably at the dataset level, would probably exceed the scope of the devs.
 