Reduce write cache (zil?) so disk subsystem can keep up?


helloha

Contributor
Joined
Jul 6, 2014
Messages
109
So I have a 10GbE setup with 64GB of RAM on both sides (I'm actually running OpenZFS on my Mac Pro as well).

When I push a 20 GB file or larger, the transfer speed exceeds what the array can sustain, so I assume the data goes straight into memory. Once the copy is finished on the client side, the array keeps working to write out all the data still held in memory?

My issue is that when I want to read a file, I have to wait until the system is ready. If the system is still writing, I either have to wait or get terrible IOPS performance.

My question: is there a way to tune ZFS so it writes straight to disk?

Or is there something completely different going on here? Also sorry for the bad terminology...

Thanks!
K.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Read up on the ZIL; it won't cache that much data. It's something like X bytes or 5 seconds, whichever comes first. And the fact that your pool can't keep up won't be resolved by writing straight to disk. The transfer itself will just take longer, but the overall length of time will remain the same.
But what makes you think it's the ZIL? Have you monitored zilstat while your transfers are running? What protocol are you using, and are sync writes required? As a test, you can set the dataset's sync property to disabled (sync writes are acknowledged from RAM) or to always (all data must be flushed to disk) and see if that resolves the issue. If disabling sync writes resolves it, then you need to decide whether you can live with that risk. If you can, you're done; if you can't, get a proper SLOG for that pool. If it doesn't fix anything, the problem is elsewhere.
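A rough sketch of the commands involved (the pool/dataset name tank/media below is just a placeholder, substitute your own):

zilstat 1                         # watch ZIL activity once per second while a transfer runs
zfs get sync tank/media           # check the current sync setting on the dataset
zfs set sync=disabled tank/media  # test: acknowledge sync writes from RAM (data-loss risk on power failure)
zfs set sync=always tank/media    # test: force every write to be flushed to stable storage
zfs set sync=standard tank/media  # restore the default behavior afterwards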
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
depasseg has this partially right. The issue is that your zpool is underperforming. The fix is to improve the performance of the zpool. Add more vdevs, etc.

If your issue were sync writes, you'd see different behavior and different issues. Disabling sync writes won't fix the issue because you're still going to have the flushes to the zpool in 5-second increments, for better or worse.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
My issue is that when I want to read a file, I have to wait until the system is ready. If the system is still writing, I either have to wait or get terrible IOPS performance.
How long do you have to wait after a transfer?

What is the setup of your pool (zpool status)? And have you watched "zpool iostat -v" during a transfer to see how the drives are performing?
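For example (the pool name tank here is just a placeholder):

zpool status -v tank     # pool layout and any device errors
zpool iostat -v tank 1   # per-vdev throughput and IOPS, refreshed every second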

Yeah, sync writes were the suggestion for writing directly to the pool (not exactly, but the effect is close).

I don't understand how the wait can be that long (more than 5 seconds), or what else would cause it (other than a slow/unsuitable pool).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I don't understand how the wait can be that long (more than 5 seconds), or what else would cause it (other than a slow/unsuitable pool).

On bootup ZFS does a quick and dirty benchmark (it takes about 3 seconds and happens when the pool is mounted) to determine what kind of speeds the zpool can get. This number is typically horribly inaccurate and much higher than actual performance, and the write cache attempts to size itself based on it. If you transfer a bunch of data into the write cache that the system can't flush at the expected (far too high) speed, then you get a situation where you have to wait for the write cache to complete its flush.

For example, when I do file copies from my desktop to my FreeNAS over 10Gb, I can copy about 3GB of data in about 3 seconds (effectively saturating the 10Gb NIC). The zpool will then flush to disk, and I'll have to wait about 10 seconds; during the flush I can only transfer about 50MB of data over those 10 seconds.

There are tunables for this: if you run extensive benchmarks and come up with realistic numbers that are better than the defaults ZFS figures out, you can set the tunables accordingly. But since zpool performance changes as the pool is used over time, people generally either leave the auto-benchmark to do its thing or set the values artificially low so that things flush to disk much sooner when the write cache fills.
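To give a rough idea of what that looks like, these are the write-throttle sysctls I believe apply on current FreeBSD-based builds (check your version before changing anything, and the 4 GiB value is only an illustration):

sysctl vfs.zfs.dirty_data_max               # current cap on outstanding dirty write data, in bytes
sysctl vfs.zfs.txg.timeout                  # maximum seconds between transaction group flushes
sysctl vfs.zfs.dirty_data_max=4294967296    # example: cap the write cache at 4 GiB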

Now, if he's waiting a very long time (more than 30 seconds IMO), then this likely isn't the issue and he should start digging deeper. If he's using shingled (SMR) hard drives, this behavior is totally normal and there is no way around it. If not, he could have a failing disk, or he could simply be overloading the system for the hardware he has. It's tough to say without more information.
 

helloha

Contributor
Joined
Jul 6, 2014
Messages
109
Thanks for your lengthy and thorough replies. Below you can find some more info about my setup:

SERVER: Supermicro X8DTi-F, 2x Xeon E5620, 64GB RAM, TDK TF30 USB 3.0 (boot), 2x Dell H310 with H20 IT firmware, Chelsio S320E-CR, Supermicro 3U chassis with 920W Platinum PSU, FreeNAS-9.10-STABLE-201604111739
POOL1: 2x RAIDZ vdevs of 4x 4TB Seagate ST4000DM000 (8 disks)
POOL2: 1x RAIDZ vdev of 6x 3TB Hitachi HDS723030ALA640

CLIENT: Mac Pro 4,1, 2x Xeon E5520, 64GB RAM, Chelsio S310E-CR
POOL1: 1x RAIDZ vdev of 4x 2TB WD Black

It's important to note that this issue started before I switched to using ZFS on the Mac. Before that I had a dual X5650 machine with an 8-port Areca controller and 4 disks in RAID 5.

Upon doing some more research I noticed that making copies from pool1 to pool2 on the server itself works without any hiccups. I get a sustained speed of 400-500 MB/s.

The problem only arises when making copies over 10GbE. I have tested both AFP and CIFS; both have the same issue.

BUT it appears to happen only when I PULL data from the server (to my local ZFS volume or to an HFS+ SSD on the Mac).

Here is a more accurate description of the issue:

I copy 100 GB from the server to the client; starting the copy takes about 10 seconds (weird, because they are 10 GB files).

It ramps up to 350-400 MB/s, but after a minute it completely stops. Checking iostat on both sides, the client is still writing, but once it has finished, nothing happens (no reads or writes) for about 30-45 seconds... then it starts again for a minute, then stops again...
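In case it helps, this is roughly how I'm watching it on both ends (POOL1 here stands for whatever the server pool is actually called):

zpool iostat -v POOL1 1   # on the FreeNAS server during the pull, per-vdev activity every second
iostat -w 1               # on the Mac, local disk activity every second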

I'm sorry if the issue as first described was not entirely accurate...

Many thanks for any suggestions!
K.

It all starts nicely:

Screen Shot 2016-04-13 at 20.58.30.png


But then it comes to a complete stop

Screen Shot 2016-04-13 at 21.00.57.png


For 45-60 seconds...
Screen Shot 2016-04-13 at 21.01.30.png


 