Random write performance within a VM


David E

Contributor
Joined
Nov 1, 2013
Messages
119
I've been playing with a lab server trying to understand the performance I am seeing, and figure out if there is some way to improve it.

Lab Setup:
ESXi 6 on a Dell R630, many cores, lots of RAM
FreeNAS 9.3 20161181840 on an Intel E3-1230, 32GB of RAM, a single 840 Pro 128GB for testing
Networking is Intel 10G SFP+ NICs connected via DACs

Storage connection is via NFSv3 (an intentional choice over iSCSI), and I've set sync=disabled for these tests. As you'd expect, writes are much slower with sync enabled.
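
For reference, here's a minimal sketch of toggling this from the shell (assuming a pool/dataset named tank/vmstore; substitute your own):

Code:
# check the current setting
zfs get sync tank/vmstore
# disable sync writes for the test run (not safe for real data; a power loss can drop in-flight writes)
zfs set sync=disabled tank/vmstore
# restore the default afterwards
zfs set sync=standard tank/vmstore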

Performance:
CrystalDiskMark run per the settings seen below in a Win10 VM

dragon-840pro-nosync.png


Question:
1. 4K random writes at both QD32 and QD1 seem very low, and I believe they are a contributing factor to the slowness we feel in our production system. When benchmarked locally, this drive achieves 370MB/s (17.8x what I see here) and 138MB/s (8x) respectively. I recognize there is a lot of additional overhead here, but is anyone able to achieve higher values, and if so, how? (Some rough per-op math below.)

2. I'm also surprised that the random write performance doesn't grow to extremely high levels like sequential does when sync is turned off. Are random writes not batched up and laid out on disk sequentially?
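
To put question 1 in perspective, some rough per-op arithmetic (illustrative only; the 0.3 ms figure is just an assumed NFS round-trip time, not a measurement):

Code:
# 4K random writes: MB/s -> IOPS
echo $((370 * 1024 / 4))   # ~94700 IOPS locally at QD32, i.e. ~10 us per op
echo $((138 * 1024 / 4))   # ~35300 IOPS locally at QD1, i.e. ~28 us per op
# At QD1 over NFS every write waits on a network round trip; at an assumed ~0.3 ms
# per op that caps things around 3,000-3,500 IOPS, roughly 12-14 MB/s at 4K.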
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Have you looked for what your bottleneck is? What does the CPU usage look like on the NAS while the test is running? Did you look at the NIC utilization on the NAS? You might want to change the NFS heap size on the ESXi host; I believe max NFS connections can be raised to 256 and the heap can go to 32. Are you using the LSI Logic virtual SCSI controller? You could try the VMware paravirtual SCSI (pvscsi) controller to lower CPU overhead.
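
If anyone wants to try that, roughly the commands involved on the ESXi side (setting names from ESXi 6.x; the TcpipHeapMax value is just a commonly recommended figure, so double-check the supported maximums for your build, and the heap changes need a host reboot):

Code:
# current values
esxcli system settings advanced list -o /NFS/MaxVolumes
esxcli system settings advanced list -o /Net/TcpipHeapSize
esxcli system settings advanced list -o /Net/TcpipHeapMax
# raise the NFS mount limit and the TCP/IP heap
esxcli system settings advanced set -o /NFS/MaxVolumes -i 256
esxcli system settings advanced set -o /Net/TcpipHeapSize -i 32
esxcli system settings advanced set -o /Net/TcpipHeapMax -i 512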
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Have you looked for what your bottleneck is? What does the CPU usage look like on the NAS while the test is running?

Looking at just the 4K QD1 tests, the CPU is 80-90% idle while they run.

Did you look at the NIC utilization on the NAS?

Yes, it very closely tracks the throughput numbers mentioned, and is idle otherwise since this is a test lab.
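
For anyone repeating this, the standard FreeBSD tools on the FreeNAS box are enough to watch both during a run:

Code:
top -SH           # per-thread CPU, including the kernel NFS threads
systat -ifstat 1  # live per-interface network throughput
gstat -p          # per-disk busy% and latency, also worth a glance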

You might want to change the NFS heap size on the ESXi host; I believe max NFS connections can be raised to 256 and the heap can go to 32.

I took a look at this, and it seems these settings are only relevant if you are connecting to more than 8 NFS mounts. My production servers currently use only 4, so I'm well below any threshold that would warrant tweaking.

Are you using the LSI Logic virtual SCSI controller? You could try the VMware paravirtual SCSI (pvscsi) controller to lower CPU overhead.

I wasn't. I swapped to it on the Windows VM but didn't notice any measurable difference.

Good ideas though, keep them coming!

Also, as an aside, I turned on 9K jumbo frames and got a huge increase in my sequential reads (2x!), but also, surprisingly, in the 4K QD1 random reads, which I can't explain; ideas welcome. Unfortunately there was no change to the random write performance.
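
For anyone trying the same change, a rough sketch of what's involved on the FreeNAS side (the interface name ix0 is just an example, and the MTU also has to match on the switch and on the ESXi vSwitch/vmkernel port):

Code:
ifconfig ix0 mtu 9000            # or set the MTU on the interface in the FreeNAS GUI so it persists
ping -D -s 8972 <esxi-host-ip>   # don't-fragment ping to verify; 8972 = 9000 - 20 (IP) - 8 (ICMP)
# Why sequential reads benefit: a 128K read is roughly 90 full-size packets at MTU 1500
# but only about 15 at MTU 9000, so per-packet overhead drops sharply. A single 4K op
# fits in one packet either way, which is why the QD1 random read gain is surprising.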

dragon-840pro-nosync-9kjumbo.png
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
I realize that, because you're using an SSD, you may feel this isn't relevant, but how full is the pool?

You kind of need to be mindful of what rewriting a block in the middle of a file involves with NFS, which is one of the reasons that iSCSI and zvols are somewhat more attractive these days.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Check what recordsize you are using on the dataset shared via NFS. While zvols used for iSCSI storage default to 16KB blocks, datasets used for NFS default to 128KB. The latter is good for normal write-once file storage, but is often too large for block storage, causing an excessive number of read-modify-write operations. PS: A changed recordsize affects only newly written files.
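
For reference, a minimal sketch of checking and changing this from the shell (assuming a dataset named tank/vmstore; use your actual pool/dataset, and since it only affects new writes, existing VMDKs need to be rewritten, e.g. copied or storage-vMotioned, to benefit):

Code:
zfs get recordsize tank/vmstore
zfs set recordsize=16K tank/vmstore
# Why it matters: with a 128K recordsize, a 4K random write into an existing block makes
# ZFS read and rewrite the whole 128K record (32x amplification); at 16K the same write
# only touches a 16K record (4x).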
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
I realize that, because you're using an SSD, you may feel this isn't relevant, but how full is the pool?

Good suggestion, I just checked though and it is only 23% used.

You kind of need to be mindful of what rewriting a block in the middle of a file involves with NFS, which is one of the reasons that iSCSI and zvols are somewhat more attractive these days.

Can you elaborate? I'd have thought there would be little difference from an NFS perspective between a random write test and a sequential write test (both are modifying the existing VMDK), other than one having sequential addresses and the other addresses all over.

Thanks!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Modifying an arbitrary unit of data on a CoW filesystem can be challenging. As mav@ suggests, writing a large amount of data sequentially will typically result in large (128K, 1M) ZFS blocks being written to disk. Using a smaller unit of storage (recordsize=16K is often the happy compromise) still allows compression to work but results in fewer read-modify-write cycles. Also, a 32GB system is very small for iSCSI or NFS block storage use; the lack of ARC hobbles the ability of ZFS to cache the relevant blocks.
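
If you want to see how much the small ARC is hurting, a quick way to peek at it on the FreeNAS box (standard ZFS sysctls on FreeBSD; names can vary slightly between versions):

Code:
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
# hit rate = hits / (hits + misses); a low ratio during the benchmark means most reads
# are coming from disk rather than from ARC.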

https://forums.freenas.org/index.ph...res-more-resources-for-the-same-result.28178/

Despite the focus on iSCSI, very little of that is iSCSI-specific; most of it applies to NFS VMDK storage as well. Modern FreeNAS iSCSI is actually optimized for some of the peculiarities of block storage, and we can thank mav@ in large part for that.
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Check what recordsize you are using on the dataset shared via NFS. While zvols used for iSCSI storage default to 16KB blocks, datasets used for NFS default to 128KB. The latter is good for normal write-once file storage, but is often too large for block storage, causing an excessive number of read-modify-write operations. PS: A changed recordsize affects only newly written files.

Great suggestion! This made a huge difference to the QD32 random write performance.

dragon-840pro-nosync-9kjumbo-recordsize16k.png


Later today I am going to try changing the ashift from 12 (default) to 13 to match the page size of this drive to see if that makes any additional difference.
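
In case it's useful to anyone else, roughly what that involves (ashift is fixed when a vdev is created, so the pool has to be rebuilt; the zpool.cache path is the FreeNAS 9.x location, and the sysctl may or may not be present on a given build):

Code:
# check the current ashift of the test pool
zdb -U /data/zfs/zpool.cache -C tank | grep ashift
# ask ZFS to use at least 8K sectors for newly created vdevs, then recreate the pool
sysctl vfs.zfs.min_auto_ashift=13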

Any other thoughts on how to speed up the QD1 read/write performance? And are the numbers I am seeing above for that particular test standard? QD1 Random writes still seem to be about a factor of 7 below native drive performance, and reads are about a factor of 2.

I'd also love to have a comparison to what others are getting in a similar ESXi 6/NFS/Freenas/Windows VM configuration (but presumably different hardware).
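
One way to see how much of the remaining QD1 gap is the network versus the pool itself would be to run the same 4K pattern directly on the NAS (fio isn't part of FreeNAS 9.3 but can be installed as a package; the dataset path below is just an example):

Code:
fio --name=randwrite-qd1 --filename=/mnt/tank/vmstore/fio.test --size=4g \
    --rw=randwrite --bs=4k --iodepth=1 --ioengine=posixaio \
    --runtime=60 --time_based --group_reporting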
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Later today I am going to try changing the ashift from 12 (default) to 13 to match the page size of this drive to see if that makes any additional difference.

I can't say how it will affect performance, but be prepared for it to significantly reduce space efficiency and compression ratios.

Any other thoughts on how to speed up the QD1 read/write performance? And are the numbers I am seeing above for that particular test standard? QD1 Random writes still seem to be about a factor of 7 below native drive performance, and reads are about a factor of 2.

QD1 is more difficult, since reaching full speed there depends heavily on prefetched reads and background writes. Sequential reads should be prefetched quite well by ZFS; the prefetcher code was heavily reworked in FreeNAS 9.10, so you may want to switch to it and retest. Random reads cannot be prefetched, so the only way to improve performance there is to add more cache (ARC and L2ARC) and let it warm up.
ZFS can do background writes, but there are two requirements: the metadata must already be in cache, and the writes must be full ZFS blocks, perfectly aligned to block boundaries. The new prefetcher in FreeNAS 9.10 is able to prefetch metadata for sequential write operations, which wasn't supported in 9.3. For random writes and short/misaligned writes there is unfortunately still no solution, so it remains the administrator's headache to optimize the initiators for that, or not.
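
To make the alignment requirement concrete, a tiny illustrative check (not a FreeNAS tool, just the rule described above) for whether a given write can be committed as full blocks or has to go through read-modify-write:

Code:
offset=65536; length=16384; recordsize=16384   # example values, in bytes
if [ $((offset % recordsize)) -eq 0 ] && [ $((length % recordsize)) -eq 0 ]; then
    echo "full-block aligned: can be written without reading the old blocks"
else
    echo "partial or misaligned: ZFS must read the old block first (read-modify-write)"
fi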
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Bearing in mind that there's production traffic and no attempt to optimize for speed, the low write speeds are typical.

crystal.png
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
This is on the in-production NAS listed in my signature: an NFSv3-backed datastore on an ESXi 6 U2 host with a Win 10 VM. Sync writes could be better if I had a faster SLOG, but they are fine for what I use sync for.

With sync enabled:

upload_2016-3-28_10-47-32.png



With sync disabled:

upload_2016-3-28_10-56-57.png
 

kspare

Guru
Joined
Feb 19, 2015
Messages
507
upload_2016-3-28_12-20-40.png

I ran this on our system, running FreeNAS 9.10. There are also 18 other terminal servers running on the box.

I run an Intel 750 400GB as my SLOG and another for L2ARC, with 64GB of RAM and a 10Gb Chelsio NIC, using NFS.
 

kspare

Guru
Joined
Feb 19, 2015
Messages
507
I have a new box arriving here in a few days with a 400GB Intel 750 SLOG, a 1.2TB Intel 750 L2ARC, 256GB of RAM, and a much faster CPU, all running 12Gb SATA. I'll see if that helps at all.

I haven't done anything to tweak the recordsize, but I may experiment with that once I get my new box.
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Thanks everyone for the benchmarks, really helpful to see what everyone else is getting to make sure I'm not off by like an order of magnitude.

@Mlovelace I didn't see a SLOG in your signature, what are you using?
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Also @kspare I'd love to hear about the performance of your new box... follow up with a post when you can. And I definitely recommend trying out the recordsize change!
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Thanks everyone for the benchmarks, really helpful to see what everyone else is getting to make sure I'm not off by like an order of magnitude.

@Mlovelace I didn't see a SLOG in your signature, what are you using?
It's an Intel DC S3700 200GB we had sitting around. The newer NVMe stuff is much faster.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
The pool is not encrypted. The only time I would ever consider encrypting the pool is if the server was housed in a colo.

The P3700 would make a great slog. If this server was used as primary storage I would go in that direction, but I can't justify the expense for a couple Sync NFS shares.
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Bearing in mind that there's production traffic and no attempt to optimize for speed, the low write speeds are typical.

crystal.png

Have you posted the specs anywhere for the machine you used in this test? Is it the one you have a P3700 SLOG in? And this is NFS?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Have you posted the specs anywhere for the machine you used in this test? Is it the one you have a P3700 SLOG in? And this is NFS?

I don't even have any solid grasp of what exact point I was showing.
 