Random write performance within a VM


David E

Contributor
Joined
Nov 1, 2013
Messages
119
I've been playing with a lab server trying to understand the performance I am seeing, and figure out if there is some way to improve it.

Lab Setup:
ESXi 6 on a Dell R630, many cores, lots of RAM
FreeNAS 9.3 20161181840 on an Intel E3-1230, 32GB of RAM, a single 840 Pro 128GB for testing
Networking is Intel 10G SFP+ NICs connected via DACs

Storage connection is via NFSv3 (an intentional choice over iSCSI), and I've set sync=disabled for these tests. As you'd expect, writes are much slower with sync enabled.
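
For reference, here's a minimal sketch of toggling this from the shell (assuming a pool/dataset named tank/vmstore; substitute your own):

Code:
# check the current setting
zfs get sync tank/vmstore
# disable sync writes for the test run (not safe for real data; a power loss can drop in-flight writes)
zfs set sync=disabled tank/vmstore
# restore the default afterwards
zfs set sync=standard tank/vmstore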

Performance:
CrystalDiskMark run per the settings seen below in a Win10 VM

dragon-840pro-nosync.png


Question:
1. 4K random writes at both QD32 and QD1 seem very low, and I believe they are a contributing factor to the slowness we feel in our production system. When benchmarked locally, this drive achieves 370MB/s (17.8x what I see here) and 138MB/s (8x) respectively. I recognize there is a lot of additional overhead here, but is anyone able to achieve higher values, and if so, how? (Some rough per-op math below.)

2. I'm also surprised that the random write performance doesn't grow to extremely high levels like sequential does when sync is turned off. Are random writes not batched up and laid out on disk sequentially?
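
To put question 1 in perspective, some rough per-op arithmetic (illustrative only; the 0.3 ms figure is just an assumed NFS round-trip time, not a measurement):

Code:
# 4K random writes: MB/s -> IOPS
echo $((370 * 1024 / 4))   # ~94700 IOPS locally at QD32, i.e. ~10 us per op
echo $((138 * 1024 / 4))   # ~35300 IOPS locally at QD1, i.e. ~28 us per op
# At QD1 over NFS every write waits on a network round trip; at an assumed ~0.3 ms
# per op that caps things around 3,000-3,500 IOPS, roughly 12-14 MB/s at 4K.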
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Have you looked for what your bottleneck is? What does the CPU usage look like on the NAS while the test is running? Did you look at the NIC utilization on the NAS? You might want to change the NFS heap size on the ESXi host; I believe max NFS connections can be raised to 256 and the heap can go to 32. Are you using the LSI Logic virtual SCSI controller? You could try the VMware paravirtual SCSI (pvscsi) controller to lower CPU overhead.
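
If anyone wants to try that, roughly the commands involved on the ESXi side (setting names from ESXi 6.x; the TcpipHeapMax value is just a commonly recommended figure, so double-check the supported maximums for your build, and the heap changes need a host reboot):

Code:
# current values
esxcli system settings advanced list -o /NFS/MaxVolumes
esxcli system settings advanced list -o /Net/TcpipHeapSize
esxcli system settings advanced list -o /Net/TcpipHeapMax
# raise the NFS mount limit and the TCP/IP heap
esxcli system settings advanced set -o /NFS/MaxVolumes -i 256
esxcli system settings advanced set -o /Net/TcpipHeapSize -i 32
esxcli system settings advanced set -o /Net/TcpipHeapMax -i 512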
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Have you looked for what your bottleneck is? What does the CPU usage look like on the NAS while the test is running?

Looking at just the 4K QD1 tests, the CPU is 80-90% idle while they run.

Did you look at the NIC utilization on the NAS?

Yes, it very closely tracks the throughput numbers mentioned, and is idle otherwise since this is a test lab.
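
For anyone repeating this, the standard FreeBSD tools on the FreeNAS box are enough to watch both during a run:

Code:
top -SH           # per-thread CPU, including the kernel NFS threads
systat -ifstat 1  # live per-interface network throughput
gstat -p          # per-disk busy% and latency, also worth a glance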

You might want to change the NFS heap size on the ESXi host; I believe max NFS connections can be raised to 256 and the heap can go to 32.

I took a look at this, and it seems these settings are only relevant if you are connecting to more than 8 NFS mounts. My production servers currently use only 4, so I'm well below any threshold that would warrant tweaking.

Are you using the LSI Logic virtual SCSI controller? You could try the VMware paravirtual SCSI (pvscsi) controller to lower CPU overhead.

I wasn't. I swapped to it on the Windows VM but didn't notice any measurable difference.

Good ideas though, keep them coming!

Also, as an aside, I turned on 9K jumbo frames and got a huge increase in my sequential reads (2x!), but also, surprisingly, in the 4K QD1 random reads, which I can't explain; ideas welcome. Unfortunately there was no change to the random write performance.
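
For anyone trying the same change, a rough sketch of what's involved on the FreeNAS side (the interface name ix0 is just an example, and the MTU also has to match on the switch and on the ESXi vSwitch/vmkernel port):

Code:
ifconfig ix0 mtu 9000            # or set the MTU on the interface in the FreeNAS GUI so it persists
ping -D -s 8972 <esxi-host-ip>   # don't-fragment ping to verify; 8972 = 9000 - 20 (IP) - 8 (ICMP)
# Why sequential reads benefit: a 128K read is roughly 90 full-size packets at MTU 1500
# but only about 15 at MTU 9000, so per-packet overhead drops sharply. A single 4K op
# fits in one packet either way, which is why the QD1 random read gain is surprising.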

dragon-840pro-nosync-9kjumbo.png
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
I realize that, because you're using an SSD, you may feel this isn't relevant, but how full is the pool?

You kind of need to be mindful of what rewriting a block in the middle of a file involves with NFS, which is one of the reasons that iSCSI and zvols are somewhat more attractive these days.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Check what recordsize you are using on the dataset shared via NFS. While zvols used for iSCSI storage default to 16KB blocks, datasets used for NFS default to 128KB. The latter is good for normal write-once file storage, but is often too large for block storage, causing an excessive number of read-modify-write operations. PS: A changed recordsize affects only newly written files.
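
For reference, a minimal sketch of checking and changing this from the shell (assuming a dataset named tank/vmstore; use your actual pool/dataset, and since it only affects new writes, existing VMDKs need to be rewritten, e.g. copied or storage-vMotioned, to benefit):

Code:
zfs get recordsize tank/vmstore
zfs set recordsize=16K tank/vmstore
# Why it matters: with a 128K recordsize, a 4K random write into an existing block makes
# ZFS read and rewrite the whole 128K record (32x amplification); at 16K the same write
# only touches a 16K record (4x).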
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
I realize that, because you're using an SSD, you may feel this isn't relevant, but how full is the pool?

Good suggestion, I just checked though and it is only 23% used.

You kind of need to be mindful of what rewriting a block in the middle of a file involves with NFS, which is one of the reasons that iSCSI and zvols are somewhat more attractive these days.

Can you elaborate? I'd have thought there would be little difference from an NFS perspective between a random write test and a sequential write test (both are modifying the existing VMDK), other than one having sequential addresses and the other addresses all over.

Thanks!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Modifying an arbitrary unit of data on a CoW filesystem can be challenging. As mav@ suggests, writing a large amount of data sequentially will typically result in large (128K, 1M) ZFS blocks being written to disk. Using a smaller unit of storage (recordsize=16K is often the happy compromise) still allows compression to work but results in fewer read-modify-write cycles. Also, a 32GB system is very small for iSCSI or NFS block storage use; the lack of ARC hobbles the ability of ZFS to cache the relevant blocks.
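
If you want to see how much the small ARC is hurting, a quick way to peek at it on the FreeNAS box (standard ZFS sysctls on FreeBSD; names can vary slightly between versions):

Code:
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
# hit rate = hits / (hits + misses); a low ratio during the benchmark means most reads
# are coming from disk rather than from ARC.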

https://forums.freenas.org/index.ph...res-more-resources-for-the-same-result.28178/

Despite the focus on iSCSI, very little of that is iSCSI-specific; most of it applies to NFS VMDK storage as well. Modern FreeNAS iSCSI is actually optimized for some of the peculiarities of block storage, and we can thank mav@ in large part for that.
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Check what recordsize you are using on the dataset shared via NFS. While zvols used for iSCSI storage default to 16KB blocks, datasets used for NFS default to 128KB. The latter is good for normal write-once file storage, but is often too large for block storage, causing an excessive number of read-modify-write operations. PS: A changed recordsize affects only newly written files.

Great suggestion! This made a huge difference to the QD32 random write performance.

dragon-840pro-nosync-9kjumbo-recordsize16k.png


Later today I am going to try changing the ashift from 12 (default) to 13 to match the page size of this drive to see if that makes any additional difference.
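
In case it's useful to anyone else, roughly what that involves (ashift is fixed when a vdev is created, so the pool has to be rebuilt; the zpool.cache path is the FreeNAS 9.x location, and the sysctl may or may not be present on a given build):

Code:
# check the current ashift of the test pool
zdb -U /data/zfs/zpool.cache -C tank | grep ashift
# ask ZFS to use at least 8K sectors for newly created vdevs, then recreate the pool
sysctl vfs.zfs.min_auto_ashift=13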

Any other thoughts on how to speed up the QD1 read/write performance? And are the numbers I am seeing above for that particular test standard? QD1 Random writes still seem to be about a factor of 7 below native drive performance, and reads are about a factor of 2.

I'd also love to have a comparison to what others are getting in a similar ESXi 6/NFS/Freenas/Windows VM configuration (but presumably different hardware).
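
One way to see how much of the remaining QD1 gap is the network versus the pool itself would be to run the same 4K pattern directly on the NAS (fio isn't part of FreeNAS 9.3 but can be installed as a package; the dataset path below is just an example):

Code:
fio --name=randwrite-qd1 --filename=/mnt/tank/vmstore/fio.test --size=4g \
    --rw=randwrite --bs=4k --iodepth=1 --ioengine=posixaio \
    --runtime=60 --time_based --group_reporting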
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Later today I am going to try changing the ashift from 12 (default) to 13 to match the page size of this drive to see if that makes any additional difference.

I can't say how it will affect performance, but be prepared for it to significantly reduce space efficiency and compression ratios.

Any other thoughts on how to speed up the QD1 read/write performance? And are the numbers I am seeing above for that particular test standard? QD1 Random writes still seem to be about a factor of 7 below native drive performance, and reads are about a factor of 2.

QD1 is more difficult, since reaching full speed there depends heavily on prefetched reads and background writes. Sequential reads should be prefetched quite well by ZFS; the prefetcher code was heavily reworked in FreeNAS 9.10, so you may want to switch to it and retest. Random reads cannot be prefetched, so the only way to improve performance there is to add more cache (ARC and L2ARC) and let it warm up.
ZFS can do background writes, but there are two requirements: the metadata must already be in cache, and the writes must be full ZFS blocks, perfectly aligned to block boundaries. The new prefetcher in FreeNAS 9.10 is able to prefetch metadata for sequential write operations, which wasn't supported in 9.3. For random writes and short/misaligned writes there is unfortunately still no solution, so it remains the administrator's headache to optimize the initiators for that, or not.
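
To make the alignment requirement concrete, a tiny illustrative check (not a FreeNAS tool, just the rule described above) for whether a given write can be committed as full blocks or has to go through read-modify-write:

Code:
offset=65536; length=16384; recordsize=16384   # example values, in bytes
if [ $((offset % recordsize)) -eq 0 ] && [ $((length % recordsize)) -eq 0 ]; then
    echo "full-block aligned: can be written without reading the old blocks"
else
    echo "partial or misaligned: ZFS must read the old block first (read-modify-write)"
fi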
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Bearing in mind that there's production traffic and no attempt to optimize for speed, the low write speeds are typical.

crystal.png
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
This is on the in-production NAS listed in my signature: an NFSv3-backed datastore on an ESXi 6 U2 host with a Win 10 VM. Sync writes could be better if I had a faster SLOG, but they are fine for what I use sync for.

With sync enabled:

upload_2016-3-28_10-47-32.png



With sync disabled:

upload_2016-3-28_10-56-57.png
 

kspare

Guru
Joined
Feb 19, 2015
Messages
507
upload_2016-3-28_12-20-40.png

I ran this on our system, running FreeNAS 9.10. There are also 18 other terminal servers running on the box.

I run an Intel 750 400GB as my SLOG and another for L2ARC, with 64GB of RAM and a 10Gb Chelsio NIC, using NFS.
 

kspare

Guru
Joined
Feb 19, 2015
Messages
507
I have a new box arriving here in a few days with a 400GB Intel 750 SLOG, a 1.2TB Intel 750 L2ARC, 256GB of RAM, and a much faster CPU, all running 12Gb SATA. I'll see if that helps at all.

I haven't done anything to tweak the recordsize, but I may experiment with that once I get my new box.
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Thanks everyone for the benchmarks, really helpful to see what everyone else is getting to make sure I'm not off by like an order of magnitude.

@Mlovelace I didn't see a SLOG in your signature, what are you using?
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Also @kspare I'd love to hear about the performance of your new box... follow up with a post when you can. And I definitely recommend trying out the recordsize change!
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Thanks everyone for the benchmarks, really helpful to see what everyone else is getting to make sure I'm not off by like an order of magnitude.

@Mlovelace I didn't see a SLOG in your signature, what are you using?
It's an Intel DC S3700 200GB we had sitting around. The newer NVMe stuff is much faster.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
The pool is not encrypted. The only time I would ever consider encrypting the pool is if the server was housed in a colo.

The P3700 would make a great slog. If this server was used as primary storage I would go in that direction, but I can't justify the expense for a couple Sync NFS shares.
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Bearing in mind that there's production traffic and no attempt to optimize for speed, the low write speeds are typical.

crystal.png

Have you posted the specs anywhere for the machine you used in this test? Is it the one you have a P3700 SLOG in? And this is NFS?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Have you posted the specs anywhere for the machine you used in this test? Is it the one you have a P3700 SLOG in? And this is NFS?

I don't even have any solid grasp of what exact point I was showing.
 