ESXi + NFS + ZFS = Bad Performance?


Nate W

Dabbler
Joined
Jul 10, 2014
Messages
38
I set up an SSD ZFS pool and get amazing performance locally (using dd to write/read). When I mount the storage over NFS via VMware or directly in an Ubuntu VM, the performance is a small fraction of what I get locally.

Here are the server specs:
FreeNAS 9.2.1.5
Xeon L5640
96GB Memory
24x 480GB Intel 320
Chelsio T520-SO-CR (dual-port 10G fiber)

I have this set up as a stripe of four RAIDZ2 vdevs (6 disks each).

The local benchmarks (with compression both on and off) are pretty good, but when I spin up a VM on that storage over NFS, or even mount the storage over NFS directly in an Ubuntu VM, the speeds are a fraction of the local speeds.

I would love to figure out why the speed difference is so great and get this ZFS box rocking.

Thanks in advance!

Here are some of the benchmarks:

Local Benchmarks (dd)

Without Compression:

[root@zfs] /mnt/storage/nocompression# dd if=/dev/zero of=temp.dat bs=4M count=50k
51200+0 records in
51200+0 records out
214748364800 bytes transferred in 341.918976 secs (628067993 bytes/sec)


[root@zfs] /mnt/storage/nocompression# dd if=temp.dat of=/dev/null bs=4M
51200+0 records in
51200+0 records out
214748364800 bytes transferred in 159.694223 secs (1344747235 bytes/sec)


With Compression:

[root@zfs] /mnt/storage# dd if=/dev/zero of=temp.dat bs=4M count=50k
51200+0 records in
51200+0 records out
214748364800 bytes transferred in 85.979721 secs (2497662969 bytes/sec)

[root@zfs] /mnt/storage# dd if=temp.dat of=/dev/null bs=4M
51200+0 records in
51200+0 records out
214748364800 bytes transferred in 47.818332 secs (4490921285 bytes/sec)

Ubuntu VMs that live on the storage pool, with ESXi mounting it via NFS:

One VM:

dd if=/dev/zero of=temp.dat bs=4M count=5k
5120+0 records in
5120+0 records out
21474836480 bytes (21 GB) copied, 225.22 s, 95.4 MB/s


dd if=temp.dat of=/dev/null bs=4M
5120+0 records in
5120+0 records out
21474836480 bytes (21 GB) copied, 85.0397 s, 253 MB/s

Two VMs (simultaneous):

Write:

dd if=/dev/zero of=temp.dat bs=4M count=5k
5120+0 records in
5120+0 records out
21474836480 bytes (21 GB) copied, 326.994 s, 65.7 MB/s

dd if=/dev/zero of=temp.dat bs=4M count=5k
5120+0 records in
5120+0 records out
21474836480 bytes (21 GB) copied, 334.963 s, 64.1 MB/s


Read:

dd if=temp.dat of=/dev/null bs=4M
5120+0 records in
5120+0 records out
21474836480 bytes (21 GB) copied, 158.938 s, 135 MB/s

dd if=temp.dat of=/dev/null bs=4M
5120+0 records in
5120+0 records out
21474836480 bytes (21 GB) copied, 155.774 s, 138 MB/s


Ubuntu VM with the NFS share mounted directly:

dd if=/dev/zero of=temp.dat bs=4M count=5k
5120+0 records in
5120+0 records out
21474836480 bytes (21 GB) copied, 136.16 s, 158 MB/s


dd if=temp.dat of=/dev/null bs=4M
5120+0 records in
5120+0 records out
21474836480 bytes (21 GB) copied, 68.4528 s, 314 MB/s
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
To answer your title:

"Yes. Well documented and there are ways to mitigate this penalty. Please search the forums as your answer has been documented no less than 50 times just this year."

I will say that your performance numbers for dd with 24x SSDs were horrible. I get higher speeds with 10 spindle disks, so you may have a deeper disk subsystem problem to deal with. What HBA (or RAID controller... cringe) are those SSDs connected to?
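If you want to confirm that it really is the NFS sync-write penalty (and not something else), a quick diagnostic, assuming the dataset backing the export is named storage as in the paths above, is to compare the in-VM dd numbers with sync temporarily disabled. This is a test, not a recommended permanent setting:

# check the current sync behaviour on the dataset behind the NFS export
zfs get sync storage

# temporarily disable sync writes, then re-run the dd inside the VM
zfs set sync=disabled storage

# put it back afterwards; running with sync disabled risks VM corruption on power loss
zfs set sync=standard storage

If the write numbers jump dramatically with sync disabled, the bottleneck is sync-write latency rather than raw pool throughput.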
 

Nate W

Dabbler
Joined
Jul 10, 2014
Messages
38

I've found those posts, but most people ended up turning async writes on (sync=disabled), which was not an acceptable solution for me.

Is there a solid tutorial for setting up iSCSI using file extents with ESXi? I've gone through the FreeNAS docs, but was curious if someone has documentation that dives into the details.

I don't really want to use an iSCSI block device, as it seems like restoring data from it would be more cumbersome (unless I am going about it wrong). Do file extents fix this? Is it possible to work with iSCSI as easily as NFS (as far as accessing the VMs on the FreeNAS side)?
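For reference, a file extent is just an ordinary file on the pool that FreeNAS exports as a block device, so the backing file sits on a normal dataset next to everything else. A rough sketch with example names (the file is then attached as a file extent and mapped to a target through the iSCSI section of the WebGUI):

# a dataset to hold extent files, and a sparse 1 TB backing file (names are examples)
zfs create storage/iscsi
truncate -s 1T /mnt/storage/iscsi/esxi-extent0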

Thanks!
 

Nate W

Dabbler
Joined
Jul 10, 2014
Messages
38
Also, it is a 3ware RAID controller with the drives passed through. Obviously not ideal, but this was a proof of concept. If I can get this working well, I could possibly push for new hardware and a more ideal setup.

Despite the SSD performance being poor, if I cannot get performance over the network even marginally close to local performance, it will be a hard sell, crappy hardware or not. (Getting only 15-25% of pool throughput over the 10G card is not going to sell it.)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526

Well, my experience with 3ware is that they add enough latency to kill many of the benefits of SSDs, mostly because the controllers are not straight passthrough even when doing JBOD. You still have to contend with the on-card CPU and such. My guess is that your poor pool performance is related to your choice of hardware. Unfortunately, even for a proof of concept you have to be ready to go all-in or you'll have other problems.

I can tell you that I have first-hand seen NFS speeds that nearly saturate a 10Gb LAN, so it is very possible. But you will need one heck of a ZIL (SLOG) device if you plan to use NFS on ESXi, because sync writes to the ZIL will be your bottleneck. I don't have much experience with a full-SSD pool, so I can't really provide much guidance on whether a dedicated log device is needed in your case. You will certainly still want to look at doing mirrors over RAIDZ1/Z2/Z3, though.
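For what it's worth, if a dedicated log device does turn out to be the answer, adding one is a one-liner. The device names below are placeholders, and the device should be a low-latency SSD with power-loss protection:

# attach a dedicated log (SLOG) device to the pool
zpool add storage log da24

# or a mirrored pair for redundancy
zpool add storage log mirror da24 da25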
 

Nate W

Dabbler
Joined
Jul 10, 2014
Messages
38

Makes sense. Although, if the hardware is truly the issue, why would I get decent speeds locally but not over the network? I should be seeing awful speeds locally as well, no?

I'll poke at this some more and give iSCSI a shot. I'm not familiar with iSCSI + ESXi super well, so I need to find some good resources on that too.

Thanks!

Nate
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Uh, you don't get decent local speeds. You should have been able to do 2GB/sec with your dd tests at the lower end. You instead got 600MB/sec. So I totally disagree that you "got decent speeds locally but not over the network". You got crappy local speeds too!

Edit: I've gotten 600MB/sec from 3 SSDs in a pool; 24 should have slaughtered my setup, and you should have speeds near the highest we've ever seen on the forums. Instead your speeds are lackluster... at best.
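One way to check whether the 3ware card is the choke point would be to benchmark a single raw disk outside of ZFS. The device names below are examples; check what the system actually sees first, and only read from raw devices that are pool members, never write:

# list the disks FreeBSD sees behind the controller
camcontrol devlist

# sequential read from one raw disk; a healthy Intel 320 should manage roughly 250-270 MB/s
dd if=/dev/da0 of=/dev/null bs=1M count=4096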
 

Nate W

Dabbler
Joined
Jul 10, 2014
Messages
38

Look at local vs. over NFS. The local speeds are roughly four times faster. Obviously they are not what they *should* be locally, but the disparity between local and NFS is what I am referring to.

150 MB/sec != 600 MB/sec
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Just curious, are you sure you have Intel 320s? I'm pretty sure those only came in 300GB or 600GB at the top end.

But seriously, you have like $12,000 worth of solid state drives; spend $100 on an HBA.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, but that doesn't matter. You clearly have some kind of local problem as well. Deal with the local stuff, THEN the remote stuff.
 

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
I'd take a look at your network. I guess the network stack of FreeNAS is probably optimized for 1GbE, since that's what most users have at home. I bet you have to adjust it for proper 10 GbE use. This guide may be of help.
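If anyone does go down that road, these are the sort of sysctls such guides usually touch, with purely illustrative values; on FreeNAS they should be persisted as tunables/sysctls via the WebGUI rather than edited into config files by hand:

# larger socket buffers / TCP windows for 10 GbE (example values only)
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216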
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Nope. 10Gb works great on FreeNAS. TrueNAS uses the same settings. I use 10Gb, and despite not using the most recommended controller and using a horrible vdev layout, I can still do 800MB/sec over the network. ;)
 

Nate W

Dabbler
Joined
Jul 10, 2014
Messages
38
cyberjock said:
Yeah, but that doesn't matter. You clearly have some kind of local problem as well. Deal with the local stuff, THEN the remote stuff.

This is all used leftovers that were going into the bin. It was free to me.

Getting the remote speeds closer to local than the current roughly 4x gap is required before I drop more money on this. The cheap 3ware controller shouldn't cause that kind of disparity when I am running the same test over different paths (local vs. NFS). If we can address that disparity and get it lower, *then* I will consider a purpose-built machine for this project.
 

Nate W

Dabbler
Joined
Jul 10, 2014
Messages
38
bestboy said:
I'd take a look at your network. I guess the network stack of FreeNAS is probably optimized for 1GbE, since that's what most users have at home. I bet you have to adjust it for proper 10 GbE use. This guide may be of help.

Thanks for the link! I'll give it a look, although the Chelsio cards are supposed to be pretty solid on FreeBSD.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, good luck in your endeavors.

There's a tried and true way of dealing with problems: fix the local stuff, THEN the remote stuff. If you want to go do your own thing, you are welcome to. It sounds like you have spare hardware you can just throw around, and you might be able to figure it out. But most of us don't have that luxury, and I can't really provide much advice since I don't know what you do and don't have. The basic rule of FreeNAS is "you build it right or you build it twice".
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That guide isn't particularly good, as you can't (and shouldn't) just drop those settings in by hand. You'll need to add them via the WebGUI (the proper way of adding tunables), and I will warn that FreeNAS is probably already using many of them. FreeNAS is based on FreeBSD, but some of the FreeNAS settings deviate from stock FreeBSD because we've learned from experience that certain settings just flat out work better for file servers. ;)

Good luck!
 

Nate W

Dabbler
Joined
Jul 10, 2014
Messages
38
cyberjock said:
But the basic rule of FreeNAS is "you build it right or you build it twice".

I am very familiar with speccing and building things correctly the first time. This is a side project of mine with no actual budget, and it is a proof of concept. I'll see if I can rustle up a different RAID card (or a proper HBA) and go from there.
 

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
I still don't see how this suboptimal IO pipeline can give such bad overall results. I mean, yes, the IO performance may be bad considering the array, but the network performance is just abysmal. He can read at ~1300 MB/s locally, but gets just ~300 MB/s on the wire...

In order to make sure 10 GbE transfer rates can be met, I'd

  • iperf that sucker really hard
  • and then try a test with files small enough to be read entirely from ARC, to take the IO pipeline out of the equation (a rough sketch of both follows below)

I'd not think about getting a new HBA before these tests are green. The current rig really should be able to pass those simple tests.
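Something along these lines, assuming 192.168.1.10 is the FreeNAS box and /mnt/nfs is the NFS mount point inside the Ubuntu VM (addresses, paths and sizes are only examples):

# raw network throughput first, no disks involved
iperf -s                           # on the FreeNAS box
iperf -c 192.168.1.10 -t 30 -P 4   # on the client VM

# then a 16 GB file: small enough to sit in the 96 GB ARC, ideally larger than the
# client's RAM so the second read actually crosses the wire instead of hitting the
# client's own page cache
dd if=/dev/zero of=/mnt/nfs/arc-test.dat bs=4M count=4k
dd if=/mnt/nfs/arc-test.dat of=/dev/null bs=4M
dd if=/mnt/nfs/arc-test.dat of=/dev/null bs=4M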

BTW: maybe this DTrace script can provide some insight into the IO problem. Although it wasn't designed for SSDs, I'd still give it a try; you know, just to make sure that the latencies are as low as expected.
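If the DTrace script turns out to be awkward to run, the stock FreeBSD tools give a rough per-disk latency picture as well; not a substitute for the script, just a simpler starting point:

# per-disk latency (ms/r, ms/w), queue depth and %busy, refreshed every second
gstat -p -I 1s

# per-vdev bandwidth and IOPS while a dd test is running
zpool iostat -v storage 1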


 