SOLVED Very Poor 10GbE Performance

Status: Not open for further replies.

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
I am running on the latest FreeNAS 11.1 and I am seeing very poor performance over NFS, CIFS, and iSCSI, and I cannot figure out where my issue is.

I have 10GbE cards (Chelsio T520)
I have 36 drives in each server
I have two identical servers
One server is set up as striped mirrors (RAID10) with 15 vdevs
One server is set up as RAIDZ2 with three 10-drive vdevs
Each server has 6 Intel S320 SSDs striped for the ZIL (SLOG)
Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
96GB of RAM in each


Local performance numbers are on par with what they should be: 1+ GB per second writes, multiple GB per second reads. However, once anything hits the network, 60 MB per second is the fastest I can achieve using any possible combination of access; NFS, CIFS, and iSCSI all show the same. This is why I think this may be a network-related issue.

Yes, I have read the 10G primer, and any change to the system only makes things worse.

Server 1 with RAID10 is set up with a LAGG.
Server 2 with RAIDZ2 is not.

Any pointers would be helpful.

Thanks in advance
~Donny
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
Oh, and with the same gear on 9.latest I get much more respectable numbers.
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
Write speed directly on the machine:
Code:
fio --randrepeat=1 --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=30G --readwrite=randrw --rwmixread=0
test: (g=0): rw=randrw, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=64
fio-3.0
Starting 1 process
test: Laying out IO file (1 file / 30720MiB)
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=1393MiB/s][r=0,w=348 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=76564: Sun Jan 14 19:22:45 2018
  write: IOPS=307, BW=1231MiB/s (1290MB/s)(30.0GiB/24962msec)
   bw (  MiB/s): min=  479, max= 1792, per=99.32%, avg=1222.33, stdev=274.14, samples=49
   iops		: min=  119, max=  448, avg=305.16, stdev=68.61, samples=49
  cpu		  : usr=5.52%, sys=55.33%, ctx=88942, majf=0, minf=0
  IO depths	: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
	 submit	: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
	 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
	 issued rwt: total=0,7680,0, short=0,0,0, dropped=0,0,0
	 latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=1231MiB/s (1290MB/s), 1231MiB/s-1231MiB/s (1290MB/s-1290MB/s), io=30.0GiB (32.2GB), run=24962-24962msec
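
For reference, the read side can be sketched the same way by flipping the mix ratio of the command above; this uses the same hypothetical 30G test file and is only an illustration, not a result from this thread:
Code:
# same layout as the write test above; --rwmixread=100 makes the mix 100% reads
fio --randrepeat=1 --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=30G --readwrite=randrw --rwmixread=100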
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Remove the LAGG; it's causing all your problems, and you probably set it up wrong. Also, test your network using iperf to get an accurate bandwidth measurement.
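
For reference, a minimal iperf run would look something like the sketch below; the server address mirrors the output posted further down, and the 2 MB window just matches the default shown there, so adjust both for your own hosts:
Code:
# on the FreeNAS box under test (server side)
iperf -s
# on the client: a 20-second test with a 2 MB TCP window
iperf -c 172.16.9.251 -t 20 -w 2M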
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
Both servers perform the same; one has a LAGG and one does not.

Also, I did not set up the LAGG incorrectly: my switch is configured for LACP, and so is the LAGG. This same setup worked with 9.10 and achieved respectable numbers.
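
For what it's worth, one quick sanity check from the FreeNAS side is sketched below; "lagg0" is an assumed interface name and may differ on your box:
Code:
ifconfig lagg0
# expect "laggproto lacp" and every laggport flagged
# <ACTIVE,COLLECTING,DISTRIBUTING>; anything less means LACP has not
# negotiated cleanly with the switch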
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
OK, good luck.
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
iperf shows that the network layer is running pretty close to wire speed. It's not just FreeNAS to FreeNAS; it's any client to FreeNAS.

Code:
------------------------------------------------------------
Client connecting to *.*.*.*, TCP port 5001
TCP window size: 2.00 MByte (default)
------------------------------------------------------------
[ 3] local 172.16.9.252 port 30987 connected with 172.16.9.251 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 1.10 GBytes 9.41 Gbits/sec
[ 3] 1.0- 2.0 sec 1.09 GBytes 9.37 Gbits/sec
[ 3] 2.0- 3.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 3.0- 4.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 4.0- 5.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 5.0- 6.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 6.0- 7.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 7.0- 8.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 8.0- 9.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 9.0-10.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 10.0-11.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 11.0-12.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 12.0-13.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 13.0-14.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 14.0-15.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 15.0-16.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 16.0-17.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 17.0-18.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 18.0-19.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 19.0-20.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 3] 0.0-20.0 sec 21.9 GBytes 9.39 Gbits/sec
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
11.1U1 will bring a bunch of patches; let's hope it fixes your issue. Until then, I'd roll back to 9.10 or 11.0U4.

Once we get 11.1U1, I'd recommend testing again, and if there are still issues, then we can troubleshoot more. Your tests show that the network seems to be working, so it's probably safe to rule out your switch/NICs as the source of the problem. Your tests also show that your array can push more than 10Gbps, so you should be capable of saturating the link. The immediate thing that jumps out at me is that there is some issue in the sharing layer.
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
OK, cool, thanks. I will wait for the latest patches. That could very well be the issue.

These systems are not in production, and I am more than happy to help the community. I will just wait for the next update.

I am also going to put the Mellanox cards back in them; I thought they may have been the issue. I will use these cards for a direct link between the two. They are 40G cards and worked quite well on Linux. They also work well on my edge router, which is also FreeBSD 11.1 based.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
LACP is basically an artificial shim layer between the hardware and the OS, and it adds a little overhead.

Every time someone fixes a driver for issue X, makes a change to the network stack for Y, or alters LACP for Z, there's the potential for the change to cause some new performance issue, especially because LACP requires perfect operation of several different layers. Unfortunately, not every change is tested under every set of conditions, because that's basically an impossibly large matrix, so it isn't unusual for there to be regressions. The more layers, the more things are likely to go wrong, and LACP adds some physical layer challenges as well. When you're working at very high speeds, any small issue rapidly magnifies into something noticeable.

It's not a solution; just try to understand that working at the high end of things often requires a certain amount of extra effort, especially to get that last ~10%, and it often involves debugging, problem reports, and waiting for patches.
 

fricker_greg

Explorer
Joined
Jun 4, 2016
Messages
71
I will say, I was on 9.10 for a long time and had great 10GbE performance, and I similarly lost a lot of performance when testing before the update and immediately after. Nothing else changed. Hoping for some improvement with the update.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeah, the upside to that is that it suggests the problem is potentially fixable, so that, at least, is good.
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
I'm sure it will be sorted.

I can report, however, that it does seem to be an NFS/SMB issue.

I have three of these storage servers, and on the third I built a Linux machine with as close to the same parameters as I could.

When I switch to iSCSI, all three machines report very similar numbers: around 500 MB/s write and 900 MB/s read.

It must be something that changed.

I will also heed the warnings on LACP and remove it. It makes sense when explained in detail. I have never had an issue before, and it's only on one of the two servers.

Thanks guys, I truly appreciate the time.

~D
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
Ok, I have more news to report.

After setting sync to disabled I get very good performance (surprise, surprise): 600 MB/s write, 600 MB/s read. In my logs, however, I did see the error below being thrown. These numbers are from an oVirt virtual machine, which does make sense because the limit on the hypervisor is 10Gb. I would be very happy if I could get this type of performance in a safe manner. I don't think leaving sync in disabled mode is a fantastic idea, even though my gear is all on a UPS.

Thoughts on sync disabled?

mps0: Out of chain frames, consider increasing hw.mps.max_chains

I am not familiar with this tunable, but a quick Google search points me here: https://www.freebsd.org/cgi/man.cgi?mps(4)

Anyone familiar with this setting?
Has anyone else seen this before?
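
For anyone following along, mps(4) describes hw.mps.max_chains as a loader-time tunable, so on FreeNAS it would be added under System -> Tunables with type "loader"; the value and the dev.mps.0.* sysctl names below follow that man page and are illustrative only, not a recommendation from this thread:
Code:
# loader tunable (System -> Tunables, type "loader"); 4096 is only an example value
hw.mps.max_chains="4096"
# chain frame usage can then be watched with the driver's read-only sysctls
sysctl dev.mps.0.chain_free dev.mps.0.max_chains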
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
Excellent detective work!

I would strongly recommend against leaving sync disabled, primarily because my goal with FreeNAS is maximal data safety, and sync=disabled is the opposite of data safety. I would definitely open a bug for this; I feel this gives you a reproducible case to begin troubleshooting with. Please post the link to your bug back here.
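
As a practical note, sync is a per-dataset (or per-zvol) property, so it can be checked and put back without touching the whole pool; a minimal sketch, with "tank/vms" standing in for whatever dataset backs the share:
Code:
zfs get sync tank/vms            # "tank/vms" is a placeholder dataset name
zfs set sync=standard tank/vms   # default: honor the client's sync requests
zfs set sync=always tank/vms     # force every write through the ZIL/SLOG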
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
mps0: Out of chain frames, consider increasing hw.mps.max_chains

Anyone familiar with this setting?
Has anyone else seen this before?

Search on it, I recently discussed it somewhere, but the web browser is stressin' right now so I'll let you do the search. :smile:
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
@jgreco Did you do a discussion on sync disabled, or on hw.mps.max_chains?

I am pretty sure sync disabled has been discussed at length; I am more concerned with hw.mps.max_chains.

I understand that sync disabled essentially lies that the write has completed when it is actually only in the remote system's virtual memory, and if there is a power loss at that moment in time, there would also be data loss.

I do have 6 Intel SSDs in a stripe for the ZIL and cannot understand why I am taking a performance hit when sync is enabled (standard or always). These S320s are not the fastest on the market, but they are more than capable of doing 600 MB/s writes and reads in a 6-way stripe. Am I not understanding what the ZIL actually does? When these drives are in a separate pool, they do 800-900 MB/s. Is it the way the workload is placed on the ZIL?

I would just buy something better, but this is a homelab and I can't afford to spend any more dollars on it... the wife would not be happy about that.

I have a UPS, but I agree with Nick... data safety will come at the cost of performance. Fast, cheap, reliable: pick any two.
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
Tomorrow I am going to directly connect the two FreeNAS machines and run a copy operation to take my switch out of the loop. I have been using FreeNAS for many, many years, and the math just doesn't add up with this issue. FreeNAS has been one of the core pieces of my infra. The only new thing in my infra in the last three years is a new(ish) Dell switch, and I think it's prudent to take that out of the equation to get to the bottom of this issue.

I do know the limits of this gear using other technology like Ceph. I get around 500-600 MB/s, but I don't want to have to run all three servers all the time.

Also, FreeNAS does more in terms of making my life easy. I don't want to build netatalk packages myself or set up email alerts... it's a pain in my buns. FreeNAS is just too easy.

I will report back the results of my testing for those that are interested.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
@jgreco Did you do a discussion on sync disabled, or on hw.mps.max_chains?

Oh dear oh dear, take your pick..

https://forums.freenas.org/index.ph...xi-nfs-so-slow-and-why-is-iscsi-faster.12506/

https://forums.freenas.org/index.php?threads/false-smart-errors.60608/#post-430661

I do have 6 Intel SSDs in a stripe for the ZIL and cannot understand why I am taking a performance hit when sync is enabled (standard or always). These S320s are not the fastest on the market, but they are more than capable of doing 600 MB/s writes and reads in a 6-way stripe. Am I not understanding what the ZIL actually does?

Since commits to the ZIL happen sequentially, I cannot imagine that this is a fast solution; it would probably be faster to have a single S320. You have confused bandwidth (600 MB/s) with latency.

https://forums.freenas.org/index.php?threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

especially subparagraph 3 of "What is a good choice for a SLOG device?"

https://forums.freenas.org/index.php?threads/slog-performacnce.40343/#post-253545

etc

So basically what you did was create a big 18-wheel semi of an SSD: a big powerful engine with lots of carrying capacity, but despite the fact that there's plenty of horsepower, it ain't EVER gonna go from 0 to 60 in 5 seconds. It's just not a supercar. Sorry.

A SLOG will *always* impact performance negatively, and you have to get into extremely low latency stuff like NVMe if you don't want it to be AS hurt-y. I'm guessing the Optane stuff like the DC P4800X is as good as it is likely to get for a little while.
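
To see the latency point rather than the bandwidth, a small-block sync-write run is more representative of what NFS or iSCSI sync traffic asks of the SLOG; the sketch below uses an arbitrary file name, size, and runtime and is only an illustration, not a benchmark from this thread:
Code:
# an fsync after every 4k write: throughput here is bounded by SLOG latency,
# not by the stripe's sequential bandwidth
fio --name=synctest --filename=synctest --bs=4k --size=1G --rw=write --fsync=1 --runtime=60 --time_based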
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
The S320s are not very fast SSDs, so going down to one would most likely be more painful.

As for the network, I have linked the two together directly and get the same performance, so it's not a networking issue at this point.

I do understand it's not going to move at light speed. I am just trying to discern why local performance is quite respectable while network performance is not so much. I have most of the right equipment; it looks like I just need to get a few more bits.
 