Poor Performance: Dell R510, 2x 6-Core, 10GbE

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Hi,

I recently upgraded to a Dell R510 12-bay because I needed more RAM; 32GB wasn't enough, so I got 64GB. The issue I'm facing is really poor performance over 10GbE. When writing to the server I can get 300 to 400mbp/s, but when copying from the server to my workstation over 10GbE I'm only getting 230mbp/s, and then it just drops to 1GbE speed. Performance is really bad. When running iperf I am seeing about 6Gb/s at each end, as the 10GbE card is in the x4 slot (the x16 slot is taken up by the graphics card). I have done some performance tuning in TrueNAS which made writing to the server fast, but copying from the server to my workstation is terrible. Any ideas? I am using NFS on Ubuntu 20.04 with a Chelsio T320.

Thanks.

Jack.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Welcome to the forums.

This is a "My Toyota isn't going as fast as I'd expect" class of problem.

What kind of Toyota? Car? Truck? Supercar? What engine size? Hybrid? Electric? Tires? Transmission? Etc.

I hope you see the point; you've been somewhat vague.

Please consider providing some more useful details about your system, such as:

Type and composition of pool, number and type of disks, whether this is mirrors or RAIDZ

Type of network card, interconnect technology (SFP+ fiber, copper, DAC), maybe information on the switch

What sort of data you're trying to access, and does it work faster if the data is already resident in ARC

Pool occupancy and fragmentation info

etc.

In general, people are often surprised to find out that write speeds can be faster than read speeds for uncached data. "1GbE speed" sounds a bit too low even for an ancient R510, but not outside the realm of "I can engineer a situation where it is worse than that." Therefore one of the more interesting and helpful things is to try the same test several times so that your data ends up in ARC. Pick a 30GB file and try copying it repeatedly, for example.
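
A quick way to do that from the Linux client is something like this (the mount point and file name here are just placeholders; reading into /dev/null keeps the workstation's own disk out of the picture):

Code:
# read the same large file a few times; later passes should come out of the server's ARC
for i in 1 2 3; do
    time dd if=/mnt/nfs/test-30G.img of=/dev/null bs=1M
done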
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Welcome to the forums.

This is a "My Toyota isn't going as fast as I'd expect" class of problem.

What kind of Toyota? Car? Truck? Supercar? What engine size? Hybrid? Electric? Tires? Transmission? Etc.

I hope you see the point; you've been somewhat vague.

Please consider providing some more useful details about your system, such as:

Type and composition of pool, number and type of disks, whether this is mirrors or RAIDZ

Type of network card, interconnect technology (SFP+ fiber, copper, DAC), maybe information on the switch

What sort of data you're trying to access, and does it work faster if the data is already resident in ARC

Pool occupancy and fragmentation info

etc.

In general, people are often surprised to find out that write speeds can be faster than read speeds for uncached data. "1GbE speed" sounds a bit too low even for an ancient R510, but not outside the realm of "I can engineer a situation where it is worse than that." Therefore one of the more interesting and helpful things is to try the same test several times so that your data ends up in ARC. Pick a 30GB file and try copying it repeatedly, for example.

Hi,

Thanks for your Reply.

I have a 6-disk RAIDZ2, though I'm planning on upgrading soon. I am using SFP+ DAC Twinax and the link shows as 10Gb/s between my workstation and server. It's a direct connection, no switch involved.

I have tried different files. I have a 100GB image file which starts around 230mbp/s and then just drops to around 170mbp/s. I have an ISO file of about 7GB which copies at around 230mbp/s and also drops when copying to my workstation. Copying back to the server is fast, but I think that's the cache, although it's disabled. Anything copied from the server to the workstation over the 10Gb link just drops; copying to the server is fine.

I have created a ZFS stripe on my workstation with two Dell 2TB drives and the problem still happens. My workstation otherwise has a single SSD.

Jumbo frames make no difference.

I have also done the 10GbE fine tuning in System -> Tunables.

Different files show different behaviour.

Server:
Dell R510
2x 6-core Xeon L5640s
64GB DDR3 ECC Reg
6x 3TB 5900RPM NAS drives
1x Chelsio T320
1x Intel quad-port NIC bonded to 2 Cisco switches

Workstation:
Z77-DS3H
Core i7 3770K overclocked
32GB DDR3 Vengeance
Samsung 860 EVO 250GB
GTX 1050 4GB in PCIe x16
Chelsio T3 in PCIe x4 slot
Intel P1000 in PCIe x1

Screenshot from 2021-09-05 16-08-20.png
Screenshot from 2021-09-05 16-08-55.png


Screenshot from 2021-09-05 16-11-21.png
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Also, with the R510, what kind of HBA are you using there? Your signature lists "Perc H310 flashed to IT mode" in your Intel server, so I hope I don't need to point out that any H700i PERC RAID that would normally come in an R510 has to be yanked and replaced.

Please note that "170mbp/s" is likely to be read as "170 megabits per second", which works out to about 22MBytes/sec. That would be truly awful performance. However, your images above suggest you meant megaBYTES per second. We have a Terminology and Abbreviations Primer that explains why what you write and how you abbreviate is important...

One of the quirks of higher speed networking is that latency effects become dominant at some point. For example, writing to a FreeNAS system (without sync) tends to be very fast, because packets are sent over the network, written to the next transaction group in memory, and then acknowledged, and then later flushed out to disk as a large unit. This means that there is very little latency when writing to a ZFS filesystem. However, when reading, if a block is not already in the ARC, your request comes in, isn't found in ARC, and then has to be requested from HDD, which incurs significant latency, which is then sent back to the client some time later. Meanwhile, the client is twiddling its thumbs. Once it gets an answer, it can ask for the next block. This is an oversimplification of a very complicated process, and is just to illustrate why reads often more closely resemble the working speed of a member hard drive. This usually works out to be maybe 1-3x the speed of member HDD's.
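
As a rough back-of-the-envelope, assuming something like 150MB/sec sequential from a single 5900RPM drive:

Code:
# rough figures only, not a benchmark
#   1x a member drive  ->  ~150MB/sec
#   3x a member drive  ->  ~450MB/sec
# so uncached reads in the low-to-mid hundreds of MB/sec are normal for a single RAIDZ2 vdev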

It is looking to me like your second file transfer is being served out of ARC, at what I would consider to be a reasonable clip for an old R510. The first one looks to be closer to the transfer speed of a single HDD, which is lower than I'd expect, but not out of the ballpark. There's probably room for improvement there.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Also, with the R510, what kind of HBA are you using there? Your signature lists "Perc H310 flashed to IT mode" in your Intel server, so I hope I don't need to point out that any H700i PERC RAID that would normally come in an R510 has to be yanked and replaced.

Please note that "170mbp/s" is likely to be read as "170 megabits per second", which works out to about 22MBytes/sec. That would be truly awful performance. However, your images above suggest you meant megaBYTES per second. We have a Terminology and Abbreviations Primer that explains why what you write and how you abbreviate is important...

One of the quirks of higher speed networking is that latency effects become dominant at some point. For example, writing to a FreeNAS system (without sync) tends to be very fast, because packets are sent over the network, written to the next transaction group in memory, and then acknowledged, and then later flushed out to disk as a large unit. This means that there is very little latency when writing to a ZFS filesystem. However, when reading, if a block is not already in the ARC, your request comes in, isn't found in ARC, and then has to be requested from HDD, which incurs significant latency, which is then sent back to the client some time later. Meanwhile, the client is twiddling its thumbs. Once it gets an answer, it can ask for the next block. This is an oversimplification of a very complicated process, and is just to illustrate why reads often more closely resemble the working speed of a member hard drive. This usually works out to be maybe 1-3x the speed of member HDD's.

It is looking to me like your second file transfer is being served out of ARC, at what I would consider to be a reasonable clip for an old R510. The first one looks to be closer to the transfer speed of a single HDD, which is lower than I'd expect, but not out of the ballpark. There's probably room for improvement there.

Yeah, sorry about that. It originally came with an H700i, which I replaced with an H200i flashed to IT mode. It has Dell's firmware so it can work in the integrated slot and boot from two mirrored SSDs in the integrated 2.5-inch bays.

The H310 is in the old system; I need to put my signature right, sorry about that.

So what do you suggest I do? Because this isn't great. The old server board I had suffered from the same problem. I have another Dell R510 being used as a SAN for iSCSI and it works fine, but this server is not great. I'm beginning to wonder if it's something to do with the drives, or maybe my workstation. I cannot seem to get anything faster than what I am seeing. I think sync is turned off.

The second screenshot is copying files to the server.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You don't need Dell firmware for it to work in the integrated slot, and you probably want to make sure you're running LSI IT 20.00.07.00.

A single client operation is going to be at a disadvantage. I don't recall offhand how bad NFS is for this sort of thing, but for SMB it is absolutely a major factor, which is one of the reasons I've generally promoted high CPU frequency processors for NAS use over raw core count. You might find that two clients doing independent accesses work out to better performance in aggregate. Someone with a similar setup might have more comments to offer.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
When writing to the server I can get 300 to 400mbp/s, but when copying from the server to my workstation over 10GbE I'm only getting 230mbp/s, and then it just drops to 1GbE speed.
Looks to me that you're hitting the write limits of your workstation. A single SATA drive is not going to saturate a 10 GbE link, and may drop to a lot less than that when its write cache is full.
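
Ballpark sequential figures, just to put the numbers side by side:

Code:
# rough figures
#   10GbE line rate:              ~1250MB/sec
#   SATA SSD (e.g. an 860 EVO):    ~500-550MB/sec, less once its write cache fills
#   single 5900RPM SATA HDD:       ~100-180MB/sec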
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
You don't need Dell firmware for it to work in the integrated slot, and you probably want to make sure you're running LSI IT 20.00.07.00.

A single client operation is going to be at a disadvantage. I don't recall offhand how bad NFS is for this sort of thing, but for SMB it is absolutely a major factor, which is one of the reasons I've generally promoted high CPU frequency processors for NAS use over raw core count. You might find that two clients doing independent accesses work out to better performance in aggregate. Someone with a similar setup might have more comments to offer.

Urmm, I tried the SAS card with just LSI's firmware and the server refused to boot with it in the integrated slot, so I ended up using Dell's firmware with the LSI IT firmware in the end.

Looks to me that you're hitting the write limits of your workstation. A single SATA drive is not going to saturate a 10 GbE link, and may drop to a lot less than that when its write cache is full.

I am not expecting to get the full speed, but come on, I should be getting better speeds than what I am getting.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Urmm, I tried the SAS card with just LSI's firmware and the server refused to boot with it in the integrated slot, so I ended up using Dell's firmware with the LSI IT firmware in the end.

Mmm. Well, the LSI IT firmware is the important bit.

I am not expecting to get the full speed, but come on, I should be getting better speeds than what I am getting.

Perhaps. I expect it is capable of it for parallel clients.

What sort of NFS tuning have you done on the Linux client?
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Do you mean that you're running LSI's phase 20.00.07.00 firmware with Dell's BIOS? Some Dell systems won't recognize an HBA in the integrated slot if it doesn't identify as a true 'Dell internal HBA'. The eBay seller 'Art of Server' has a good instructional video on configuring Dell HBAs to run the LSI firmware while being recognized as a 'Dell internal HBA' -- basically, you have to twiddle a bit in the SBR:


But if your system recognizes your HBA 'as-is' and you're truly running LSI's 20.00.07.00 IT firmware, then you don't need to worry about this.

Your tunables are much the same as mine, except I'm loading CUBIC for TCP congestion control, and I use a couple of thread-related settings -- net.isr.bindthreads and net.isr.maxthreads.
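
For reference, those extras look roughly like this as loader and sysctl tunables (illustrative values, not copied from my screenshot below):

Code:
# loader tunables (illustrative)
cc_cubic_load="YES"         # load the CUBIC congestion control module at boot
net.isr.maxthreads="-1"     # one netisr worker per CPU
net.isr.bindthreads="1"     # pin netisr workers to their CPUs

# sysctl tunable
net.inet.tcp.cc.algorithm=cubic    # select CUBIC for TCP congestion control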

bandit-tunables-2.jpg


I also use Jumbo frames, which will make @jgreco shake his head... but they really do help, at least on my systems. YMMV

CrystalDiskMark is handy for testing transfer speeds.

I seldom get transfer speeds higher than 3Gb/s on my systems, despite iperf showing rates close to 10Gb/s. Disk subsystem constraints become the new bottleneck when you move to 10Gb/s networking.

Your NIC is in an x4 PCIe slot, so that's going to hurt performance quite a bit.

Reconfiguring your pool as mirrors instead of using RAIDZ2 would also help performance, but I understand that may not be a viable option.

Good luck!
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Mmm. Well, the LSI IT firmware is the important bit.



Perhaps. I expect it is capable of it for parallel clients.

What sort of NFS tuning have you done on the Linux client?

I've added the following to the fstab:

Code:
192.168.2.1:/mnt/tank/Storage /home/violetdragon/Telsa nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800,rsize=32768,wsize=32768 0 0


I've also added the following to sysctl.conf:

Code:
net.ipv4.tcp_rmem = 10000000 10000000 10000000
net.ipv4.tcp_wmem = 10000000 10000000 10000000
net.ipv4.tcp_mem = 10000000 10000000 10000000

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 524287
net.core.netdev_max_backlog = 300000


If there is more tuning you know of, please let me know.

Here is the information on the SAS controller; the other controller in the Dell R510 has been flashed the same way and works fine. I think this is a client issue more than anything.

after-the-flash-768x1024.jpg
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Do you mean that you're running LSI's phase 20.00.07.00 firmware with Dell's BIOS? Some Dell systems won't recognize an HBA in the integrated slot if it doesn't identify as a true 'Dell internal HBA'. The eBay seller 'Art of Server' has a good instructional video on configuring Dell HBAs to run the LSI firmware while being recognized as a 'Dell internal HBA' -- basically, you have to twiddle a bit in the SBR:


But if your system recognizes your HBA 'as-is' and you're truly running LSI's 20.00.07.00 IT firmware, then you don't need to worry about this.

Your tunables are much the same as mine, except I'm loading CUBIC for TCP congestion control, and I use a couple of thread-related settings -- net.isr.bindthreads and net.isr.maxthreads.


I also use Jumbo frames, which will make @jgreco shake his head... but they really do help, at least on my systems. YMMV

CrystalDiskMark is handy for testing transfer speeds.

I seldom get transfer speeds higher than 3Gb/s on my systems, despite iperf showing rates close to 10Gb/s. Disk subsystem constraints become the new bottleneck when you move to 10Gb/s networking.

Your NIC is in an x4 PCIe slot, so that's going to hurt performance quite a bit.

Reconfiguring your pool as mirrors instead of using RAIDZ2 would also help performance, but I understand that may not be a viable option.

Good luck!

Yes, correct, the H200i firmware. I did a guide on doing it for both of my servers: https://www.violetdragonsnetwork.co...ell-h200-to-it-mode-with-h200i-firmware-bios/ Jumbo frames make no difference here whether they're enabled or disabled.

The link between my workstation and server is showing around 6Gb/s, which is good enough given the disks.

Screenshot from 2021-09-05 16-41-00.png
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Ive added the following to the fstab.

Code:
rsize=32768,wsize=32768

OOOOh. You kneecapped NFS. It is almost certainly suffering from latency effects because you're not letting it do very much at once.

Bet it will go somewhat faster if you raise these as high as it'll let you (within reason). Try 262144. Don't go higher than 1048576, use powers of two.
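
For example, your fstab entry with only the rsize/wsize values raised, and everything else left alone, would look something like:

Code:
192.168.2.1:/mnt/tank/Storage /home/violetdragon/Telsa nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800,rsize=262144,wsize=262144 0 0

After remounting, nfsstat -m on the client should show the rsize/wsize the server actually agreed to.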
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
OOOOh. You kneecapped NFS. It is almost certainly suffering from latency effects because you're not letting it do very much at once.

Bet it will go somewhat faster if you raise these as high as it'll let you (within reason). Try 262144. Don't go higher than 1048576, use powers of two.

OK, let me try. Should I still use the following in sysctl.conf, or should I remove them?

Code:
net.ipv4.tcp_rmem = 10000000 10000000 10000000
net.ipv4.tcp_wmem = 10000000 10000000 10000000
net.ipv4.tcp_mem = 10000000 10000000 10000000

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 524287
net.core.netdev_max_backlog = 300000
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
OOOOh. You kneecapped NFS. It is almost certainly suffering from latency effects because you're not letting it do very much at once.

Bet it will go somewhat faster if you raise these as high as it'll let you (within reason). Try 262144. Don't go higher than 1048576, use powers of two.

That's looking a bit better; I am now getting 305MB/sec.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's looking a bit better; I am now getting 305MB/sec.

That's pretty reasonable. Be aware that you're just not going to get monster speeds with a single client, RAIDZ, slowish CPU, especially on an old platform like the R510. With some experimentation, I could imagine you getting a bit more. However, even on a very fast machine with all the tweaks, it isn't likely to be a ton faster.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
That's pretty reasonable. Be aware that you're just not going to get monster speeds with a single client, RAIDZ, slowish CPU, especially on an old platform like the R510. With some experimentation, I could imagine you getting a bit more. However, even on a very fast machine with all the tweaks, it isn't likely to be a ton faster.

It's better; the E3 hardware performed about the same. The CPUs don't get pegged like on the E3 system either, only about 5% of CPU used. I think my workstation is the slow part though. The R510 was a freebie, so I'm not complaining. I need to upgrade my i7 rig, it's showing its age; I've had it since 2012. The R710 seems to run pretty well, and it's fast with iSCSI to the other R510 I have. I will upgrade when I get some spare cash; things are very tight at the moment, but hopefully it'll pick up.

Cheers for your help tho.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
That's pretty reasonable. Be aware that you're just not going to get monster speeds with a single client, RAIDZ, slowish CPU, especially on an old platform like the R510. With some experimentation, I could imagine you getting a bit more. However, even on a very fast machine with all the tweaks, it isn't likely to be a ton faster.

Hi,

Just a quick update: I am seeing around 4Gb/s to the server and around 3Gb/s to my workstation. The Activity Monitor in Ubuntu reports MiB/s, so I did a conversion. So the problem is solved. I will look at upgrading my workstation as it's showing its age, but the server is not the issue here.
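
Rough conversion, in case it helps anyone else:

Code:
# 1Gbit/s = 125MB/sec, roughly 119MiB/sec
#   ~475MiB/sec  ->  about 4Gbit/s  (to the server)
#   ~360MiB/sec  ->  about 3Gbit/s  (to the workstation)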

Thanks again.

Jack.
 