Very slow R/W speeds

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Network Engineer here with medium-to-large exposure to the Dell Compellent storage line (I help support about 4PB across 26 locations). I'm always hungry to grow my knowledge, so I decided to learn more about Free/TrueNAS by building a small box. I'm very aware I'm not using anything enterprise/server grade here (except for 4 out of 10 disks), so I'm not looking for anything amazing when it comes to speeds; however, I feel there is a major bottleneck in the R/W speeds on both pools, but especially on the pool Storage-01. I have my suspicions and want to see if anyone can confirm them (I will not post my suspicions so I do not influence you).

TrueNAS Setup:
OS: TrueNAS-12.0-U1.1​
MB: GIGABYTE H370M DS3H
CPU: Intel i5 8500​
RAM: 4x8GB (32GB) DDR4 2400 (NON-ECC)
SAS: SAS9211-8I 8PORT Int 6GB (in IT Mode)​
NIC: dual Intel 82576 SFP 1Gbps DACs (in LACP)​
Storage Drives:
Purpose: secondary drives for VMs (no need for fast storage) and VM backups:
Pool Name: Storage-01;​
Hardware: 4x Seagate Exos 7E8 8TB (ST8000NM0055) attached to the SAS9211-8I 8PORT Int 6GB
Pool type: RAIDz1 (no zVOL)​
Share type to ESXi Cluster: NFSv3​
(Note) I tried to use a 128GB NVMe 2.0 drive as a SLOG and then as an L2ARC for speed testing.
Best R or W speed has been 13MB/s.​

Purpose: VM OS's / boot drives:
Pool Name: Boot-01
Hardware: 4x Samsung Evo 860 SSDs attached to MB
Pool type: RAIDz1 (no zVOL)​
Share type to ESXi Cluster: NFSv3​
Best R or W speed has been 112MB/s​
Other Storage Device:
NetGear ReadyNAS 214​
Storage: RAID5 of random 4TB HDDs
NIC: dual 1Gbps Ethernet in LACP
Share type to ESXi Cluster: NFSv3
ESXi Hosts:
2x Dell OptiPlex 7070's SFF​
OS: ESXi 6.7u3​
CPU: i5 9500​
RAM 64GB DDR4 2400 (non-ECC)​
NIC1: dual-port 1.25Gb/s Ethernet, Intel 82576 chip (dedicated to VMs)
NIC2: Single-port​
1x Dell OptiPlex 7040m​
OS: ESXi 6.7u3 w/ vSphere
CPU: i5 6500​
RAM: 24GB DDR4 2400
NIC: random onboard 1Gbps Intel Ethernet NIC

Network Infrastructure:
Switch: FortiSwitch 224E-POE
Firewall: FortiGate 80F
There is no high i/o on the switch or FW

If you need more details on specs and setup, please let me know. I couldn't think of much more to add with my limited knowledge of TrueNAS.


Migrating any VM storage/boot between Boot-01 and the ReadyNAS transfers at around 105-112MB/s.
Migrating any VM storage/boot between Boot-01 and Storage-01 will not exceed 13MB/s.
Migrating any VM storage/boot between the ReadyNAS and Storage-01 will not exceed 13MB/s.

Any thoughts, advice or troubleshooting tips are appreciated. Snarky remarks and condescending replies will be ignored. (I had to mention this because there seem to be a lot of people here who offer advice but focus more on trying to tell others how wrong they are, all but calling them "stupid" in the process.)
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
I'd say give it a whirl again after 12.0-U2 drops today. There were some performance issues fixed with Intel networking specifically, which could be the culprit here.
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Thank you for the reply, Kris. I will look forward to the update and will report back any findings once the update has dropped and I have upgraded.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Any thoughts, advice or troubleshooting tips are appreciated.

RAIDZ is probably not helping here.

https://www.truenas.com/community/r...and-why-we-use-mirrors-for-block-storage.112/

Snarky remarks and condescending replies will be ignored. (I had to mention this because there seem to be a lot of people here who offer advice but focus more on trying to tell others how wrong they are, all but calling them "stupid" in the process.)

Postel's law might help you here. Do consider that everyone participating here is taking time out of their days to try to help you out. There is no need to consider remarks as "snark" or "condescending" if they can be reasonably taken in another way; don't take offense where none might have been meant (Postel's law as applied to interpersonal communications). If you feel something exceeds reasonableness, report it to the moderation team.
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
I'd say give it a whirl again after 12.0-U2 drops today. There were some performance issues fixed with Intel networking specifically, which could be the culprit here.
So these are the speeds after updating. Mind you, this is a RAIDZ1 of 4x Seagate Exos 7E8 8TB (ST8000NM0055) attached to the SAS9211-8I 8PORT Int 6GB on a 12Gb/s SAS backplane.
Thanks for the article, I do appreciate it and understand what it is saying, but I don't feel the issue I am seeing is a result of RAIDZ.
I transferred a 1.8GB .cab file, then started a data migration of a VM that is roughly 2TB, and their speeds were the same. The screenshot below shows the 2TB VM being transferred: roughly 2.5hrs in, it has transferred 42GB, which comes out to roughly 7.755MB/s.
[screenshot]
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Start with a dd test to a dataset that has compression disabled. Do this for read and write. Next do an iperf test going both directions. One of these will tell you what the problem is.
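For the iperf part, a minimal sketch of testing both directions (the IP is a placeholder; if the bundled iperf is version 2, -r runs the reverse direction as well, otherwise just swap server and client):
Code:
iperf -s                  # on TrueNAS
iperf -c <truenas-ip>     # on an ESXi host or VM: client -> TrueNAS
iperf -c <truenas-ip> -r  # tradeoff mode: also measures TrueNAS -> client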
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Start with a dd test to a dataset

I had been searching for good commands on this for about two days now. Are there any good docs out there that show you how to accomplish this?
Did several iperf tests: host to TrueNAS, TrueNAS to host, PC to TrueNAS, and everything came back within +/- 20MB/s of each other. The pics below are from two VMs, each on a different host, to TrueNAS using just the basic -c <ip> command:
[screenshot]

[screenshot]
 
Last edited:

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I had been searching for good commands on this for about two days now. Are there any good docs out there that show you how to accomplish this?
Did several iperf tests: host to TrueNAS, TrueNAS to host, PC to TrueNAS, and everything came back within +/- 20MB/s of each other. Pic below is one of my hosts to TrueNAS with different iperf commands.
View attachment 45049
Looks like you have a network problem. It should be pegged at 940Mbps in both directions.

Dd test is simple dd if=/dev/zero of=./10gig.dat vs=1M count=10000

Then read test dd if=./10gig.dat of=/dev/null bs=1M
 
Last edited:

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Looks like you have a network problem. It should be pegged at 940Mbps in both directions.

Dd test is simple dd if=/dev/zero of=./10gig.dat vs=1M count=10000

Then read test dd if=./10gig.dat of=/dev/null vs=1M

Looks like I edited my post after you commented, but I didn't see it. I had the wrong information in my reply (I was still half asleep).

Will do your DD test and post results.
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Ok, here are the results in a cleaner format:
Code:
dd if=/dev/zero of=./10gig.dat bs=1M count=10000
for each 8TB drive
[screenshot]

238064341, 236931472, 237321255, 237018758 bytes/s; average = 237333956.5 bytes/s, which converts to roughly 237.3MB/s.
The maximum write speed for these drives is 249MB/s, so I would say the drives are writing at fair speeds.
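(For what it's worth, that average and conversion can be sanity-checked straight from the shell; the figures are just the four dd results above:)
Code:
# average the four per-drive dd rates (bytes/s) and convert to decimal MB/s
echo "(238064341 + 236931472 + 237321255 + 237018758) / 4 / 1000000" | bc -l   # prints roughly 237.33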

Network Speeds:
Commands:
TrueNAS: iperf -s
ESXi Host: iperf -c <ip>

From ESXi host 1 to TrueNas:
[screenshot]

From ESXi host 2 to TrueNas:
[screenshot]

From ESXi host 3 to TrueNAS
[screenshot]


Return traffic (TrueNAS to each VM on different Host) in same order:
[screenshot]
 
Last edited:

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Took the SFP card out of LACP, deleted the pool, and changed it to a stripe with compression. Back to 10.94MB/s.

This does not resolve the issue, as the speeds are still slower than the SSDs (which are also at subpar R/W speeds) and the card was in LACP mode. Still think I'm missing something here.
 
Last edited:

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Ok, here are the results in a cleaner format:
Code:
dd if=/dev/zero of=./10gig.dat bs=1M count=10000
for each 8TB drive
View attachment 45053
238064341, 236931472, 237321255, 237018758 bytes/s; average = 237333956.5 bytes/s, which converts to roughly 237.3MB/s.
The maximum write speed for these drives is 249MB/s, so I would say the drives are writing at fair speeds.

Network Speeds:
Commands:
TrueNAS: iperf -s
ESXi Host: iperf -c <ip>

From ESXi host 1 to TrueNas:
View attachment 45054
From ESXi host 2 to TrueNas:
View attachment 45058
From ESXi host 3 to TrueNAS
View attachment 45055

Return traffic (TrueNAS to each VM on different Host) in same order:
View attachment 45060
No clue what you are testing here. Looks like you just wanted to test the speed of each individual disk. I would suggest testing the read and write speed of the pool like I posted above.
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Dd test is simple dd if=/dev/zero of=./10gig.dat vs=1M count=10000

Then read test dd if=./10gig.dat of=/dev/null bs=1M

These are the commands you told me to use, correct?

dd if=/dev/zero of=./10gig.dat vs=1M count=10000"
It gave me the error below.
[screenshot]

So I changed the vs to bs, thinking maybe you just made a typo. "bs" worked and gave me the results I posted.
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Ok, I see what you were saying and sorry I misunderstood. I was looking to see if there was a faulty drive that was causing the pool to be slow (my mind was thinking back to the article).
These are the results for the entire pool:
Command: dd if=/dev/zero of=/mnt/<pool>/10gig.dat bs=1M count=10000
[screenshot]

dd if=/mnt/<pool>/10gig.dat of=/dev/null bs=1M
[screenshot]
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Ok, I see what you were saying and sorry I misunderstood. I was looking to see if there was a faulty drive that was causing the pool to be slow (my mind was thinking back to the article).
These are the results for the entire pool:
Command: dd if=/dev/zero of=/mnt/<pool>/10gig.dat bs=1M count=10000
View attachment 45083
It has to be a dataset without compression enabled.
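For example, a throwaway dataset with compression explicitly off can be created and verified like this (the dataset name ddtest is just a placeholder):
Code:
zfs create -o compression=off Storage-01/ddtest   # scratch dataset, compression disabled
zfs get compression Storage-01/ddtest             # should report "off"
Then point the dd write/read at /mnt/Storage-01/ddtest/10gig.dat.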
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
[screenshot]

Without compression. (Sorry, it got turned back on automatically when recreating the pool.)

How were you able to determine compression was on?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
View attachment 45085
Without compression. (Sorry, it got turned back on automatically when recreating the pool.)

How were you able to determine compression was on?
When you get a number like a 7GB/s write speed, that is just absurd, and it's from compression being on. A speed like 613MB/s is more accurate for your pool design.

Ok, so seeing your network speeds during the transfer at 10-13MB/s, it sure looks like a NIC is using a 100Mbps link and not a gigabit link.
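One quick way to check the negotiated speed on the TrueNAS side (the interface name igb0 is just an example; use whatever names ifconfig lists for the Intel ports):
Code:
ifconfig igb0 | grep media   # should show a 1000base* media line, not 100baseTX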
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Share type to ESXi Cluster: NFSv3

Sorry for being late to the party here, but this jumped out to me.

NFS from ESXi defaults to "sync writes for everything" and this will result in poor performance without a sufficiently beefy SLOG and/or pool devices behind it.

@TheUsD - as a test, would you be willing to set sync=disabled on a test dataset inside either of the pools and see if this provides a drastic difference to your write speeds? This will override the default behavior and cause your ESXi NFS writes to be asynchronous; but this is an "unsafe" configuration as it carries a risk of data loss. Use it only for testing and benchmark purposes.
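If it helps, a rough sketch of that test from the shell (the dataset name nfs-test is a placeholder, and remember to revert it afterwards):
Code:
zfs set sync=disabled Storage-01/nfs-test   # benchmarking only - unsafe for real data
zfs get sync Storage-01/nfs-test            # confirm the setting took
zfs set sync=standard Storage-01/nfs-test   # revert to the default once testing is done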
 
Last edited:

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
When you get a number like a 7GB/s write speed, that is just absurd, and it's from compression being on. A speed like 613MB/s is more accurate for your pool design.

Makes sense. Good cardinal knowledge to remember.

Ok, so seeing your network speeds during the transfer at 10-13MB/s, it sure looks like a NIC is using a 100Mbps link and not a gigabit link.

I can understand where you are coming from, but don't the iperf tests negate that theory? I guess it could be possible something changes with how the NIC and TrueNAS handle a payload?

I ended up pulling the OS drive I was using, replaced it with another drive, and loaded a fresh copy of 12.0-U2; removed the dual SFP card and attempted to use the onboard NIC (Intel chipset), which is 1000BaseT. Attempted another migration and still only managed to get a blip of 812Mb/s off the NIC, then it stuck around 0.12-0.60Mb/s. @HoneyBadger, I disabled sync like you suggested.

So with this info, I'm really starting to think the issue is networking, and more specifically with the Intel chipset and TrueNAS like @Kris Moore mentioned (though he said 12.0-U2 might fix this once I upgraded), and nothing to do with the disks.
[screenshot]

[screenshot]
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Attempted another migration and still only managed to get a blip of 812Mb/s off the NIC, then it stuck around 0.12-0.60Mb/s. @HoneyBadger, I disabled sync like you suggested.

Is this represented by the traffic spike at around 1402h-1404h in the interface traffic in your second attached image?

What operation did you start at 1347h? It spiked up rapidly to 600Mbps and then dropped rapidly, first to about 350Mbps and then to 200Mbps. Was this "before pool recreation" using RAIDZ, and was the latter the same migration with a stripe pool?

I use Intel NICs with the em driver on TN12 and I'm definitely not seeing the same limitations; that said, I'm also using iSCSI for my VMware cluster.
 