Very slow R/W speeds

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Network Engineer here with medium-to-large exposure to the Dell Compellent storage line (I help support about 4PB across 26 locations). I'm always hungry to grow my knowledge, so I decided to learn more about Free/TrueNAS by building a small box. I'm very aware I'm not using anything enterprise/server grade here (except for 4 out of 10 disks), so I'm not looking for anything amazing when it comes to speeds; however, I feel there is a major bottleneck in the R/W speeds on both pools, but especially on the pool Storage-01. I have my suspicions and want to see if anyone can confirm them (I will not post my suspicions so I do not influence you).

TrueNAS Setup:
OS: TrueNAS-12.0-U1.1​
MB: GIGABYTE H370M DS3H
CPU: Intel i5 8500​
RAM: 4x8GB (32GB) DDR4 2400 (NON-ECC)
SAS: SAS9211-8I 8PORT Int 6GB (in IT Mode)​
NIC: dual Intel 82576 SFP 1Gbps DACs (in LACP)​
Storage Drives:
Purpose: secondary drives for VMs (no need for fast storage) and VM backups:
Pool Name: Storage-01;​
Hardware: 4x Seagate Exos 7E8 8TB (ST8000NM0055) attached to the SAS9211-8I 8PORT Int 6GB
Pool type: RAIDz1 (no zVOL)​
Share type to ESXi Cluster: NFSv3​
(Note) I tried to use a 128GB NVMe 2.0 drive as a SLOG and then as an L2ARC for speed testing.
Best R or W speed has been 13MB/s.​

Purpose: VM OS's / boot drives:
Pool Name: Boot-01
Hardware: 4x Samsung Evo 860 SSDs attached to MB
Pool type: RAIDz1 (no zVOL)​
Share type to ESXi Cluster: NFSv3​
Best R or W speed has been 112MB/s​
Other Storage Device:
NetGear ReadyNAS 214​
Storage: RAID5 of random 4TB HDDs
NIC: dual 1Gbps Ethernet in LACP
Share type to ESXi Cluster: NFSv3
ESXi Hosts:
2x Dell OptiPlex 7070's SFF​
OS: ESXi 6.7u3​
CPU: i5 9500​
RAM 64GB DDR4 2400 (non-ECC)​
NIC1: dual-port 1.25Gb/s Ethernet, Intel 82576 chip (dedicated to VMs)
NIC2: Single-port​
1x Dell OptiPlex 7040m​
OS: ESXi 6.7u3 w/ vSphere
CPU: i5 6500​
RAM: 24GB DDR4 2400
NIC: random onboard 1Gbps Intel Ethernet NIC

Network Infrastructure:
Switch: FortiSwitch 224E-POE
Firewall: FortiGate 80F
There is no high i/o on the switch or FW

If you need more details on specs and setup, please let me know. I couldn't think of much more to add with my limited knowledge of TrueNAS.


Migrating any VM storage/boot between Boot-01 and the ReadyNAS transfers at around 105-112MB/s.
Migrating any VM storage/boot between Boot-01 and Storage-01 will not exceed 13MB/s.
Migrating any VM storage/boot between the ReadyNAS and Storage-01 will not exceed 13MB/s.

Any thoughts, advice or troubleshooting tips are appreciated. Snarky remarks and condescending replies will be ignored. (I had to mention this because there seem to be a lot of people here who offer advice but focus more on trying to tell others how wrong they are, all but calling them "stupid" in the process.)
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
I'd say give it a whirl again after 12.0-U2 drops today. There were some performance issues fixed with Intel networking specifically, which could be the culprit here.
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Thank you for the reply, Kris. I will look forward to the update and will report back any findings once the update has dropped and I have upgraded.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Any thoughts, advice or troubleshooting tips are appreciated.

RAIDZ is probably not helping here.

https://www.truenas.com/community/r...and-why-we-use-mirrors-for-block-storage.112/

Snarky remarks and condescending replies will be ignored. (I had to mention this because there seem to be a lot of people here who offer advice but focus more on trying to tell others how wrong they are, all but calling them "stupid" in the process.)

Postel's law might help you here. Do consider that everyone participating here is taking time out of their days to try to help you out. There is no need to consider remarks as "snark" or "condescending" if they can be reasonably taken in another way; don't take offense where none might have been meant (Postel's law as applied to interpersonal communications). If you feel something exceeds reasonableness, report it to the moderation team.
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
I'd say give it a whirl again after 12.0-U2 drops today. There were some performance issues fixed with Intel networking specifically, which could be the culprit here.
So these are the speeds after updating. Mind you, this is a RAIDZ1 of 4x Seagate Exos 7E8 8TB (ST8000NM0055) attached to the SAS9211-8I 8PORT Int 6GB on a 12Gb/s SAS backplane.
Thanks for the article, I do appreciate it and understand what it is saying, but I don't feel the issue I am seeing is a result of RAIDZ.
I transferred a 1.8GB .cab file, then started a data migration of a VM that is roughly 2TB, and their speeds were the same. The screenshot below shows the 2TB VM being transferred: roughly 2.5hrs in, it has transferred 42GB, which comes out to roughly 7.755MB/s.
[screenshot]
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Start with a dd test to a dataset that has compression disabled. Do this for read and write. Next do an iperf test going both directions. One of these will tell you what the problem is.
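For the iperf part, a minimal sketch of testing both directions (the IP is a placeholder; if the bundled iperf is version 2, -r runs the reverse direction as well, otherwise just swap server and client):
Code:
iperf -s                  # on TrueNAS
iperf -c <truenas-ip>     # on an ESXi host or VM: client -> TrueNAS
iperf -c <truenas-ip> -r  # tradeoff mode: also measures TrueNAS -> client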
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Start with a dd test to a dataset

I had been searching for good commands on this for about two days now. Are there any good docs out there that show you how to accomplish this?
Did several iperf tests: host to TrueNAS, TrueNAS to host, PC to TrueNAS, and everything came back within +/- 20MB/s of each other. The pics below are from two VMs, each on a different host, to TrueNAS using just the basic -c <ip> command:
[screenshot]

[screenshot]
 
Last edited:

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I had been searching for good commands on this for about two days now. Are there any good docs out there that show you how to accomplish this?
Did several iperf tests: host to TrueNAS, TrueNAS to host, PC to TrueNAS, and everything came back within +/- 20MB/s of each other. Pic below is one of my hosts to TrueNAS with different iperf commands.
View attachment 45049
Looks like you have a network problem. It should be pegged at 940Mbps in both directions.

Dd test is simple dd if=/dev/zero of=./10gig.dat vs=1M count=10000

Then read test dd if=./10gig.dat of=/dev/null bs=1M
 
Last edited:

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Looks like you have a network problem. It should be pegged at 940Mbps in both directions.

Dd test is simple dd if=/dev/zero of=./10gig.dat vs=1M count=10000

Then read test dd if=./10gig.dat of=/dev/null vs=1M

Looks like I edited my post after you commented, but I didn't see it. I had the wrong information in my reply (I was still half asleep).

Will do your DD test and post results.
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Ok, here are the results in a cleaner format:
Code:
dd if=/dev/zero of=./10gig.dat bs=1M count=10000
for each 8TB drive
[screenshot]

238064341, 236931472, 237321255, 237018758 bytes/s; average = 237333956.5 bytes/s, which converts to roughly 237.3MB/s.
The maximum write speed for these drives is 249MB/s, so I would say the drives are writing at fair speeds.
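(For what it's worth, that average and conversion can be sanity-checked straight from the shell; the figures are just the four dd results above:)
Code:
# average the four per-drive dd rates (bytes/s) and convert to decimal MB/s
echo "(238064341 + 236931472 + 237321255 + 237018758) / 4 / 1000000" | bc -l   # prints roughly 237.33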

Network Speeds:
Commands:
TrueNAS: iperf -s
ESXi Host: iperf -c <ip>

From ESXi host 1 to TrueNas:
[screenshot]

From ESXi host 2 to TrueNas:
[screenshot]

From ESXi host 3 to TrueNAS
[screenshot]


Return traffic (TrueNAS to each VM on different Host) in same order:
[screenshot]
 
Last edited:

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Took the SFP card out of LACP, deleted the pool, and changed it to a stripe with compression. Back to 10.94MB/s.

This does not resolve the issue, as the speeds are still slower than the SSDs (which are also at subpar R/W speeds) and the card was in LACP mode. Still think I'm missing something here.
 
Last edited:

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Ok, here are the results in a cleaner format:
Code:
dd if=/dev/zero of=./10gig.dat bs=1M count=10000
for each 8TB drive
View attachment 45053
238064341, 236931472, 237321255, 237018758 bytes/s; average = 237333956.5 bytes/s, which converts to roughly 237.3MB/s.
The maximum write speed for these drives is 249MB/s, so I would say the drives are writing at fair speeds.

Network Speeds:
Commands:
TrueNAS: iperf -s
ESXi Host: iperf -c <ip>

From ESXi host 1 to TrueNas:
View attachment 45054
From ESXi host 2 to TrueNas:
View attachment 45058
From ESXi host 3 to TrueNAS
View attachment 45055

Return traffic (TrueNAS to each VM on different Host) in same order:
View attachment 45060
No clue what you are testing here. Looks like you just wanted to test the speed of each individual disk. I would suggest testing the read and write speed of the pool like I posted above.
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Dd test is simple dd if=/dev/zero of=./10gig.dat vs=1M count=10000

Then read test dd if=./10gig.dat of=/dev/null bs=1M

These are the commands you told me to use, correct?

dd if=/dev/zero of=./10gig.dat vs=1M count=10000"
It gave me the error below.
[screenshot]

So I changed the vs to bs, thinking maybe you just made a typo. "bs" worked and gave me the results I posted.
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
Ok, I see what you were saying and sorry I misunderstood. I was looking to see if there was a faulty drive that was causing the pool to be slow (my mind was thinking back to the article).
These are the results for the entire pool:
Command: dd if=/dev/zero of=/mnt/<pool>/10gig.dat bs=1M count=10000
[screenshot]

dd if=/mnt/<pool>/10gig.dat of=/dev/null bs=1M
[screenshot]
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Ok, I see what you were saying and sorry I misunderstood. I was looking to see if there was a faulty drive that was causing the pool to be slow (my mind was thinking back to the article).
These are the results for the entire pool:
Command: dd if=/dev/zero of=/mnt/<pool>/10gig.dat bs=1M count=10000
View attachment 45083
It has to be a dataset without compression enabled.
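For example, a throwaway dataset with compression explicitly off can be created and verified like this (the dataset name ddtest is just a placeholder):
Code:
zfs create -o compression=off Storage-01/ddtest   # scratch dataset, compression disabled
zfs get compression Storage-01/ddtest             # should report "off"
Then point the dd write/read at /mnt/Storage-01/ddtest/10gig.dat.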
 

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
[screenshot]

Without compression. (Sorry, it got turned back on automatically when recreating the pool.)

How were you able to determine compression was on?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
View attachment 45085
Without compression. (Sorry, it got turned back on automatically when recreating the pool.)

How were you able to determine compression was on?
When you get a number like a 7GB/s write speed, that is just absurd, and it's from compression being on. A speed like 613MB/s is more accurate for your pool design.

Ok, so seeing your network speeds during the transfer at 10-13MB/s, it sure looks like a NIC is using a 100Mbps link and not a gigabit link.
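One quick way to check the negotiated speed on the TrueNAS side (the interface name igb0 is just an example; use whatever names ifconfig lists for the Intel ports):
Code:
ifconfig igb0 | grep media   # should show a 1000base* media line, not 100baseTX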
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Share type to ESXi Cluster: NFSv3

Sorry for being late to the party here, but this jumped out to me.

NFS from ESXi defaults to "sync writes for everything" and this will result in poor performance without a sufficiently beefy SLOG and/or pool devices behind it.

@TheUsD - as a test, would you be willing to set sync=disabled on a test dataset inside either of the pools and see if this provides a drastic difference to your write speeds? This will override the default behavior and cause your ESXi NFS writes to be asynchronous; but this is an "unsafe" configuration as it carries a risk of data loss. Use it only for testing and benchmark purposes.
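If it helps, a rough sketch of that test from the shell (the dataset name nfs-test is a placeholder, and remember to revert it afterwards):
Code:
zfs set sync=disabled Storage-01/nfs-test   # benchmarking only - unsafe for real data
zfs get sync Storage-01/nfs-test            # confirm the setting took
zfs set sync=standard Storage-01/nfs-test   # revert to the default once testing is done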
 
Last edited:

TheUsD

Contributor
Joined
May 17, 2013
Messages
116
When you get a number like a 7GB/s write speed, that is just absurd, and it's from compression being on. A speed like 613MB/s is more accurate for your pool design.

Makes sense. Good cardinal knowledge to remember.

Ok, so seeing your network speeds during the transfer at 10-13MB/s, it sure looks like a NIC is using a 100Mbps link and not a gigabit link.

I can understand where you are coming from, but don't the iperf tests negate that theory? I guess it could be possible something changes with how the NIC and TrueNAS handle a payload?

I ended up pulling the OS drive I was using, replaced it with another drive, and loaded a fresh copy of 12.0-U2; removed the dual SFP card and attempted to use the onboard NIC (Intel chipset), which is 1000BaseT. Attempted another migration and still only managed to get a blip of 812Mb/s off the NIC, then it stuck around 0.12-0.60Mb/s. @HoneyBadger, I disabled sync like you suggested.

So with this info, I'm really starting to think the issue is networking, and more specifically with the Intel chipset and TrueNAS like @Kris Moore mentioned (though he said 12.0-U2 might fix this once I upgraded), and nothing to do with the disks.
[screenshot]

[screenshot]
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Attempted another migration and still only managed to get a blip of 812Mb/s off the NIC, then it stuck around 0.12-0.60Mb/s. @HoneyBadger, I disabled sync like you suggested.

Is this represented by the traffic spike at around 1402h-1404h in the interface traffic in your second attached image?

What operation did you start at 1347h? It spiked up rapidly to 600Mbps and then dropped rapidly, first to about 350Mbps and then to 200Mbps. Was this "before pool recreation" using RAIDZ, and was the latter the same migration with a stripe pool?

I use Intel NICs with the em driver on TN12 and I'm definitely not seeing the same limitations; that said, I'm also using iSCSI for my VMware cluster.
 