Low write performance: NVMe over ZFS/iSCSI

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
I have one NVMe PCIe SSD: a Plextor 1TB (PX-1TM9PeY).
When I test it locally in my computer (Windows), I get roughly 3 GB/s read and 2 GB/s write.
I created a pool from this disk in FreeNAS and shared it over iSCSI on a 10 Gbit/s network. When I test performance against the FreeNAS server, I get about 850 MB/s read and 700 MB/s write (see screenshot).
Deduplication is disabled, compression is lz4 (default).
Disabling sync and atime doesn't change anything.
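For reference, a minimal sketch of how those properties are toggled from the shell; "tank" stands in for the actual pool/dataset name:

    # check current values (pool name "tank" is a placeholder)
    zfs get sync,atime,compression tank
    # switch off access-time updates and synchronous writes
    # (sync=disabled is unsafe on power loss)
    zfs set atime=off tank
    zfs set sync=disabled tank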
I understand that reads are running into the 10 Gbit network limit, but I don't understand why the write speed doesn't hit the 10 Gbit/s limit as well.
I planned to use the Plextor disk as a cache device for the pool, but my current pool disks show about the same write speed, which makes using it for that pointless.
Thanks for any help.
 

Attachments

  • plextor.png (57.5 KB)

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What's your MTU? Are you using jumbo frames?
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
Yes, I'm using jumbo frames, MTU 9000.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I guess there are some things to confirm here:

The best case for a 10Gbit network is 10 x 1024 x 1024 x 1024 = 10,737,418,240 bits per second; divide by 8 to get bytes: 1,342,177,280 / 1024 / 1024 / 1024 = 1.25 GBytes/second. So at ~700 MB/s write you're already well over 50% of sustainable throughput.
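One way to confirm what the link itself can actually sustain, assuming iperf3 is installed on both ends (it isn't mentioned in this thread), is a raw TCP test; 10.0.0.10 is a placeholder address:

    # on the FreeNAS box
    iperf3 -s
    # on the client: 4 parallel streams for 30 seconds (address is a placeholder)
    iperf3 -c 10.0.0.10 -P 4 -t 30

Anything close to line rate here means the network itself isn't the bottleneck for writes.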

It seems you're talking about using this drive as a SLOG to speed up IOPS for block storage, so a large single-stream file copy isn't the right test for comparing your pool with the SLOG drive. Large numbers of small, non-sequential transactions are what you need to test... although with an SSD pool you should do well there unless you're using RAIDZ.
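A sketch of that kind of test with fio, run directly on the pool (the directory path is a placeholder and must exist); synchronous 4K random writes are the pattern a SLOG actually accelerates:

    fio --name=syncwrite --directory=/mnt/tank/fiotest --rw=randwrite --bs=4k \
        --sync=1 --iodepth=1 --numjobs=4 --size=2g --runtime=60 --time_based \
        --ioengine=posixaio --group_reporting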

Your pool is already SSD, so the question is why you need a SLOG at all. Is your pool already mirrored pairs? Is the performance not sufficient?

The fact that your pool can match the throughput (and maybe the IOPS) of the SLOG may actually be a good thing: the SLOG only holds a maximum of about 5 seconds of transactions before the pool has to catch up, so your pool will always keep up and the SLOG should never need to block transactions.
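The 5-second figure corresponds to the ZFS transaction group timeout; on FreeNAS it can be checked with:

    # default is 5 (seconds)
    sysctl vfs.zfs.txg.timeout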
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
For comparison, I have a vSAN pool on VMware, and there I hit the 10 Gbit/s limit for both reading and writing. 50% will not be enough.
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
I use RAIDZ for fault tolerance.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I use RAIDZ for fault tolerance.
So that's not the best way to get maximum IOPS from your server, and hence not the best config for block storage for VMs...

You need mirrored pairs (as many as you can get) to increase the IOPS of your pool. Currently, with a RAIDZ pool, you have only the IOPS of a single disk.
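As a rough sketch of the layout (pool name "tank" and device names daN are placeholders; the FreeNAS GUI builds the equivalent), a pool of mirrored pairs looks like this instead of one RAIDZ vdev:

    # three two-way mirrors; write IOPS scale with the number of vdevs
    zpool create tank \
        mirror da0 da1 \
        mirror da2 da3 \
        mirror da4 da5
    # add further "mirror daX daY" groups as disks allow

Each additional mirror vdev adds roughly another disk's worth of write IOPS.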
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
For comparison, I have a vSAN pool on VMware, and there I hit the 10 Gbit/s limit for both reading and writing. 50% will not be enough.
vSAN is a different beast. What hardware is being used to implement the vSAN solution (HBA/cache/capacity disks)?

Remote storage will always be slower than the same hardware used locally because of the added overhead of TCP and physically shipping the packets across the wire.
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
You need mirrored pairs (as many as you can get) to increase the IOPS of your pool. Currently, with a RAIDZ pool, you have only the IOPS of a single disk.
I ran an experiment and the results surprised me. All variants give the same result.
Any idea why this is so?
SSD: 960GB Intel DC D3-S4510
 

Attachments

  • ONE SSD.png (67.3 KB)
  • MIRROR.png (56.2 KB)
  • plextor.png (57.5 KB)

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
vSAN is a different beast. What hardware is being used to implement the vSAN solution (HBA/cache/capacity disks)?

Remote storage will always be slower than the same hardware used locally because of the added overhead of TCP and physically shipping the packets across the wire.
4 nodes x 8 SSDs (128 GB Samsung 860 Pro).
The network is the same.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I have several questions for you.

What program are you using to benchmark? It looks like HDTach. What workload does this benchmark run, or what is it measuring? It appears to be large 1M sequential reads and writes. That is not the usual determining factor for a VM that feels "responsive."

4 nodes x 8 SSDs (128 GB Samsung 860 Pro).
The network is the same.
8x SSD in each node, or 8x total (2 per node)? Are you using these in a cache role as well as capacity? Just looking to clarify; they aren't on the HCL for either role, but they'll survive longer in the capacity tier. They have no capacitor to protect in-flight data, and vSAN expects one on the cache devices, which is why you may be seeing amplified results.

To be clear as well: for your TrueNAS setup, have you set your iSCSI zvol to sync=standard or sync=always when you say that sync is enabled? The former is an unsafe write configuration (it may lose data on power loss), but the latter is safe.
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
Program: HD Tune Pro 5.75 (if this tool is not appropriate, which one do you recommend for a farm of virtual machines on VMware?)
vSAN: 8x SSD (data) + 1x NVMe (cache) in each node.
The absence of a capacitor and HCL listing doesn't bother me.
Synchronization: standard (default).
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
For a swarm of VMs I would suggest the VMware HCIBench fling, as it is specifically designed to stress hyperconverged and vSAN-based systems: https://flings.vmware.com/hcibench
This tool lets you test with configurable I/O patterns that better represent typical VM workloads, such as a 70/30 read/write split at an 8K block size.
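To approximate that pattern by hand with fio inside a Linux test VM whose disk sits on the iSCSI datastore (a sketch; the job name and sizes are arbitrary):

    # 70% reads / 30% writes, 8K blocks, queued I/O from several workers
    fio --name=vm-mix --rw=randrw --rwmixread=70 --bs=8k \
        --iodepth=32 --numjobs=4 --size=4g --runtime=120 --time_based \
        --direct=1 --ioengine=libaio --group_reporting

HCIBench automates this kind of run across many VMs at once, so it remains the better end-to-end test.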

For your vSAN hosts, which NVMe disk is being used for cache? In all-flash vSAN all writes land on the cache tier first, and since HDTune only generates a 5GB test file, it will live entirely in the cache tier and be lazily flushed to capacity in the background.

The lack of HCL listing is not a problem if this is for a home lab and testing, but I would not use it for any business/production workload. (I don't even see the 860 Pro in a 128GB size on their spec sheets.) The lack of capacitors creates the potential for data loss (especially on the cache drive) because vSAN expects that any data it sends to the cache drive is "safe" - if it is actually sitting in a DRAM buffer on your cache SSD, it may be lost on a power failure.

Finally, sync=standard is not a write-safe configuration for iSCSI zvols on VMware. The writes are only in RAM on the TrueNAS server and would be lost in a power outage. You must use sync=always, which ensures that pending writes are on stable storage. This often requires a separate log device (SLOG) so that performance does not suffer.
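In ZFS terms that is roughly (zvol and device names are placeholders):

    # force every write to the zvol onto stable storage before acknowledging it
    zfs set sync=always tank/iscsi-zvol
    # optionally add a fast, power-loss-protected device as SLOG so sync writes stay quick
    zpool add tank log nvd0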
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
OK, we've gotten a little sidetracked.
I created a pool of 5 vdevs, each a mirror of two 960GB Intel DC D3-S4510 SATA SSDs.
I shared this pool over iSCSI and got the result in the screenshot. Any idea why? I expected to run into the network bandwidth limit, but that didn't happen.
No SLOG or cache device was used.
 

Attachments

  • Безымянный.png (88.7 KB)

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If you're only testing with a single client/thread (equivalent to a single VM doing a lot of writes), then you're not going to see the benefit of the extra IOPS you get from a pool of mirrors. You need to simulate a more realistic load with the tools that @HoneyBadger suggested.
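A crude way to see that effect, reusing the fio sketches from earlier in the thread (parameters are illustrative): run the same random-write job once as a single worker and once as many queued workers, then compare IOPS between the RAIDZ and the mirrored pool.

    fio --name=one-vm   --rw=randwrite --bs=8k --numjobs=1 --iodepth=1  --size=2g --runtime=60 --time_based --group_reporting
    fio --name=many-vms --rw=randwrite --bs=8k --numjobs=8 --iodepth=16 --size=2g --runtime=60 --time_based --group_reporting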
 