Low write performance: NVMe over ZFS/iSCSI

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
I have one NVMe PCIe SSD: a Plextor 1TB (PX-1TM9PeY).
When I test it locally in my computer (Windows), I get roughly 3 GB/s read and 2 GB/s write.
I created a pool from this disk in FreeNAS and shared it over iSCSI on a 10 Gbit/s network. When I test performance against the FreeNAS server, I get about 850 MB/s read and 700 MB/s write (see screenshot).
Deduplication is disabled, compression is lz4 (default).
Disabling sync and atime doesn't change anything.
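For reference, a minimal sketch of how those properties are toggled from the shell; "tank" stands in for the actual pool/dataset name:

    # check current values (pool name "tank" is a placeholder)
    zfs get sync,atime,compression tank
    # switch off access-time updates and synchronous writes
    # (sync=disabled is unsafe on power loss)
    zfs set atime=off tank
    zfs set sync=disabled tank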
I understand that reads are running into the 10 Gbit network limit, but I don't understand why the write speed doesn't hit the 10 Gbit/s limit as well.
I planned to use the Plextor disk as a cache device for the pool, but my current pool disks show about the same write speed, which makes using it for that pointless.
Thanks for any help.
 

Attachments

  • plextor.png (57.5 KB)

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What's your MTU? Are you using jumbo frames?
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
Yes, I'm using jumbo frames, MTU 9000.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I guess there are some things to confirm here:

The best case for a 10Gbit network is 10 x 1024 x 1024 x 1024 = 10,737,418,240 bits per second; divide by 8 to get bytes: 1,342,177,280 / 1024 / 1024 / 1024 = 1.25 GBytes/second. So at ~700 MB/s write you're already well over 50% of sustainable throughput.
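One way to confirm what the link itself can actually sustain, assuming iperf3 is installed on both ends (it isn't mentioned in this thread), is a raw TCP test; 10.0.0.10 is a placeholder address:

    # on the FreeNAS box
    iperf3 -s
    # on the client: 4 parallel streams for 30 seconds (address is a placeholder)
    iperf3 -c 10.0.0.10 -P 4 -t 30

Anything close to line rate here means the network itself isn't the bottleneck for writes.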

It seems you're talking about using this drive as a SLOG to speed up IOPS for block storage, so a large single-stream file copy isn't the right test for comparing your pool with the SLOG drive. Large numbers of small, non-sequential transactions are what you need to test... although with an SSD pool you should do well there unless you're using RAIDZ.
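A sketch of that kind of test with fio, run directly on the pool (the directory path is a placeholder and must exist); synchronous 4K random writes are the pattern a SLOG actually accelerates:

    fio --name=syncwrite --directory=/mnt/tank/fiotest --rw=randwrite --bs=4k \
        --sync=1 --iodepth=1 --numjobs=4 --size=2g --runtime=60 --time_based \
        --ioengine=posixaio --group_reporting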

Your pool is already SSD, so the question is why you need a SLOG at all. Is your pool already mirrored pairs? Is the performance not sufficient?

The fact that your pool can match the throughput (and maybe the IOPS) of the SLOG may actually be a good thing: the SLOG only holds a maximum of about 5 seconds of transactions before the pool has to catch up, so your pool will always keep up and the SLOG should never need to block transactions.
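The 5-second figure corresponds to the ZFS transaction group timeout; on FreeNAS it can be checked with:

    # default is 5 (seconds)
    sysctl vfs.zfs.txg.timeout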
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
For comparison, I have a vSAN pool on VMware, and there I hit the 10 Gbit/s limit for both reading and writing. 50% will not be enough.
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
I use RAIDZ for fault tolerance.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I use RAIDZ for fault tolerance.
So that's not the best way to get maximum IOPS from your server, and hence not the best config for block storage for VMs...

You need mirrored pairs (as many as you can get) to increase the IOPS of your pool. Currently, with a RAIDZ pool, you have only the IOPS of a single disk.
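As a rough sketch of the layout (pool name "tank" and device names daN are placeholders; the FreeNAS GUI builds the equivalent), a pool of mirrored pairs looks like this instead of one RAIDZ vdev:

    # three two-way mirrors; write IOPS scale with the number of vdevs
    zpool create tank \
        mirror da0 da1 \
        mirror da2 da3 \
        mirror da4 da5
    # add further "mirror daX daY" groups as disks allow

Each additional mirror vdev adds roughly another disk's worth of write IOPS.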
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
For comparison, I have a vSAN pool on VMware, and there I hit the 10 Gbit/s limit for both reading and writing. 50% will not be enough.
vSAN is a different beast. What hardware is being used to implement the vSAN solution (HBA/cache/capacity disks)?

Remote storage will always be slower than the same hardware used locally because of the added overhead of TCP and physically shipping the packets across the wire.
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
You need mirrored pairs (as many as you can get) to increase the IOPS of your pool. Currently, with a RAIDZ pool, you have only the IOPS of a single disk.
I ran an experiment and the results surprised me. All variants give the same result.
Any idea why this is so?
SSD: 960GB Intel DC D3-S4510
 

Attachments

  • ONE SSD.png (67.3 KB)
  • MIRROR.png (56.2 KB)
  • plextor.png (57.5 KB)

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
vSAN is a different beast. What hardware is being used to implement the vSAN solution (HBA/cache/capacity disks)?

Remote storage will always be slower than the same hardware used locally because of the added overhead of TCP and physically shipping the packets across the wire.
4 nodes x 8 SSDs (128 GB Samsung 860 Pro).
The network is the same.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I have several questions for you.

What program are you using to benchmark? It looks like HDTach. What workload does this benchmark run, or what is it measuring? It appears to be large 1M sequential reads and writes. That is not the usual determining factor for a VM that feels "responsive."

4 nodes x 8 SSDs (128 GB Samsung 860 Pro).
The network is the same.
8x SSD in each node, or 8x total (2 per node)? Are you using these in a cache role as well as capacity? Just looking to clarify; they aren't on the HCL for either role, but they'll survive longer in the capacity tier. They have no capacitor to protect in-flight data, and vSAN expects one on the cache devices, which is why you may be seeing amplified results.

To be clear as well: for your TrueNAS setup, have you set your iSCSI zvol to sync=standard or sync=always when you say that sync is enabled? The former is an unsafe write configuration (it may lose data on power loss), but the latter is safe.
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
Program: HD Tune Pro 5.75 (if this tool is not appropriate, which one do you recommend for a farm of virtual machines on VMware?)
vSAN: 8x SSD (data) + 1x NVMe (cache) in each node.
The absence of a capacitor and HCL listing doesn't bother me.
Synchronization: standard (default).
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
For a swarm of VMs I would suggest the VMware HCIBench fling, as it is specifically designed to stress hyperconverged and vSAN-based systems: https://flings.vmware.com/hcibench
This tool lets you test with configurable I/O patterns that better represent typical VM workloads, such as a 70/30 read/write split at an 8K block size.
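To approximate that pattern by hand with fio inside a Linux test VM whose disk sits on the iSCSI datastore (a sketch; the job name and sizes are arbitrary):

    # 70% reads / 30% writes, 8K blocks, queued I/O from several workers
    fio --name=vm-mix --rw=randrw --rwmixread=70 --bs=8k \
        --iodepth=32 --numjobs=4 --size=4g --runtime=120 --time_based \
        --direct=1 --ioengine=libaio --group_reporting

HCIBench automates this kind of run across many VMs at once, so it remains the better end-to-end test.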

For your vSAN hosts, which NVMe disk is being used for cache? In all-flash vSAN all writes land on the cache tier first, and since HDTune only generates a 5GB test file, it will live entirely in the cache tier and be lazily flushed to capacity in the background.

The lack of HCL listing is not a problem if this is for a home lab and testing, but I would not use it for any business/production workload. (I don't even see the 860 Pro in a 128GB size on their spec sheets.) The lack of capacitors creates the potential for data loss (especially on the cache drive) because vSAN expects that any data it sends to the cache drive is "safe" - if it is actually sitting in a DRAM buffer on your cache SSD, it may be lost on a power failure.

Finally, sync=standard is not a write-safe configuration for iSCSI zvols on VMware. The writes are only in RAM on the TrueNAS server and would be lost in a power outage. You must use sync=always, which ensures that pending writes are on stable storage. This often requires a separate log device (SLOG) so that performance does not suffer.
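In ZFS terms that is roughly (zvol and device names are placeholders):

    # force every write to the zvol onto stable storage before acknowledging it
    zfs set sync=always tank/iscsi-zvol
    # optionally add a fast, power-loss-protected device as SLOG so sync writes stay quick
    zpool add tank log nvd0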
 

as43031

Dabbler
Joined
Aug 10, 2020
Messages
29
OK, we've gotten a little sidetracked.
I created a pool of 5 vdevs, each a mirror of two 960GB Intel DC D3-S4510 SATA SSDs.
I shared this pool over iSCSI and got the result in the screenshot. Any idea why? I expected to run into the network bandwidth limit, but that didn't happen.
No SLOG or cache device was used.
 

Attachments

  • Безымянный.png (88.7 KB)

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If you're only testing with a single client/thread (equivalent to a single VM doing a lot of writes), then you're not going to see the benefit of the extra IOPS you get from a pool of mirrors. You need to simulate a more realistic load with the tools that @HoneyBadger suggested.
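A crude way to see that effect, reusing the fio sketches from earlier in the thread (parameters are illustrative): run the same random-write job once as a single worker and once as many queued workers, then compare IOPS between the RAIDZ and the mirrored pool.

    fio --name=one-vm   --rw=randwrite --bs=8k --numjobs=1 --iodepth=1  --size=2g --runtime=60 --time_based --group_reporting
    fio --name=many-vms --rw=randwrite --bs=8k --numjobs=8 --iodepth=16 --size=2g --runtime=60 --time_based --group_reporting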
 