Slow ZFS performance?

levinet

Cadet
Joined
Jan 9, 2021
Messages
9
Hey!

I have this setup:

HP DL380E G8
LSI 9207-8i HBA
Qlogic QLE2562 FC
64GB RAM

HDD:
3X HGST 6TB 4Kn
2x Seagate enterprise capacity 6TB 512e
2x Seagate EXOS 7E2 6TB 512e


Here is the problem:

I created a zvol and connected it via FC to an ESXi host.

I tested with RDM and a VMFS datastore.

Here are the benchmark results without zfs set primarycache=none
Here are the benchmark results with zfs set primarycache=none

When I copy a file to the RDM or the VMFS disk, the speeds are very interesting...

It starts at 240MB/s (from a SATA2 SSD) or 400-500MB/s (from an NVMe SSD), depending on where I copy from, and after a while it slows down to 100-165MB/s or lower, sometimes reaching 0MB/s and stopping for a second.

I think something is not good here.

Thanks

ashift=12
zvol block size (volblocksize) = 128K
Main dataset recordsize = 1M
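For reference, these can be double-checked from the shell; the pool and zvol names here ("tank", "tank/vmdata") are just placeholders for whatever yours are called:

Code:
zfs get volblocksize tank/vmdata        # fixed when the zvol is created (128K here)
zfs get recordsize tank                 # dataset property (1M here)
zfs get primarycache,sync tank/vmdata   # cache/sync settings used during the tests
# ashift is a per-vdev value chosen at pool creation and cannot be changed afterwards.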
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703

levinet

Cadet
Joined
Jan 9, 2021
Messages
9
I recommend reading this material:

The section on VMs mentions 4K block size.

You may want to read the sections under the General recommendations on NVMe overprovisioning and pool geometry too.

RAIDZ1 isn't a good option for block storage. Read here for more information on that: https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/
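If you do try the smaller block size, note that volblocksize is fixed when a zvol is created, so it means making a new zvol and copying the data across. A rough sketch, where the pool name "tank" and the size are placeholders and 4K is the value mentioned in the material above:

Code:
zfs create -s -V 2T -o volblocksize=4K tank/vmdata-4k   # -s = sparse/thin provisioned
zfs get volblocksize tank/vmdata-4k                     # confirm before exporting it over FC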


Thank you, I'll read it.

I use RAIDZ2 for these 7 HDDs.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703

levinet

Cadet
Joined
Jan 9, 2021
Messages
9
OK, RAIDZ2 is an equally poor choice. You need mirrors for IOPS.

I don't want to use this pool for running VMs, I just want to use it for data only, but the speed results are interesting, that's why I asked.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I don't want to use this pool for running VMs, I just want to use it for data only, but the speed results are interesting, that's why I asked.

The overall issue remains the same - your network (or FC in this case) connection is faster than your back-end vdev, and you're writing asynchronously.

Your FC card can ingest data at 800MB/s which you see in the cached results, but your single 7-disk RAIDZ2 can't write that fast. Eventually the ZFS write throttle kicks in and forces the data to back off, eventually all the way to zero. As soon as it can make some headroom in the dirty data buffer, the FC firehose opens again for a fraction of a second and the throttle is back on.
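The limits involved can be inspected from the shell, though raising them doesn't fix anything: a bigger dirty data buffer only buys a longer burst before the same throttling kicks in. The sysctl names below are the FreeBSD/FreeNAS ones:

Code:
sysctl vfs.zfs.dirty_data_max            # bytes of dirty (not-yet-written) data allowed in RAM
sysctl vfs.zfs.delay_min_dirty_percent   # how full that buffer gets before writes are delayed
sysctl vfs.zfs.txg.timeout               # seconds between transaction group flushes to disk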
 

levinet

Cadet
Joined
Jan 9, 2021
Messages
9
I tried an SMB share on the FreeNAS server and the results are very low: it starts at 115MB/s (this is fine) but within 10 seconds drops to 40-60MB/s, so no Fibre Channel involved and the speed results are the same...
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Any idea?
You don't mention anything about having tried the 4K suggestion (or anything else you might have found in the ZFS tuning document I linked to).

I don't want to use this pool for running VMs,
Here is the problem:

I created a zvol and connected it via FC to an ESXi host.

I tested with RDM and a VMFS datastore.
Those two statements seem contradictory.
Can you explain better what you are doing?
 

levinet

Cadet
Joined
Jan 9, 2021
Messages
9
I wrote this previously: "I tried an SMB share on the FreeNAS server and the results are very low: it starts at 115MB/s (this is fine) but within 10 seconds drops to 40-60MB/s, so no Fibre Channel involved and the speed results are the same..." <--- These results were measured on the FreeNAS box's SMB share, so the problem is not with the Fibre Channel, and that's why I am complaining...

So let me try to explain what I want to do:

I want ONE RAIDZ2 pool.
I just want to use it for storing data only, not for running VMs' operating systems!
I want to connect the FreeNAS box via Fibre Channel to the ESXi host.
That's all, and that's why I created this thread.

I did what I wrote above and the speed results are not too good.

Thanks
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If you're using it as block storage to an ESXi host over whatever connection, you should consider the tuning points for block storage.

What will matter is how the guest in ESXi asks ESXi for reads and writes, and then how that translates into the requests sent to FreeNAS.

It's not just a case of direct access from the VM to the disks attached to FreeNAS.
 

hamstercnw

Cadet
Joined
May 7, 2020
Messages
5
Hello

The bottleneck may be ZFS write confirmations.

When your guest OS writes to disk, it will wait for confirmation that the data has been written to disk. This is slow on ZFS, especially if you have spinning disks. You should add a small SSD drive* as a "log" device. ZFS will quickly write the data to the log device (being an SSD) and confirm to the guest OS. Then, during the following 5 seconds, the data is written to the ZFS RAIDZ disks.

*) You only need to store data for up to 5 seconds. Calculate how many GB you might accumulate within 5 seconds across your local OS or guests and your network or Fibre Channel links, and add them up. 32 or 64 GB will probably be adequate, but you won't find disks that small.

I added a cheap desktop SSD to my ZFS as a log device and I/O wait time was cut by 90%.

You may also add a small M.2 SSD. If your motherboard does not support it, get a $2 PCIe x4 adapter for it.
In TrueNAS, add a vdev to the existing pool and select the log device type. The log device can be removed at any time from the command line without any disruption.
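The command-line equivalent of those GUI steps looks roughly like this; the pool "tank" and device "ada4" are placeholders:

Code:
zpool add tank log ada4    # attach a single, unmirrored SLOG device
zpool status tank          # it appears under a separate "logs" heading
zpool remove tank ada4     # a log vdev can be removed again on the fly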

RAIDz1_with_logs_vdev.png
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Hello

The bottleneck may be ZFS write confirmations.

When your guest OS writes to disk, it will wait for confirmation that the data has been written to disk. This is slow on ZFS, especially if you have spinning disks. You should add a small SSD drive* as a "log" device. ZFS will quickly write the data to the log device (being an SSD) and confirm to the guest OS. Then, during the following 5 seconds, the data is written to the ZFS RAIDZ disks.

*) You only need to store data for up to 5 seconds. Calculate how many GB you might accumulate within 5 seconds across your local OS or guests and your network or Fibre Channel links, and add them up. 32 or 64 GB will probably be adequate, but you won't find disks that small.

I added a cheap desktop SSD to my ZFS as a log device and I/O wait time was cut by 90%.

You may also add a small M.2 SSD. If your motherboard does not support it, get a $2 PCIe x4 adapter for it.
In TrueNAS, add a vdev to the existing pool and select the log device type. The log device can be removed at any time from the command line without any disruption.


This is horrible advice. Adding a SLOG device is only meaningful if you bother to validate that the device has power loss protection or other equivalent data protection, otherwise you are just burning up an SSD's endurance for no particularly good reason. You would be better off simply disabling sync writes instead of adding some random janky SSD as a SLOG device.

References include


Failing to use a properly PLP protected SSD means that you aren't protected in any reasonable manner, and if you're "just" storing data and not VM disk storage on such a datastore, you're suffering a huge performance hit to do the sync writes but not gaining proper protection, plus you're just destroying the SSD along the way. In such a case, you can probably just turn off sync writes, and writes will go as fast as the system can manage.
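In command form, the "just turn off sync writes" option is a one-liner (the zvol name is a placeholder), with the usual caveat that a crash or power loss can cost you the last few seconds of acknowledged writes:

Code:
zfs set sync=disabled tank/vmdata   # acknowledge writes from RAM; no SLOG or ZIL flush involved
zfs get sync tank/vmdata            # default is "standard"; "always" forces sync for everything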

I want ONE raidz2 pool.

Okay, but RAIDZ is terrible at block storage, including relatively slow single-client performance.


I just want to use it for storing data only, not for running VMs' operating systems!
I want to connect the FreeNAS box via Fibre Channel to the ESXi host.

Um. ESXi doesn't care what kind of data you're storing. When you connect up over iSCSI or FC, you are performing block storage whether or not you like it, and regardless of what you "think" the data you're storing is. If your filesystem type is showing up as VMFS5 or VMFS6 in vCenter, the rules outlined in the block storage resources linked earlier apply to you. That last bit isn't just a single article, but links to additional articles that discuss specific subtopics which are usually relevant. Do not expect a RAIDZ1 or RAIDZ2 to be particularly fast with block storage for VMFS. Data storage is almost always better done over NFS or SMB directly from the client VM, rather than having it proxied through the hypervisor's VMFS layer.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Just to emphasise what @jgreco said.
@levinet - do not pay any attention to what @hamstercnw says. It contains just enough fact to look reasonable while in actuality being very dangerous (to your data) advice. I am not saying it will cause a spontaneous explosion that burns down your house and kills your pet rabbit.

@Moderators - the "advice" offered by @hamstercnw is sufficiently wrong as to be below the awful bar; I would hate for someone to do a Google search and follow that advice. It should be labelled as just plain bad advice, or with a health (of data) warning.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It should be labelled as just plain bad advice

I think I went one step beyond that:

This is horrible advice.

On one hand, I fully hear your concern about searches presenting bad results.

On the other hand, we do want to encourage participation by new members. In cases where wrong information is presented, community members or moderators are welcome to post corrections, preferably with facts to explain, because (*almost) no one comes here and invests time writing a post unless their intentions are good.

(*) I have counterexamples of course. But this doesn't feel anything like that.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I didn't even suggest (and don't want) banning the participant. Hopefully he/she has learnt something and we can all move on. I would just like to see that post edited with a warning saying that this is wrong.
Maybe I am just worrying over nothing - but nothing ever really leaves the internet once it's posted.
 

hamstercnw

Cadet
Joined
May 7, 2020
Messages
5
Thanks for the references explaining why my solution is bad and data-dangerous. It's fine, and actually good, that you correct/comment on my statement here, @jgreco and @NugentS. Other community members might already be trusting such a bold setup without knowing the imminent danger - but still happy with the performance.

I'm running guests in my Proxmox environment, using virtual disks stored on TrueNAS ZFS via a 10Gb Ethernet NFS connection. The disk I/O latency on the guest OS was initially horrible but got way better with a ZIL SSD device. No doubt about that.

ZFS does, however, support dual ZIL or SLOG vdevs to increase security and to create redundancy to mitigate SSD wear-out. I still have no idea how long the SSD will last/survive. But I expect or hope the TrueNAS system will warn on SSD wear. Time will tell. Adding and removing a ZIL device is simple and can be done on the fly.
(OBTW: UPS=yes)
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
ZFS does, however, support dual ZIL or SLOG vdevs to increase security and to create redundancy to mitigate SSD wear-out.
You're going to have to explain to me how having 2 SSDs writing exactly the same data mitigates either of them wearing out more than a single one would. (you can't stripe SLOG devices)

I still have no idea how long the SSD will last/survive.
A good indicator is the TBW (Terabytes Written) value on the box... don't expect the drive to survive much beyond that.
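A rough way to read the wear counters yourself (the device name is just an example, and attribute names vary by vendor):

Code:
smartctl -A /dev/ada1 | egrep -i 'wear|percent|written'
# Typical attributes: 177 Wear_Leveling_Count, 202 Percent_Lifetime_Remain,
# 241 Total_LBAs_Written -- compare the written total against the drive's rated TBW.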

But I expect or hope the TrueNAS system will warn on SSD wear.
If you're performing SMART tests regularly, you could get the value from that... I recommend running this script at least weekly to pick up and make sense of the SMART and pool data:

I'm running guests in my Proxmox environment, using virtual disks stored on TrueNAS ZFS via a 10Gb Ethernet NFS connection. The disk I/O latency on the guest OS was initially horrible but got way better with a ZIL SSD device. No doubt about that.
While that's true and makes sense, it would also be way faster and about as safe to run with sync=disabled

If your objective was to minimize the slowdown while also having the safety of sync writes, then a proper SLOG is the only upgrade you can really do... burning out a perfectly OK SSD for no additional safety and actually doing the job slower than it can be makes no sense.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
A good indicator is the TBW (Terabytes Written) value on the box... don't expect the drive to survive much beyond that.
German magazine c't found in 2016/17 that at least the then-current models could bear at least twice their rated TBW, and sometimes much, much more than that. The longest-lasting one (a Samsung 850 Pro, 256 GB) wrote 9.1 PB before dying.

 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
ZFS does, however, support dual ZIL or SLOG vdevs to increase security and to create redundancy to mitigate SSD wear-out. I still have no idea how long the SSD will last/survive. But I expect or hope the TrueNAS system will warn on SSD wear. Time will tell. Adding and removing a ZIL device is simple and can be done on the fly.
(OBTW: UPS=yes)
Having a UPS is no proper substitute for PLP on the SLOG drive itself.
Mirroring SLOG is for reliability rather than security. No data is lost upon failure by a SLOG device, only performance—but that in itself is likely not acceptable for professional use. It takes a crash/sudden power loss AND a SLOG failure upon the next boot to lose data.
While HDDs tend to give warnings as they gradually fail, SSDs tend to just "go poof". So there's no guarantee that two identical, and unsuitable, consumer SSDs put to SLOG duty would not fail without warning and in short sequence.

If sync=disabled is not an acceptable risk and you do need a SLOG for performance, do yourself a favour: get a proper SLOG device. Radian RMS, Optane (DC), or a data centre-grade drive with PLP and high endurance. These are not THAT expensive…
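For completeness, adding a mirrored log vdev is a one-liner once you have suitable devices; the pool "tank" and the FreeBSD NVMe device names nvd0/nvd1 are placeholders:

Code:
zpool add tank log mirror nvd0 nvd1   # mirrored SLOG: survives the loss of one device
zpool status tank                     # shows the pair under "logs" as a mirror vdev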
 