Extremely poor latency accessing pool

CelticWebs

Dabbler
Joined
Dec 4, 2023
Messages
18
I'm running Proxmox Backup Server in a VM. Setup is as follows:

Spec
Dual Xeon-based server running TrueNAS SCALE
48GB memory
LSI 9211-4i RAID controller flashed to IT mode
2x 500GB enterprise SSDs as a mirrored boot drive
24x 4TB enterprise SAS disks in a dRAID vdev (draid3:8d:22c:1s-0)

I've realised the main pool has latency issues but have no idea how to diagnose them. I did some basic tests from the TrueNAS terminal; results are below.

Speed
dd if=/dev/zero of=test1.img bs=1G count=1 oflag=dsync

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.77884 s, 386 MB/s

Latency
dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync

512000 bytes (512 kB, 500 KiB) copied, 11.3417 s, 45.1 kB/s

As you can see, the write speed is "OK" but the latency is awful. The equivalent test on my Synology DS1019+ with a 5-disk Btrfs RAID gives:
512000 bytes (512 kB, 500 KiB) copied, 0.328728 s, 1.6 MB/s
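
Dividing the elapsed time by the number of writes gives a rough per-write sync latency from the two dd runs above (just back-of-the-envelope arithmetic):

Code:
# per-write latency ≈ elapsed time / write count
# TrueNAS dRAID pool: 11.3417 s / 1000 writes ≈ 11.3 ms per dsync write
# Synology DS1019+:    0.328728 s / 1000 writes ≈ 0.33 ms per write
awk 'BEGIN { printf "%.2f ms per write (TrueNAS)\n",  11.3417  / 1000 * 1000 }'
awk 'BEGIN { printf "%.2f ms per write (Synology)\n", 0.328728 / 1000 * 1000 }'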


Does anyone have any suggestions what could be causing this poor latency? I am at a loss as to what it could be!

Thanks in advance for your help!
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I’m not sure exactly what you are testing for here. You are intentionally writing 512-byte blocks and only writing 512K of data in total.

This isn’t really a real-world test. Can you kick off a backup from Proxmox? While it's running, kick off this command on the TrueNAS host:

zpool iostat -vyl 120 1

which will sample pool performance, including latency, over a two-minute window.
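
In case the flags aren't obvious, here's roughly what they do (a sketch; run it in the TrueNAS shell as root):

Code:
# sample pool activity while the backup is running
zpool iostat -vyl 120 1
#   -v    verbose: break statistics out per vdev and per disk
#   -y    skip the since-boot summary; only report activity during the sample window
#   -l    include average latency columns (total_wait, disk_wait, syncq_wait, asyncq_wait)
#   120   sample interval in seconds
#   1     number of samples to print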
 

CelticWebs

Dabbler
Joined
Dec 4, 2023
Messages
18
I’m not sure exactly what you are testing for here. You are intentionally writing 512-byte blocks and only writing 512K of data in total.

This isn’t really a real-world test. Can you kick off a backup from Proxmox? While it's running, kick off this command on the TrueNAS host:

zpool iostat -vyl 120 1

which will sample pool performance, including latency, over a two-minute window.
Thanks for the response. So I run a backup and then run the iostat command outside of the VM, in the main TrueNAS terminal?
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Thanks for the response. So I run a backup and then run the iostat command outside of the VM, in the main TrueNAS terminal?
Yep. While the backup is running, that command will observe the performance and print out the telemetry.
 

CelticWebs

Dabbler
Joined
Dec 4, 2023
Messages
18
Yep. While the backup is running, that command will observe the performance and print out the telemetry.
Here are the results; not sure if they're good or bad, tbh.
 

Attachments

  • IMG_0722.png

CelticWebs

Dabbler
Joined
Dec 4, 2023
Messages
18
I’ll do it from an actual computer shortly so that the output is laid out better and easier to read.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
@CelticWebs, you have disk latency averaging about 12 ms. What this effectively means is that the bottleneck you are trying to illustrate probably isn't there. I think your testing methodology just isn't representative of the workload.
 

CelticWebs

Dabbler
Joined
Dec 4, 2023
Messages
18
@CelticWebs, you have disk latency averaging about 12 ms. What this effectively means is that the bottleneck you are trying to illustrate probably isn't there. I think your testing methodology just isn't representative of the workload.
Thanks Nick. I did some tests on my Synology system and, figures-wise, everything performed better on the TrueNAS SCALE system, which you'd expect with the spec difference. So I'm at a total loss as to why it operates so slowly when doing backups and garbage collection in Proxmox Backup Server.

It's been running a garbage collection for a day and, as you can see, it hasn't reached 2% yet, and the system itself is under no load at all!

I'm wondering if it's the way the VM is set up somehow.


Image 02-04-2024 at 18.57.jpeg

Image 02-04-2024 at 18.59.jpeg
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I'm not entirely sure how Proxmox Backup does its thing. Looking at the docs:

Code:
For block based backups (like VMs), fixed-sized chunks are used. The content (disk image) is split into chunks of the same length (typically 4 MiB).


Are you using compression on the Proxmox side and on the ZFS side?


You can try increasing the ZFS recordsize from the default 128K to 1M to see if that helps.
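
Something along these lines (a sketch; tank/pbs is just a placeholder for whatever dataset the PBS datastore sits on):

Code:
# check the current values (replace tank/pbs with your actual PBS datastore dataset)
zfs get recordsize,compression tank/pbs

# raise recordsize to 1M; this only applies to data written after the change,
# existing backup chunks keep the record size they were written with
zfs set recordsize=1M tank/pbs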
 