Slow transfer speeds with compression on dataset

gyfer

Dabbler
Joined
Feb 13, 2020
Messages
14
I've tried every combination

How about trying this: copy a Blu-ray disc image file onto your TrueNAS with compression ON, then try to copy it to a different location on your TrueNAS using your Windows GUI?

Using /dev/zero does not put a load on Windows Defender because it is all "zero bits".
 

asap2go

Patron
Joined
Jun 11, 2023
Messages
228
Just to rule out memory issues:
Do you see any activity on your swap while copying those files?
 
Joined
Dec 29, 2014
Messages
1,135
Have you looked at the dashboard while you are doing the copy? I ask because I want to see if one of your CPUs is spiking during the process. I believe Samba is still single-threaded. I have noticed this when I run CLI commands to generate some CPU load and network traffic, like this:

Code:
find / -type f -exec cksum {} \;

I don't know the inner workings of NFS well enough to explain it, but I see the load distributed across CPUs when a vMotion is happening, while several other things, like an SMB copy or my cksum command, execute on only a single CPU.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Circling back on this as a break from the other fun. :P

The copy_file_range call (fired by a server-side copy) has extremely high latency on highly compressed files under FreeBSD. We haven't been able to reproduce this pathology under Linux.

A local cp works because only a single copy_file_range call is issued, so you pay the "entry cost" of that latency once; but when Windows does a server-side copy over SMB, it creates the new file, ftruncates it to the full size, and then does copy_file_range in 1M chunks. Because Windows doesn't see compressed records (only ZFS does), this scales with the decompressed size of the file (LSIZE, not ASIZE) - so for a 10G logically sized file, you pay that latency cost 10000x.

So, that's why this happens.
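
To make the 10000x arithmetic concrete, here is a minimal C sketch of the two access patterns (an illustration only, not the actual smbd code path; the file names are made up):

Code:
/* Sketch: contrast one large copy_file_range request ("local cp"
 * style) with the 1M-chunk loop a Windows server-side copy produces.
 * "bigfile.bin" and "copy.bin" are hypothetical names. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int in  = open("bigfile.bin", O_RDONLY);
    int out = open("copy.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    struct stat st;
    if (in < 0 || out < 0 || fstat(in, &st) != 0) {
        perror("setup");
        return 1;
    }

    /* Pattern 1, local cp: one request for the whole logical size, so
     * the per-call entry latency is paid (roughly) once:
     *
     *     copy_file_range(in, NULL, out, NULL, st.st_size, 0);
     *
     * Pattern 2, SMB server-side copy: LSIZE / 1M separate requests,
     * so a 10G logical file enters the syscall ~10000 times. */
    ssize_t n;
    do {
        n = copy_file_range(in, NULL, out, NULL, 1 << 20, 0);
    } while (n > 0);
    if (n < 0) {
        perror("copy_file_range");
        return 1;
    }
    return 0;
}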
 
Joined
Oct 22, 2019
Messages
3,641
but when Windows does a server-side copy over SMB, it creates the new file, ftruncates it to the full size, and then does copy_file_range in 1M chunks. Because Windows doesn't see compressed records (only ZFS does), this scales with the decompressed size of the file (LSIZE, not ASIZE) - so for a 10G logically sized file, you pay that latency cost 10000x.
I know you're referring to "Windows" here, but does this supposedly apply to any client OS connected via SMB? Or is this a particular behavior that is unique to Windows?


EDIT: But then again, I could not reproduce this myself. Everything is set up practically the same, except for the difference of mirror vs RAIDZ.
 
Last edited:

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
is this a particular behavior that is unique to Windows?
It appears to be, yes - specifically the chunked copy behavior. Linux clients seem to be closer to "just copy_file_range the whole thing in one go," which, as mentioned, only pays the syscall entry cost once.

Underlying vdev speed does lightly impact things, so mirrors might improve over RAIDZ, but you might need a larger file to tease out the differences. Could smash a few copies of that sample AVID video together and see if that exacerbates things?
 
Joined
Oct 22, 2019
Messages
3,641
Could smash a few copies of that sample AVID video together and see if that exacerbates things?
You mean appending the video to itself multiple times (without re-encoding) so that it's an even bigger, massive (highly compressible) video file?
 
Joined
Oct 22, 2019
Messages
3,641
Created a highly compressible file from the sample AVID video by appending it to itself multiple times; the new file is 12.4 GB in size (yet consumes only 400 MB on my ZSTD dataset).
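
(For anyone who wants to reproduce this, a minimal C sketch of such a self-append loop; the file names and repeat count here are made up:)

Code:
/* Sketch: build a large, highly compressible file by appending one
 * source file to itself repeatedly. "sample.mxf" and 32 repeats are
 * made-up values. */
#include <stdio.h>

int main(void)
{
    FILE *dst = fopen("bigfile.mxf", "wb");
    if (!dst) { perror("dst"); return 1; }

    for (int i = 0; i < 32; i++) {      /* 32 copies back to back */
        FILE *src = fopen("sample.mxf", "rb");
        if (!src) { perror("src"); return 1; }
        char buf[64 * 1024];            /* 64 KiB copy buffer */
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, src)) > 0)
            fwrite(buf, 1, n, dst);     /* repeated data compresses well */
        fclose(src);
    }
    fclose(dst);
    return 0;
}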

I copied it via SMB to the same dataset (SSD pool), and once again it completed relatively quickly, in a matter of seconds. (Fairly sustained speeds as well.)

Praise be to mirror vdevs? Or maybe I'm just lucky.

Reminder: TrueNAS Core 13.0-U6
 
Last edited:

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Created a highly compressible file from the sample AVID video by appending it to itself multiple times; the new file is 12.4 GB in size (yet consumes only 400 MB on my ZSTD dataset).

I copied it via SMB to the same dataset (SSD pool), and once again it completed relatively quickly, in a matter of seconds. (Fairly sustained speeds as well.)

Praise be to mirror vdevs? Or maybe I'm just lucky.
If you're morbidly curious you can do a copy_file_range syscall latency histogram using dtrace under a variety of scenarios. It's not particularly hard.
 
Joined
Oct 22, 2019
Messages
3,641
If you're morbidly curious you can do a copy_file_range syscall latency histogram using dtrace under a variety of scenarios. It's not particularly hard.
What?
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
dtrace script latency.d ... just for a start ...

Code:
#!/usr/sbin/dtrace -s
 
syscall::copy_file_range:entry
{
  /* stamp the syscall entry time for this thread */
  self->t = timestamp;
}
 
syscall::copy_file_range:return
/self->t/
{
  /* record per-call latency (ns) in a power-of-two histogram;
     the predicate skips returns whose entry we didn't see */
  @latencies = quantize(timestamp - self->t);
  self->t = 0;
}


Execute it (as root) with something like

Code:
$ dtrace -s latency.d


Again ... just for a start.

Edith yells at me: under /usr/share/dtrace/ there are some great (and much better) examples of what can be done.
 
Last edited:

jperham-ct

Cadet
Joined
Feb 23, 2022
Messages
5
Sorry for the late reply on this; here's an update on the issue I was having.

I migrated to TrueNAS SCALE as suggested, which seemed to fix the issue for the most part. There were still some speed issues when transferring files, occurring inconsistently.

After about six hours, I started getting error messages in the console pointing to some type of hardware issue:

Code:
Nov 16 18:19:48 fs3 kernel: mce: [Hardware Error]: Machine check events logged
Nov 16 18:19:48 fs3 kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xb7db3f offset:0xc00 grain:32 syndrome:0x0 -  err_code:0x0101:0x0090 ProcessorSocketId:0x0 MemoryControllerId:0x0 PhysicalRankId:0x1 Row:0xafb9 Column:0x3f0 Bank:0x1 BankGroup:0x2 retry_rd_err_log[0001a20d 00000000 01000000 04fc604d 0000afb9] correrrcnt[0000 0001 0000 0000 0000 0000 0000 0000])
Nov 16 18:19:48 fs3 kernel: mce: [Hardware Error]: Machine check events logged
Nov 16 18:19:48 fs3 kernel: EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#0_MC#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xb7db3a offset:0xc80 grain:32 syndrome:0x0 -  err_code:0x0008:0x00c0 ProcessorSocketId:0x0 MemoryControllerId:0x0 PhysicalRankId:0x1 Row:0xafb9 Column:0x2b8 Bank:0x1 BankGroup:0x2 retry_rd_err_log[0001a20d 00000000 01000000 04fc604d 0000afb9] correrrcnt[0000 0001 0000 0000 0000 0000 0000 0000])
Nov 16 18:24:54 fs3 kernel: mce: [Hardware Error]: Machine check events logged
Nov 16 18:24:54 fs3 kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xb7db3e offset:0x480 grain:32 syndrome:0x0 -  err_code:0x0101:0x0090 ProcessorSocketId:0x0 MemoryControllerId:0x0 PhysicalRankId:0x1 Row:0xafb9 Column:0x398 Bank:0x1 BankGroup:0x2 retry_rd_err_log[0001a20d 00000000 01000000 04fc604d 0000afb9] correrrcnt[0000 0003 0000 0000 0000 0000 0000 0000])
Nov 16 18:24:54 fs3 kernel: mce: [Hardware Error]: Machine check events logged
Nov 16 18:24:55 fs3 kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xb7db3f offset:0xc00 grain:32 syndrome:0x0 -  err_code:0x0101:0x0090 ProcessorSocketId:0x0 MemoryControllerId:0x0 PhysicalRankId:0x1 Row:0xafb9 Column:0x3f0 Bank:0x1 BankGroup:0x2 retry_rd_err_log[0001a20d 00000000 01000000 04fc604d 0000afb9] correrrcnt[0000 0003 0000 0000 0000 0000 0000 0000])
Nov 16 18:30:11 fs3 kernel: mce: [Hardware Error]: Machine check events logged
Nov 16 18:30:11 fs3 kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xb7db3f offset:0x80 grain:32 syndrome:0x0 -  err_code:0x0101:0x0090 ProcessorSocketId:0x0 MemoryControllerId:0x0 PhysicalRankId:0x1 Row:0xafb9 Column:0x3c8 Bank:0x1 BankGroup:0x2 retry_rd_err_log[0001a20d 00000000 10000000 04f2604d 0000afb9] correrrcnt[0000 0003 0000 0000 0000 0000 0000 0000]


The hardware (motherboard, PCIe cards, RAM, and CPU) was replaced by support, and that seems to have fully resolved the issue. I suspect Core wasn't detecting these hardware issues (or at least wasn't reporting them as a problem), and that was the root cause of this whole issue.

I appreciate everybody looking into it.
 
Last edited: