Slow transfer speeds with compression on dataset

gyfer

Dabbler
Joined
Feb 13, 2020
Messages
14
I've tried every combination

How about trying this: copy a Blu-ray disc image file onto your TrueNAS with compression ON, then try to copy it to a different location on your TrueNAS using your Windows GUI?

Using /dev/zero does not put a load on Windows Defender because it is all "zero bits".
 

asap2go

Patron
Joined
Jun 11, 2023
Messages
228
Just to rule out memory issues:
Do you see any activity on your swap while copying those files?
 
Joined
Dec 29, 2014
Messages
1,135
Have you looked at the dashboard while you are doing the copy? I ask because I want to see if one of your CPUs is spiking during the process. I believe Samba is still single-threaded. I have noticed this when I run CLI commands to generate some CPU load and network traffic, like this:

Code:
find / -type f -exec cksum {} \;

I don't know the inner workings of NFS well enough to explain it, but I see the load distributed across CPUs when a vMotion is happening, while several other things, like an SMB copy or my cksum command, execute on only a single CPU.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Circling back on this as a break from the other fun. :P

The copy_file_range call (fired by a server-side copy) has extremely high latency on highly compressed files under FreeBSD. We haven't been able to reproduce this pathology under Linux.

A local cp works because only a single copy_file_range call is issued, so you pay the "entry cost" of that latency once; but when Windows does a server-side copy over SMB, it creates the new file, ftruncates it to the full size, and then does copy_file_range in 1M chunks. Because Windows doesn't see compressed records (only ZFS does), this scales with the decompressed size of the file (LSIZE, not ASIZE) - so for a 10G logically sized file, you pay that latency cost 10000x.

So, that's why this happens.
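
To make the 10000x arithmetic concrete, here is a minimal C sketch of the two access patterns (an illustration only, not the actual smbd code path; the file names are made up):

Code:
/* Sketch: contrast one large copy_file_range request ("local cp"
 * style) with the 1M-chunk loop a Windows server-side copy produces.
 * "bigfile.bin" and "copy.bin" are hypothetical names. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int in  = open("bigfile.bin", O_RDONLY);
    int out = open("copy.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    struct stat st;
    if (in < 0 || out < 0 || fstat(in, &st) != 0) {
        perror("setup");
        return 1;
    }

    /* Pattern 1, local cp: one request for the whole logical size, so
     * the per-call entry latency is paid (roughly) once:
     *
     *     copy_file_range(in, NULL, out, NULL, st.st_size, 0);
     *
     * Pattern 2, SMB server-side copy: LSIZE / 1M separate requests,
     * so a 10G logical file enters the syscall ~10000 times. */
    ssize_t n;
    do {
        n = copy_file_range(in, NULL, out, NULL, 1 << 20, 0);
    } while (n > 0);
    if (n < 0) {
        perror("copy_file_range");
        return 1;
    }
    return 0;
}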
 
Joined
Oct 22, 2019
Messages
3,641
but when Windows does a server-side copy over SMB, it creates the new file, ftruncates it to the full size, and then does copy_file_range in 1M chunks. Because Windows doesn't see compressed records (only ZFS does), this scales with the decompressed size of the file (LSIZE, not ASIZE) - so for a 10G logically sized file, you pay that latency cost 10000x.
I know you're referring to "Windows" here, but does this supposedly apply to any client OS connected via SMB? Or is this a particular behavior that is unique to Windows?


EDIT: But then again, I could not reproduce this myself. Everything is set up practically the same, except for the difference of mirror vs RAIDZ.
 
Last edited:

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
is this a particular behavior that is unique to Windows?
It appears to be, yes - specifically the chunked copy behavior. Linux clients seem to be closer to "just copy_file_range the whole thing in one go," which, as mentioned, only pays the syscall entry cost once.

Underlying vdev speed does lightly impact things, so mirrors might improve over RAIDZ, but you might need a larger file to tease out the differences. Could smash a few copies of that sample AVID video together and see if that exacerbates things?
 
Joined
Oct 22, 2019
Messages
3,641
Could smash a few copies of that sample AVID video together and see if that exacerbates things?
You mean appending the video to itself multiple times (without re-encoding) so that it's an even bigger, massive (highly compressible) video file?
 
Joined
Oct 22, 2019
Messages
3,641
Created a highly compressible file from the sample AVID video by appending it to itself multiple times; the new file is 12.4 GB in size (yet consumes only 400 MB on my ZSTD dataset).
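
(For anyone who wants to reproduce this, a minimal C sketch of such a self-append loop; the file names and repeat count here are made up:)

Code:
/* Sketch: build a large, highly compressible file by appending one
 * source file to itself repeatedly. "sample.mxf" and 32 repeats are
 * made-up values. */
#include <stdio.h>

int main(void)
{
    FILE *dst = fopen("bigfile.mxf", "wb");
    if (!dst) { perror("dst"); return 1; }

    for (int i = 0; i < 32; i++) {      /* 32 copies back to back */
        FILE *src = fopen("sample.mxf", "rb");
        if (!src) { perror("src"); return 1; }
        char buf[64 * 1024];            /* 64 KiB copy buffer */
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, src)) > 0)
            fwrite(buf, 1, n, dst);     /* repeated data compresses well */
        fclose(src);
    }
    fclose(dst);
    return 0;
}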

I copied it via SMB to the same dataset (SSD pool), and once again it completed relatively quickly, in a matter of seconds. (Fairly sustained speeds as well.)

Praise be to mirror vdevs? Or maybe I'm just lucky.

Reminder: TrueNAS Core 13.0-U6
 
Last edited:

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Created a highly compressible file from the sample AVID video by appending it to itself multiple times; the new file is 12.4 GB in size (yet consumes only 400 MB on my ZSTD dataset).

I copied it via SMB to the same dataset (SSD pool), and once again it completed relatively quickly, in a matter of seconds. (Fairly sustained speeds as well.)

Praise be to mirror vdevs? Or maybe I'm just lucky.
If you're morbidly curious you can do a copy_file_range syscall latency histogram using dtrace under a variety of scenarios. It's not particularly hard.
 
Joined
Oct 22, 2019
Messages
3,641
If you're morbidly curious you can do a copy_file_range syscall latency histogram using dtrace under a variety of scenarios. It's not particularly hard.
What?
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
dtrace script latency.d ... just for a start ...

Code:
#!/usr/sbin/dtrace -s
 
syscall::copy_file_range:entry
{
  /* stamp the syscall entry time for this thread */
  self->t = timestamp;
}
 
syscall::copy_file_range:return
/self->t/
{
  /* record per-call latency (ns) in a power-of-two histogram;
     the predicate skips returns whose entry we didn't see */
  @latencies = quantize(timestamp - self->t);
  self->t = 0;
}


Execute it (as root) with something like

Code:
$ dtrace -s latency.d


Again ... just for a start.

Edith yells at me: under /usr/share/dtrace/ there are some great (and much better) examples of what can be done.
 
Last edited:

jperham-ct

Cadet
Joined
Feb 23, 2022
Messages
5
Sorry for the late reply on this; here's an update on the issue I was having.

I migrated to TrueNAS SCALE as suggested, which seemed to fix the issue for the most part. There were still some speed issues when transferring files, occurring inconsistently.

After about six hours, I started getting error messages in the console pointing to some type of hardware issue:

Code:
Nov 16 18:19:48 fs3 kernel: mce: [Hardware Error]: Machine check events logged
Nov 16 18:19:48 fs3 kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xb7db3f offset:0xc00 grain:32 syndrome:0x0 -  err_code:0x0101:0x0090 ProcessorSocketId:0x0 MemoryControllerId:0x0 PhysicalRankId:0x1 Row:0xafb9 Column:0x3f0 Bank:0x1 BankGroup:0x2 retry_rd_err_log[0001a20d 00000000 01000000 04fc604d 0000afb9] correrrcnt[0000 0001 0000 0000 0000 0000 0000 0000])
Nov 16 18:19:48 fs3 kernel: mce: [Hardware Error]: Machine check events logged
Nov 16 18:19:48 fs3 kernel: EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#0_MC#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xb7db3a offset:0xc80 grain:32 syndrome:0x0 -  err_code:0x0008:0x00c0 ProcessorSocketId:0x0 MemoryControllerId:0x0 PhysicalRankId:0x1 Row:0xafb9 Column:0x2b8 Bank:0x1 BankGroup:0x2 retry_rd_err_log[0001a20d 00000000 01000000 04fc604d 0000afb9] correrrcnt[0000 0001 0000 0000 0000 0000 0000 0000])
Nov 16 18:24:54 fs3 kernel: mce: [Hardware Error]: Machine check events logged
Nov 16 18:24:54 fs3 kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xb7db3e offset:0x480 grain:32 syndrome:0x0 -  err_code:0x0101:0x0090 ProcessorSocketId:0x0 MemoryControllerId:0x0 PhysicalRankId:0x1 Row:0xafb9 Column:0x398 Bank:0x1 BankGroup:0x2 retry_rd_err_log[0001a20d 00000000 01000000 04fc604d 0000afb9] correrrcnt[0000 0003 0000 0000 0000 0000 0000 0000])
Nov 16 18:24:54 fs3 kernel: mce: [Hardware Error]: Machine check events logged
Nov 16 18:24:55 fs3 kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xb7db3f offset:0xc00 grain:32 syndrome:0x0 -  err_code:0x0101:0x0090 ProcessorSocketId:0x0 MemoryControllerId:0x0 PhysicalRankId:0x1 Row:0xafb9 Column:0x3f0 Bank:0x1 BankGroup:0x2 retry_rd_err_log[0001a20d 00000000 01000000 04fc604d 0000afb9] correrrcnt[0000 0003 0000 0000 0000 0000 0000 0000])
Nov 16 18:30:11 fs3 kernel: mce: [Hardware Error]: Machine check events logged
Nov 16 18:30:11 fs3 kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xb7db3f offset:0x80 grain:32 syndrome:0x0 -  err_code:0x0101:0x0090 ProcessorSocketId:0x0 MemoryControllerId:0x0 PhysicalRankId:0x1 Row:0xafb9 Column:0x3c8 Bank:0x1 BankGroup:0x2 retry_rd_err_log[0001a20d 00000000 10000000 04f2604d 0000afb9] correrrcnt[0000 0003 0000 0000 0000 0000 0000 0000]


The hardware (motherboard, PCIe cards, RAM, and CPU) was replaced by support, and that seems to have fully resolved the issue. I suspect Core wasn't detecting these hardware issues (or at least wasn't reporting them as a problem), and that was the root cause of this whole issue.

I appreciate everybody looking into it.
 
Last edited: