Slow SMB - Lost - Need direction

Daisuke · Jan 29, 2023

jgreco said:
However, there is increased parallelism available with the mirror configuration; each vdev can be serving two different requests simultaneously. This is substantially better than RAIDZ which is optimized towards large file/single access.

@vexter0944 as @jgreco mentioned very well into his post, it depends on your usage. Since you’ve set your recordsize to 1M, that tells me you deal with large media files, where a raidz2 will be very beneficial.

vexter0944 · Jan 29, 2023

Considering I've already cut over - not sure how I could test if that would help me or not at this point unfortunately. I'm open to whatever makes the most sense for me as I'm not familiar with ZFS and am new to it. My nas is almost 100% dedicted to media files for nvidia shields within the house. I do keep documents, photos etc - but access of those is minimal.

Daisuke · Jan 29, 2023

@vexter0944 you have a pool of 4 mirrors. Beside the neglectable speed differences, RaidZ2 will give you an additional 20TB of usable disk space. @jgreco is very experienced and he will be able to confirm the use of mirrors with large data files is marginally beneficial with very slightly performance increase, versus the advantage of gained capacity. I'm always using multiples of 12 disk RaidZ2 VDEVs and these are my fio results for 128k recordsize dataset. (1087MB/s-1135MB/s) translates to (8.69Gbit/s-9.08Gbit/s), on a 10Gbit network setup. You will get better results with 1M recordsize, since you use files larger than 5MB. Is all explained into my guide, see Pools and Datasets section. To me that's a very acceptable speed compromise, versus the massive storage gains and data maintenance ease of RaidZ2.

I'm going to highlight one important detail from guide, related to recordsize. You cannot just change it on an existing dataset full of files. You need to create a new dataset and move the files from old dataset to new dataset. Read all related details into guide.

Note that changing the recordsize affects only newly created files, existing files are unaffected. Therefore, is important to move your data to a new dataset with the correct recordsize.

vexter0944 · Jan 29, 2023

Daisuke said:
@vexter0944 you have a pool of 4 mirrors. Beside the neglectable speed differences, RaidZ2 will give you an additional 20TB of usable disk space. @jgreco is very experienced and he will be able to confirm the use of mirrors with large data files is marginally beneficial with very slightly performance increase, versus the advantage of gained capacity. I'm always using multiples of 12 disk RaidZ2 VDEVs and these are my fio results for 128k recordsize dataset. (1087MB/s-1135MB/s) translates to (8.69Gbit/s-9.08Gbit/s), on a 10Gbit network setup. You will get better results with 1M recordsize, since you use files larger than 5MB. Is all explained into my guide, see Pools and Datasets section. To me that's a very acceptable speed compromise, versus the massive storage gains and data maintenance ease of RaidZ2.

I'm going to highlight one important detail from guide, related to recordsize. You cannot just change it on an existing dataset full of files. You need to create a new dataset and move the files from old dataset to new dataset. Read all related details into guide.

yes - I made sure the dataset was set to 1M before moving the data over form the QNAP. So it should be correct - I left my photos etc on another dataset with 128K.

Davvo · Jan 29, 2023

Daisuke said:
I'm going to highlight one important detail from guide, related to recordsize. You cannot just change it on an existing dataset full of files. You need to create a new dataset and move the files from old dataset to new dataset. Read all related details into guide.

He could also change the property and then run the rebalancing script, without the need to create another dataset.
Not that's reallymatters if he's gonna change the pool layout.

Daisuke · Jan 29, 2023

Davvo said:
He could also change the property and then run the rebalancing script

I understand, my point is, he is using Scale to serve media files, where storage capacity is way more important than a very slight performance increase.

Davvo · Jan 29, 2023

Daisuke said:
I understand, my point is, he is using Scale to serve media files, where storage capacity is way more important than a very slight performance increase.

Sure, but that's another point? I just pointed out that he can skip the extra step of new dataset and moving if he plans of rebalancing his pool.

And anyway, the reason for my suggestion of using mirrors is because he stressed on the performance.

vexter0944 said:
I'm looking for performance overall with at least 'some' kind of fault tolerance. [...] But yes - I'd like to leverage as much of my newly installed 10gbe network as possible.

Btw, your guides about SCALE are impressive.

vexter0944 · Feb 1, 2023

@Davvo @jgreco

how long should that script take? Been stuck on this part for almost 2 days...that's more than 1136 min :) I do have the rest of the results of this as well. This test just seems to be stuck.

vexter0944 · Feb 1, 2023

got tired of waiting and quit - but here's some of the data @jgreco @Davvo - side note - when I ran it the other day from the truenas shell the 1st time - the --SLOW-- and ++FAST++ on sdh was not there in those results.

Disk Disk Size MB/sec %ofAvg
------- ---------- ------ ------
sda 12000138MB 206 104
sdb 12000138MB 204 103
sdc 12000138MB 204 104
sdd 12000138MB 203 103
sde 12000138MB 196 99
sdf 12000138MB 203 103
sdg 12000138MB 196 99
sdh 12000138MB 165 84 --SLOW--

This next test attempts to read all devices in parallel. This is
primarily a stress test of your disk controller, but may also find
limits in your PCIe bus, SAS expander topology, etc. Ideally, if
all of your disks are of the same type and connected the same way,
then all of your disks should be able to read their contents in
about the same amount of time. Results that are unusually slow or
unusually fast may be tagged as such. It is up to you to decide if
there is something wrong.

Performing initial parallel array read
Sat Jan 28 20:43:51 CST 2023
The disk sda appears to be 12000138 MB.
Disk is reading at about 192 MB/sec
This suggests that this pass may take around 1039 minutes

Serial Parall % of
Disk Disk Size MB/sec MB/sec Serial
------- ---------- ------ ------ ------
sda 12000138MB 206 200 97
sdb 12000138MB 204 202 99
sdc 12000138MB 204 203 99
sdd 12000138MB 203 202 99
sde 12000138MB 196 194 99
sdf 12000138MB 203 202 99
sdg 12000138MB 196 193 99
sdh 12000138MB 165 193 117 ++FAST++

Awaiting completion: initial parallel array read
Sun Jan 29 14:45:32 CST 2023
Completed: initial parallel array read

Disk's average time is 62092 seconds per disk

Disk Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
sda 12000138625024 60634 98
sdb 12000138625024 60626 98
sdc 12000138625024 60596 98
sdd 12000138625024 60914 98
sde 12000138625024 64321 104
sdf 12000138625024 60778 98
sdg 12000138625024 63963 103
sdh 12000138625024 64901 105

This next test attempts to read all devices while forcing seeks.
This is primarily a stress test of your hard disks. It does thhis
by running several simultaneous dd sessions on each disk.

Performing initial parallel seek-stress array read
Sun Jan 29 14:45:32 CST 2023
The disk sda appears to be 12000138 MB.
Disk is reading at about 176 MB/sec
This suggests that this pass may take around 1136 minutes

Serial Parall % of
Disk Disk Size MB/sec MB/sec Serial
------- ---------- ------ ------ ------
sda 12000138MB 206 174 85
sdb 12000138MB 204 174 86
sdc 12000138MB 204 175 86
sdd 12000138MB 203 175 86
sde 12000138MB 196 168 86
sdf 12000138MB 203 174 86
sdg 12000138MB 196 168 86
sdh 12000138MB 165 168 102

Awaiting completion: initial parallel seek-stress array read

jgreco · Feb 1, 2023

vexter0944 said:
how long should that script take? Been stuck on this part for almost 2 days...that's more than 1136 min :) I do have the rest of the results of this as well. This test just seems to be stuck.

It's not stuck. However, it can take a very long time. Some of the time estimations on Linux seem to be off a bit for reasons I haven't investigated in depth. The seek-stress test involves launching several dd sessions in parallel but spaced apart by a small amount of time; the idea is to force the heads to do a lot of seeking to complete the job. Accurate seeking is incredibly important for hard drives, and this is one of those things where you want the drive to fault out now if it is going to fault out at some point.

vexter0944 · Feb 1, 2023

jgreco said:
It's not stuck. However, it can take a very long time. Some of the time estimations on Linux seem to be off a bit for reasons I haven't investigated in depth. The seek-stress test involves launching several dd sessions in parallel but spaced apart by a small amount of time; the idea is to force the heads to do a lot of seeking to complete the job. Accurate seeking is incredibly important for hard drives, and this is one of those things where you want the drive to fault out now if it is going to fault out at some point.

Gotcha - just seemed off and I understand what your saying about time being off etc. It's stopped now and I'm not going to rerun it this minute.

Is there anything in the data that was output that shows why I seem to cap at about 400-500MBs?

jgreco · Feb 1, 2023

vexter0944 said:
Is there anything in the data that was output that shows why I seem to cap at about 400-500MBs?

Locally on the NAS itself? That would seem slow. Over SMB? Might be reasonable. I'm afraid I've lost track of exactly what your current setup is. Make sure Samba isn't maxxing out CPU; this has historically been a terrible performance problem, which can maybe be solved with multichannel support, which I really don't know much about. There are also some ideas for TCP tuning tweaks that I posted recently in the Resources section, though I haven't gotten around to a SCALE-oriented version of that just yet. If you are doing mirror pairs for eight drives that bench at 170MBytes/sec, then theoretically with four vdevs, that's more than 600MBytes/sec using just one side of the mirrors. I am usually a little skeptical about ZFS actually making good use of both sides of the mirror for a single consumer, so I'm not looking at the optimistic 1300MBytes/sec you could theoretically get if everything was racing along at full speed. But I do feel like 400-500MBytes/sec is a little low, so we may be missing some tuning opportunities somewhere, or some constraint like Samba that's dragging it down.

vexter0944 · Feb 1, 2023

jgreco said:
Locally on the NAS itself? That would seem slow. Over SMB? Might be reasonable. I'm afraid I've lost track of exactly what your current setup is. Make sure Samba isn't maxxing out CPU; this has historically been a terrible performance problem, which can maybe be solved with multichannel support, which I really don't know much about. There are also some ideas for TCP tuning tweaks that I posted recently in the Resources section, though I haven't gotten around to a SCALE-oriented version of that just yet. If you are doing mirror pairs for eight drives that bench at 170MBytes/sec, then theoretically with four vdevs, that's more than 600MBytes/sec using just one side of the mirrors. I am usually a little skeptical about ZFS actually making good use of both sides of the mirror for a single consumer, so I'm not looking at the optimistic 1300MBytes/sec you could theoretically get if everything was racing along at full speed. But I do feel like 400-500MBytes/sec is a little low, so we may be missing some tuning opportunities somewhere, or some constraint like Samba that's dragging it down.

That's right where I'm at - I don't feel like 400-500MB/s is right for the setup I have. the cpu 'seems' ok - here's a screenshot of me copying a 30gb blu ray file from the nas to the emby windows 10 box - cpu hit like 20% - this really feels like some kind of tuning around SMB or similar to me.

this post intrigues me - https://www.reddit.com/r/truenas/co.../?utm_source=share&utm_medium=web2x&context=3 - esp this part:

jgreco · Feb 1, 2023

vexter0944 said:
That's right where I'm at - I don't feel like 400-500MB/s is right for the setup I have. the cpu 'seems' ok - here's a screenshot of me copying a 30gb blu ray file from the nas to the emby windows 10 box - cpu hit like 20% - this really feels like some kind of tuning around SMB or similar to me.

If I may be blunt, that's not particularly useful because your CPU has multiple cores, and knowing that some dumb middleware has added the utilization of all the cores together and then divided by the number of cores to give you the "20%" answer, that's next to useless. If you had a 5 core CPU, that might be one CPU running at 100% and four more at 0%.

You can get a better idea of what's going on by going to the console and running "top". Here's an example from an older forum posting:

Code:

last pid: 44414;  load averages:  2.06,  2.15,  2.13                                        up 72+17:39:41  02:09:58
136 processes: 2 running, 134 sleeping
CPU:  1.6% user,  0.0% nice,  5.8% system,  2.0% interrupt, 90.6% idle
Mem: 510M Active, 8206M Inact, 318M Laundry, 169G Wired, 9405M Free
ARC: 146G Total, 37G MFU, 96G MRU, 581M Anon, 3509M Header, 8896M Other
     110G Compressed, 208G Uncompressed, 1.89:1 Ratio
Swap: 38G Total, 38G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 3760 root         24  20    0    11M  2728K zfsvfs   9 741.1H  45.79% nfsd
41442 root          1  83    0    24M    14M CPU17   17  23:49  43.31% ssh
40561 root          1  25    0   476M   446M kqread   1  54.0H   9.16% smbd
 1473 root         14  20    0  1017M   950M usem     6 639:15   8.48% python3.9
 1350 root         37  20    0  1071M   924M kqread  21  27.0H   8.28% python3.9
41443 root          5  22    0    19M  7892K tq_dra  22   9:19   7.08% zfs
 3792 root          1  20    0    83M    24M zfsvfs   8  21.5H   2.19% rpc.lockd
87859 root          1  20    0   258M   225M kqread   2  31:38   0.78% smbd
 3356    556       88  44    0    33G  4617M uwait   15 191:00   0.17% java
44396 root          1  20    0    14M  4268K CPU10   10   0:00   0.09% top
 3249 root          1  20    0   258M   225M kqread   3   1:46   0.06% smbd
 2374 root          1 -52   r0    11M    11M nanslp  18  11:43   0.04% watchdogd
87903 root          1  20    0   259M   226M kqread  20  59:45   0.02% smbd
81500 root          1  20    0   265M   228M kqread   9 241:17   0.02% smbd
37552 root          1  20    0    11M  2816K pause   18   0:04   0.01% iostat
80890 www           1  20    0    37M    10M kqread  13   0:07   0.01% nginx
 3748 root          1  20    0    84M    25M select  16   8:08   0.01% mountd
 9460 root          1  20    0    28M    14M select  13   0:35   0.01% sshd

where you can see that nfsd is running at 45% of a CPU core. I don't know what "blacklisting" might have happened with Samba, but it sounds like a real Reddit thing to be trying to do such tuning by changing settings on the Samba daemon. Don't, it's probably a bad idea and it doesn't fix other TCP services. Check out my resource for tuning in the Resources section. Read it for comprehension, then check out our friends over at ES.net who always have good high performance computing information, and maybe try to merge their generalized tuning with what I've suggested.

Linux Tuning

This page contains a quick reference guide for Linux tuning.

fasterdata.es.net

I will eventually work out some good TrueNAS-specific tuning advice but I just don't have it yet. Sorry.

vexter0944 · Feb 1, 2023

jgreco said:
If I may be blunt, that's not particularly useful because your CPU has multiple cores, and knowing that some dumb middleware has added the utilization of all the cores together and then divided by the number of cores to give you the "20%" answer, that's next to useless. If you had a 5 core CPU, that might be one CPU running at 100% and four more at 0%.

You can get a better idea of what's going on by going to the console and running "top". Here's an example from an older forum posting:

Code:
last pid: 44414; load averages: 2.06, 2.15, 2.13 up 72+17:39:41 02:09:58 136 processes: 2 running, 134 sleeping CPU: 1.6% user, 0.0% nice, 5.8% system, 2.0% interrupt, 90.6% idle Mem: 510M Active, 8206M Inact, 318M Laundry, 169G Wired, 9405M Free ARC: 146G Total, 37G MFU, 96G MRU, 581M Anon, 3509M Header, 8896M Other 110G Compressed, 208G Uncompressed, 1.89:1 Ratio Swap: 38G Total, 38G Free PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 3760 root 24 20 0 11M 2728K zfsvfs 9 741.1H 45.79% nfsd 41442 root 1 83 0 24M 14M CPU17 17 23:49 43.31% ssh 40561 root 1 25 0 476M 446M kqread 1 54.0H 9.16% smbd 1473 root 14 20 0 1017M 950M usem 6 639:15 8.48% python3.9 1350 root 37 20 0 1071M 924M kqread 21 27.0H 8.28% python3.9 41443 root 5 22 0 19M 7892K tq_dra 22 9:19 7.08% zfs 3792 root 1 20 0 83M 24M zfsvfs 8 21.5H 2.19% rpc.lockd 87859 root 1 20 0 258M 225M kqread 2 31:38 0.78% smbd 3356 556 88 44 0 33G 4617M uwait 15 191:00 0.17% java 44396 root 1 20 0 14M 4268K CPU10 10 0:00 0.09% top 3249 root 1 20 0 258M 225M kqread 3 1:46 0.06% smbd 2374 root 1 -52 r0 11M 11M nanslp 18 11:43 0.04% watchdogd 87903 root 1 20 0 259M 226M kqread 20 59:45 0.02% smbd 81500 root 1 20 0 265M 228M kqread 9 241:17 0.02% smbd 37552 root 1 20 0 11M 2816K pause 18 0:04 0.01% iostat 80890 www 1 20 0 37M 10M kqread 13 0:07 0.01% nginx 3748 root 1 20 0 84M 25M select 16 8:08 0.01% mountd 9460 root 1 20 0 28M 14M select 13 0:35 0.01% sshd

where you can see that nfsd is running at 45% of a CPU core. I don't know what "blacklisting" might have happened with Samba, but it sounds like a real Reddit thing to be trying to do such tuning by changing settings on the Samba daemon. Don't, it's probably a bad idea and it doesn't fix other TCP services. Check out my resource for tuning in the Resources section. Read it for comprehension, then check out our friends over at ES.net who always have good high performance computing information, and maybe try to merge their generalized tuning with what I've suggested.

Linux Tuning

This page contains a quick reference guide for Linux tuning.

fasterdata.es.net

I will eventually work out some good TrueNAS-specific tuning advice but I just don't have it yet. Sorry.

Here's a top while running hte same copy - looks like about 45% to 50% usage -

jgreco · Feb 1, 2023

vexter0944 said:
Here's a top while running hte same copy - looks like about 45% to 50% usage -
View attachment 63224

So that's really good. The answers are more difficult if you are topping out CPU cores. I would suggest looking at the TCP tuning stuff.

vexter0944 · Feb 2, 2023

jgreco said:
So that's really good. The answers are more difficult if you are topping out CPU cores. I would suggest looking at the TCP tuning stuff.

ok - tcp tuning reading here I come! lol

last question @jgreco - where in the resources is the " There are also some ideas for TCP tuning tweaks that I posted recently in the Resources section, though I haven't gotten around to a SCALE-oriented version of that just yet" you mentioned - I might go read it for ideas too.

jgreco · Feb 2, 2023

Click "Resources", it's the first one in the list since it's the newest resource.

vexter0944 · Feb 2, 2023

jgreco said:
Click "Resources", it's the first one in the list since it's the newest resource.

Thank you!

im.thatoneguy · Feb 7, 2023

I ran all of the 10Gbe TCP tuning and it seemed to offer a little bit of help.

Tuning 10Gb NICs highway to hell

If you are trying to achieve a maximum performance with 10Gb or 40Gb NICs in RHEL or similar prepare yourself to a battle. This article is for experienced users, don’t mess up with default k…

darksideclouds.wordpress.com

Important Announcement for the TrueNAS Community.

Slow SMB - Lost - Need direction

Contributor

Dabbler

Contributor

Dabbler

MVP

Contributor

MVP

Dabbler

Dabbler

Resident Grinch

Dabbler

Resident Grinch

Dabbler

Resident Grinch

Dabbler

Resident Grinch

Dabbler

Resident Grinch

Dabbler

Dabbler

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Slow SMB - Lost - Need direction"

Similar threads