(Title Update) nfsd 100% causing bottleneck

iliak

Contributor
Title updated due to the latest discovery in post #7 of this thread

I think I've reached a bottleneck due to RAM limits.
I have relatively intensive read operations: each run reads at approx 200-500 MB/s for about 10 minutes, which comes to approx 200 GB (the data is not repeated; it is 200 GB of unique data).

The flow after a fresh restart:
For the first 2-4 minutes, while the RAM is not yet fully allocated (I watch the RAM usage chart in the dashboard), it reads fast and stable.
But once the RAM fills up, the consumers of the data report a lot of IO delays.

Do you think it is the RAM?
 

iliak

Contributor
I ordered 256 GB of RAM. It should arrive next week. I'll give an update then.
 

iliak

Contributor
I just upgraded to 256 GB of RAM. Allocated RAM is stable between 150-200 GB depending on the execution load, and there is a 20% improvement. But there are still lots of IO delays that I'm trying to figure out (read throughput should be approx 3 GB/s, but it peaked at 1.2 GB/s).

How can I test what is causing the issue? (I think the limit comes from the FreeNAS server.)
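
One thing I might try: reading a big file locally on the server, to take NFS and the network out of the picture. A rough sketch; the path is a placeholder, and the file should be bigger than the cache or freshly written so the read actually comes from disk:

Code:
# local sequential read from the pool, bypassing NFS and the network
# /mnt/tank/data/bigfile is a placeholder path, not my real dataset
dd if=/mnt/tank/data/bigfile of=/dev/null bs=1M

If the local read hits the expected rate, the limit is more likely NFS or the network; if not, it is the pool itself.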
 

kdragon75

Wizard
Well, let's start with the basics. Please detail EVERY component in your server. Also include full details of how your storage pool is configured.
 

kdragon75

Wizard
We also need to know exactly what the workload is. You're bumping against a few maximums, and if your IO pattern is not perfectly sequential, you're not going to get close to 3 GB/s (24 Gb/s). We would also need to look at hardware interrupts, and that's not something I'm knowledgeable about.
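
A quick sanity check for the network leg, assuming iperf is available on both ends (plain iperf2 flags below):

Code:
# on the FreeNAS box
iperf -s
# on a client: run for 30 seconds with 4 parallel streams
iperf -c <server-ip> -t 30 -P 4

If iperf can't reach line rate, no amount of pool tuning will get you there.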

(data is not repeated, it is 200GB of unique data)
Unless that 200 GB is read and re-read, there is little to no point in adding RAM. The RAM typically helps in the form of read cache. If you don't read it again, there's no point in caching it.
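
If you want to check whether the cache is actually being hit, the ARC counters are exposed through sysctl on FreeBSD; a rough look (counter names from memory, verify on your build):

Code:
# cumulative ARC hits vs. misses since boot
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

A hit count that barely moves during your runs would confirm the extra RAM isn't doing anything for this workload.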
 

iliak

Contributor
Well, let's start with the basics. Please detail EVERY component in your server. Also include full details of how your storage pool is configured.
The exact hardware is in my signature.
  • main pool: 8 SSDs in RAIDZ2, 800 GB each
    • dataset: 500 GB, used for a container with low read access
    • dataset: 150 GB, not accessed/used in my tests
    • dataset: 3.2 TB, tried with sync disabled; this is the dataset the data is stored in
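
(If it helps, I can paste the exact layout; something like this should show it, the pool name here is just an example:)

Code:
# pool topology and health
zpool status
# datasets with the properties relevant here
zfs list -o name,used,avail,recordsize,sync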


We also need to know exactly what the workload is. You're bumping against a few maximums, and if your IO pattern is not perfectly sequential, you're not going to get close to 3 GB/s (24 Gb/s). We would also need to look at hardware interrupts, and that's not something I'm knowledgeable about.
Sequential or not, on average approx 10-30 simultaneous clients try to read data from the SSDs, and I cannot understand where the bottleneck is. It is not the SSDs' IOPS or throughput, not the network, and not the CPU (under 5% all the time).

Unless that 200 GB is read and re-read, there is little to no point in adding RAM. The RAM typically helps in the form of read cache. If you don't read it again, there's no point in caching it.
I know, but each scenario uses the same data, and sometimes it is the same one.


I have found something. Running
Code:
top -m io -o total

(screenshot attached: top shows nfsd at almost 100% of total IO)


and
Code:
iostat -w 1

(screenshot of the iostat output attached)
 

iliak

Contributor
Anyone got an idea? How can I investigate the cause further?
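
Maybe the NFS server counters can tell something? I think FreeBSD's nfsstat can show the server side and repeat at an interval:

Code:
# server-side NFS statistics
nfsstat -s
# short summary redisplayed every second
nfsstat -s -w 1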
 

kdragon75

Wizard
Just because nfsd is responsible for almost 100% of your total IO does not make it a bottleneck. Look at your disks... they're only doing about 20 MB/s. Look at the disk busy percent in the GUI. RAIDz2 is awful for random IO performance.
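
If you'd rather watch the disks live from a shell than in the GUI, gstat has a %busy column per provider (run it while the clients are reading):

Code:
# refresh every second, physical providers only
gstat -p -I 1s

As a rough rule of thumb, one RAIDz2 vdev delivers about the random IOPS of a single member disk, so your 8 SSDs in one vdev behave much closer to one SSD for small random reads.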
 

kdragon75

Wizard
Also, when you take screenshots, include the column headers... I have not memorized the columns.
 

iliak

Contributor
Just because nfsd is responsible for almost 100% of your total IO does not make it a bottleneck. Look at your disks... they're only doing about 20 MB/s. Look at the disk busy percent in the GUI. RAIDz2 is awful for random IO performance.

This is what I found; all 8 SSDs show similar values:

(two screenshots of per-disk stats attached)

zfs info:
(two screenshots attached)
 
