Locating the cause of a high %b (%util)

Status
Not open for further replies.

feenberg

Cadet | Joined: Aug 7, 2015 | Messages: 9

Usually our TrueNAS and FreeNAS servers perform well: "iostat -x" shows hundreds of megabytes per second read or written, while the %b column (percent busy, the same as %util on Linux) stays at only a couple of percent for each disk. But every few months performance goes to hell: total throughput drops to only 1 or 2 MB/s, and %b for a group of disks sits at 99% or 100%. While this is happening, a simple ls can take 5 minutes.

I assume this is because a client is doing a lot of random I/O, which keeps the heads moving while transferring very little data. How do I locate that job among the many jobs from many users on many NFS clients? On the client computers I can find out how many bytes each process transfers, but that number is small for every job: the one doing random I/O doesn't move more bytes than the jobs doing sequential I/O, it just exercises the heads more. I need this information so I can contact the user doing random I/O and work with them to do something else.
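One way to make "busy but moving little data" concrete on the server side is to divide each disk's throughput by its operation rate: a disk at ~100 %b averaging only a few KB per operation is consistent with (though not proof of) small random I/O. The sketch below assumes FreeBSD-style `iostat -x` output whose header row starts with `device` and includes the `r/s`, `w/s`, `kr/s`, `kw/s` and `%b` columns; the function names, thresholds and sample numbers are made up for illustration. Since iostat's first report is typically an average since boot, feed it a later interval report.

```python
# Sketch: flag disks that are saturated (%b high) yet moving little data,
# using the text of a FreeBSD-style `iostat -x` report. Assumes the header
# row starts with "device" and includes r/s, w/s, kr/s, kw/s and %b.


def parse_iostat_x(text):
    """Return {device: {column: value}}; later reports overwrite earlier ones."""
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "device":  # column-name row
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # banner lines or rows that don't match the header
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue
        rows[fields[0]] = dict(zip(header[1:], values))
    return rows


def busy_but_slow(rows, min_busy=90.0, max_kb_per_op=16.0):
    """Return (device, %b, KB/s, KB per op) for busy disks doing small transfers.

    The default thresholds are illustrative guesses, not tuned values.
    """
    flagged = []
    for dev, col in rows.items():
        ops = col["r/s"] + col["w/s"]
        kb_s = col["kr/s"] + col["kw/s"]
        kb_per_op = kb_s / ops if ops else 0.0
        if col["%b"] >= min_busy and kb_per_op <= max_kb_per_op:
            flagged.append((dev, col["%b"], kb_s, kb_per_op))
    return flagged


# Hypothetical report: da0 is pegged doing tiny requests, da1 is streaming.
sample = """\
                        extended device statistics
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b
da0        120.0     5.0    480.0     40.0    80    20     0    78   12  99
da1          2.0   300.0     64.0  38400.0     1     2     0     2    0   4
"""
for dev, busy, kb_s, kb_op in busy_but_slow(parse_iostat_x(sample)):
    print(f"{dev}: {busy:.0f}% busy, {kb_s:.0f} KB/s, {kb_op:.1f} KB/op")
# prints: da0: 99% busy, 520 KB/s, 4.2 KB/op
```

This only says which disks are suffering, not which client job is responsible, but it gives a number to watch when the slowdown starts.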

Thanks,
Dan Feenberg
NBER
 
Last edited by a moderator:

Ericloewe

Server Wrangler | Moderator | Joined: Feb 15, 2014 | Messages: 20,194

I have the feeling this might be a job for DTrace, but I have no experience whatsoever using it. I'd recommend bringing it up with your TrueNAS support person.
 

feenberg

Cadet | Joined: Aug 7, 2015 | Messages: 9

I should add that qlen climbs into the range of 5 to 20.
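A rough way to see why a queue of 5 to 20 hurts so much is Little's law: the average number of requests in a system equals the arrival rate times the average time each spends there, so loosely qlen ≈ IOPS × latency. The sketch below assumes a hypothetical saturated disk completing about 125 random operations per second at 8 KB each; those figures are illustrative guesses, not measurements from these servers, and iostat's qlen is a sampled value, so treat the result as an order of magnitude.

```python
# Little's law: mean number of requests in the system = arrival rate x mean
# time each request spends there. Applied loosely to a disk, qlen ~ IOPS x latency.


def mean_latency_ms(qlen, iops):
    """Average time per request (ms) implied by a queue length and an op rate."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return qlen / iops * 1000.0


def throughput_mb_s(iops, kb_per_op):
    """Data rate (MB/s, 1 MB = 1024 KB) implied by op rate and request size."""
    return iops * kb_per_op / 1024.0


# Hypothetical saturated disk: ~125 small random ops/s at 8 KB each.
for qlen in (5, 10, 20):
    print(f"qlen {qlen:2}: ~{mean_latency_ms(qlen, 125):.0f} ms per request")
print(f"throughput: ~{throughput_mb_s(125, 8):.2f} MB/s per disk")
```

With those assumed numbers each request waits roughly 40 to 160 ms and a disk delivers under 1 MB/s, so an ls that needs many metadata reads queued behind that traffic adds up quickly.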

My impression is that DTrace would require real expertise.
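A lower-effort place to look, assuming the NFS clients run Linux and you have root on them, is the per-process counters in /proc/<pid>/io. Instead of ranking jobs by bytes, the sketch below ranks them by read/write system calls per second (`syscr` + `syscw`) and shows the average bytes per call (`rchar` + `wchar` divided by calls), on the idea that a job issuing many small requests stands out by call rate even when its byte count is modest. This is a heuristic only: the counters include pipes and sockets, I/O done through mmap is not counted, and many small calls are not necessarily random ones. The function names are made up for this sketch.

```python
# Sketch, assuming Linux NFS clients and root access (other users'
# /proc/<pid>/io files are not readable otherwise). Ranks processes by
# read/write system calls per second instead of bytes.
import os
import time


def read_io(pid):
    """Parse /proc/<pid>/io into a dict of counters, or None if unreadable."""
    try:
        with open(f"/proc/{pid}/io") as f:
            return {k: int(v) for k, v in (line.split(":") for line in f)}
    except (OSError, ValueError):
        return None  # process exited, or permission denied


def snapshot():
    """Map pid -> (command name, io counters) for every readable process."""
    snap = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        io = read_io(entry)
        if io is None:
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue
        snap[int(entry)] = (name, io)
    return snap


def rank_by_calls(before, after, seconds):
    """Return (calls/s, bytes per call, pid, name) tuples, highest rate first."""
    rows = []
    for pid, (name, io1) in after.items():
        if pid not in before or before[pid][0] != name:
            continue  # new process, or pid reused for a different command
        io0 = before[pid][1]
        calls = (io1["syscr"] - io0["syscr"]) + (io1["syscw"] - io0["syscw"])
        nbytes = (io1["rchar"] - io0["rchar"]) + (io1["wchar"] - io0["wchar"])
        if calls > 0:
            rows.append((calls / seconds, nbytes / calls, pid, name))
    return sorted(rows, reverse=True)


def main(interval=10.0, top=15):
    """Sample twice, `interval` seconds apart, and print the busiest callers."""
    first = snapshot()
    time.sleep(interval)
    for rate, per_call, pid, name in rank_by_calls(first, snapshot(), interval)[:top]:
        print(f"{pid:>7} {name:<16} {rate:10.1f} calls/s {per_call:10.0f} B/call")
```

Calling `main()` as root on each client during a slowdown lists the top callers; a process near the top with a small bytes-per-call figure is a candidate for a closer look, not a certain culprit.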
 