Locating the cause of a high %b (%util)

feenberg · Apr 19, 2017

Usually our TrueNAS and FreeNAS servers perform well, with "iostat -x" showing hundreds of megabytes per second read or written and %b (percent busy, the same as %util in Linux) at only a couple of percent for each disk. But every few months performance goes to hell, with total throughput of only 1 or 2 MB/s and %b for a group of disks at 99% or 100%. While this is happening a simple ls can take 5 minutes. I assume this is because a client is doing a lot of random I/O that keeps the heads moving for very little data transfer. How do I locate that job among the many jobs from many users on many NFS clients? On the client computer I can find out how many bytes are transferred by each process, but that number is small for all jobs: the one doing random I/O doesn't transfer more bytes than the jobs doing sequential I/O, it just exercises the heads more. I need this information so I can contact the user doing the random I/O and work with them on an alternative.

thanks
dan feenberg
NBER
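
The symptom described above (disks near 100 %b while moving very little data) can be flagged automatically from captured "iostat -x" output. What follows is only a rough sketch in Python, not a tested tool. It assumes the header line starts with "device" and has columns named kr/s, kw/s and %b, which can differ between FreeBSD releases. The function name, the thresholds and the sample numbers are all made up for illustration.

```python
def find_thrashing_disks(iostat_text, busy_min=90.0, kb_per_s_max=5000.0):
    """Return (device, %b, total kB/s) for disks that are busy but move little data.

    Assumes a header row beginning with "device" that names the columns
    kr/s, kw/s and %b. The thresholds are arbitrary illustrative defaults.
    """
    header = None
    flagged = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "device":
            header = fields  # remember column names for the rows that follow
            continue
        if header is None or len(fields) != len(header):
            continue  # banner line or something we do not understand
        row = dict(zip(header, fields))
        try:
            busy = float(row["%b"])
            kbps = float(row["kr/s"]) + float(row["kw/s"])
        except (KeyError, ValueError):
            continue
        if busy >= busy_min and kbps <= kb_per_s_max:
            flagged.append((row["device"], busy, kbps))
    return flagged


# Invented sample resembling "iostat -x" output; the numbers are not real data.
SAMPLE = """\
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
da0      150.0   5.0   600.0    20.0   12  80.1  99
da1        2.0 300.0     8.0 98000.0    1   3.2   4
"""

if __name__ == "__main__":
    for device, busy, kbps in find_thrashing_disks(SAMPLE):
        print(f"{device}: {busy:.0f}% busy at only {kbps:.0f} kB/s")
```

Run from cron against periodic iostat captures, something like this would at least timestamp when an episode starts, which narrows down which client jobs were active at the time. It does not identify the client or the process.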

Ericloewe · Apr 19, 2017

I have the feeling this might be a job for dtrace, but I have no experience whatsoever using it. I'd recommend bringing it up with your TrueNAS support person.

feenberg · Apr 19, 2017

I can add that qlen climbs to between 5 and 20 during these episodes.

My impression of DTrace is that it would require real expertise.
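
Short of DTrace, one lower-effort angle is to look at request counts on the clients instead of byte counts, since bytes alone do not separate the random-I/O job from the sequential ones. The sketch below assumes the NFS clients run Linux, where /proc/<pid>/io reports rchar, wchar, syscr and syscw for each process. It computes the average bytes per read or write system call. The function names and the sample values are hypothetical. A small average only hints at small, possibly random requests. It does not prove seeking, and it misses I/O done through mmap.

```python
def parse_proc_io(text):
    """Parse the "name: value" lines of a Linux /proc/<pid>/io file into a dict."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def avg_bytes_per_syscall(counters):
    """Average bytes moved per read/write syscall, or None if there were none."""
    ops = counters.get("syscr", 0) + counters.get("syscw", 0)
    if ops == 0:
        return None
    return (counters.get("rchar", 0) + counters.get("wchar", 0)) / ops


# Invented sample in the /proc/<pid>/io layout; the numbers are not real data.
SAMPLE_IO = """\
rchar: 4096000
wchar: 0
syscr: 1000
syscw: 0
read_bytes: 4096000
write_bytes: 0
cancelled_write_bytes: 0
"""

if __name__ == "__main__":
    counters = parse_proc_io(SAMPLE_IO)
    print(avg_bytes_per_syscall(counters))
```

Reading another user's /proc/<pid>/io normally requires root. Sampling the counters twice and comparing the deltas would show current behavior instead of lifetime totals.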
