IO not balanced across VDEVS

Status
Not open for further replies.

naqu3

Cadet
Joined
Mar 12, 2018
Messages
3
Hi all,

Backdrop:
Multiple FreeNAS systems (inherited) running on Supermicro SSG-6048R-E1CR24L storage servers with 128GB of ECC RAM. They are configured with 3 RAIDZ2 vdevs of seven 6TB disks each.

OS:
Running on DOM mirror

Storage:
3 RAIDZ2 vdevs, each with 7 x 6TB disks
3 x 1.8TB SSDs for L2ARC
2 x 240GB SSDs for SLOG

Main use is virtualization, with the bulk of the connections over iSCSI.

Note: The systems are not in good shape (setup/configuration), with auto-tune enabled, RAIDZ2 instead of mirrors, and 1:many iSCSI extent mapping. In the near future, the data will be moved to alternate locations and the storage reinstalled and configured as per recommendations (mirrors, no auto-tune, 1:1 iSCSI, etc.).


The main problem at the moment is that seemingly at random, one of the vdevs gets hammered, and all services using the system grind to a halt.

When this happens, gstat shows the disks of that particular vdev deep in the red.
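
For reference, this is roughly what I watch when it happens. "tank" below is a placeholder for the actual pool name, and the intervals are just ones I picked:

Code:
# per-vdev bandwidth and operation counts, refreshed every 5 seconds
zpool iostat -v tank 5

# per-disk busy% and per-op latency, physical providers only, 5-second refresh
gstat -p -I 5s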

I have not yet included further details, but would appreciate any pointers on where to start looking.

Cheers!

Nich
 
Joined
Feb 2, 2016
Messages
574
Here's my guess knowing absolutely nothing...

The VDEVs weren't all added at the same time. The first VDEV of seven drives was installed and the pool got full. The second VDEV of seven drives was added to the pool and that, too, filled. Finally, the third VDEV was added to the pool. While ZFS is pretty good about balancing data across all VDEVs, it isn't perfect, and the closer you were to full before adding the next VDEV, the longer it takes to balance IO across all VDEVs in the pool.

So, by chance, you probably have some data set that exists only (or primarily) on one VDEV. Maybe it is a mostly-read set, which is why it hasn't been redistributed? When that data is accessed, it all comes from one VDEV and performance grinds to a halt?

If you can identify the clump of data that is not distributed across all the VDEVs, you could copy it so it gets spread across the VDEVs, then delete the original. Maybe.
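
If you want to test that theory first, here is a rough sketch; the pool and dataset names below are placeholders, so adapt them to your layout:

Code:
# per-vdev capacity; a big skew in ALLOC/FREE between the vdevs supports the guess
zpool list -v tank

# one way to redistribute a suspect dataset: send it to a new dataset on the same
# pool (the copy's blocks get spread across all vdevs), verify, then destroy the old one
zfs snapshot tank/suspect@rebalance
zfs send tank/suspect@rebalance | zfs receive tank/suspect-rebalanced

With zvol-backed iSCSI extents you would also have to repoint the extent at the copy afterwards, so treat this as an outline rather than a procedure.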

Cheers,
Matt
 

naqu3

Cadet
Joined
Mar 12, 2018
Messages
3
Hi Matthew & toadman,

Many thanks for the response. I think all the vdevs were configured at once, but I cannot be 100% sure; I will try to ask around.
I did try to find a tool to track what is written to which disks, at least to try to localise the source, if indeed there is a differentiator. So far, that has not been successful.
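
For the record, these are the stock tools I have been poking at so far, with no conclusive result yet:

Code:
# physical disks only, so the vdev members stand out
gstat -p

# per-process IO mode in top, sorted by total IO, to tie the load back to a service
top -m io -o total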

Indeed, since ZFS spreads new writes across the vdevs, it should resolve itself as new data is written.

Nich

PS: Unfortunately, I cannot seem to upload images at the moment (742 x 489, 138k).
 
Joined
Feb 2, 2016
Messages
574
The main problem at the moment is that seemingly at random, one of the vdevs gets hammered,

Take a hard look at the SMART data from the drives in that VDEV. It could be you have a drive failing in a less than complete way.
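
Something along these lines; da3 is just an example, so substitute the device names of that vdev's members from zpool status:

Code:
# full SMART report for one member of the busy vdev
smartctl -a /dev/da3

# quick check for the usual suspects among the attributes
smartctl -A /dev/da3 | egrep 'Reallocated|Pending|Uncorrect'

A drive that is slowly dying can stall a whole vdev without ever tripping the overall SMART health flag, so look at the raw attribute values and error logs, not just PASSED/FAILED.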

Cheers,
Matt
 