Small writes on *ALL* pools every 5 seconds (appears to be ESXi/iSCSI related)

Status
Not open for further replies.

xyzzy

Explorer, joined Jan 26, 2016, 76 messages
I was watching my pools this morning with "zpool iostat -v 1" and noticed that every 5 seconds small writes are happening to all pools.
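One way to confirm the period, rather than eyeballing it, is to copy the per-second write-operation counts for a pool out of the `zpool iostat -v 1` output and measure the gaps between nonzero samples. Below is a minimal Python sketch. It assumes the counts have already been pulled into a list by hand. The sample values are made up for illustration and are not taken from the thread.

```python
from collections import Counter


def write_intervals(samples):
    """Return the gaps, in samples (seconds at a 1 s interval), between nonzero write readings."""
    hits = [i for i, ops in enumerate(samples) if ops > 0]
    return [b - a for a, b in zip(hits, hits[1:])]


def dominant_interval(samples):
    """Most common gap between write bursts, or None if there are fewer than two bursts."""
    gaps = write_intervals(samples)
    if not gaps:
        return None
    return Counter(gaps).most_common(1)[0][0]


# Hypothetical write-ops column for one pool, one sample per second.
samples = [0, 0, 12, 0, 0, 0, 0, 9, 0, 0, 0, 0, 14, 0, 0, 0, 0, 11]
print(dominant_interval(samples))  # 5
```

With real data the gaps will rarely be perfectly uniform. Taking the most common gap, rather than the average, keeps a single stray write from skewing the result.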

This does not appear to be the "system dataset" thing that comes up in the forums every once in a while.

All 3 pools are used as ESXi 6.5 datastores that are accessed via iSCSI over 10G Ethernet. I'm seeing this activity even when all VMs (including VCSA) are paused or shut down. The only time it stops is when I power down the ESXi host. When I restart the ESXi host, the "every 5 seconds" activity starts about mid-way through the "yellow screen" boot cycle.

I've confirmed that all 3 datastores have VMware's Storage I/O Control disabled.

Any idea what's causing this?

Thanks in advance!

(System: FreeNAS 11.0-U3 on Supermicro X9SRE-F with E5-1650v2, 128GB RAM, mirrored USB flash for boot disk, 3 pools, each pool has 1-2 SLOG SSDs)
 

kdragon75

Wizard, joined Aug 7, 2016, 2,457 messages
If you have any kind of HA, each ESXi host will "heartbeat" to the storage. This is used to determine whether a non-responding host is totally dead or has only lost its management network. With this information we can decide whether to restart the VMs that were on the host or just let them keep running, in the hope that the network the VMs are on is still up on that host.
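The reasoning above boils down to a two-input decision. Below is a deliberately simplified Python model of it. It is not vSphere HA's actual algorithm, which also involves isolation addresses, master election, and a configurable isolation response. The state names and action strings are hypothetical labels.

```python
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    ISOLATED = "isolated"  # management network silent, datastore heartbeat still updating
    FAILED = "failed"      # neither network nor datastore heartbeat seen


def classify_host(network_heartbeat_ok: bool, datastore_heartbeat_ok: bool) -> HostState:
    """Simplified model: the datastore heartbeat only matters once the
    management-network heartbeat has been lost."""
    if network_heartbeat_ok:
        return HostState.HEALTHY
    if datastore_heartbeat_ok:
        return HostState.ISOLATED
    return HostState.FAILED


# Hypothetical responses mirroring the choice described in the post.
ACTIONS = {
    HostState.HEALTHY: "no action",
    HostState.ISOLATED: "leave the VMs running on that host",
    HostState.FAILED: "restart the host's VMs elsewhere",
}

print(ACTIONS[classify_host(network_heartbeat_ok=False, datastore_heartbeat_ok=True)])
```

The storage heartbeat is exactly what produces the small periodic writes being asked about. In this model, a host that is still writing its heartbeat is never treated as dead.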

It could also be the ESXi logs flushing to disk, depending on how this is set up.
 

xyzzy

I only have a single ESXi host, so I doubt it's HA. However, I've just learned about ATS heartbeating and am beginning to think this might be it.

HOWEVER, this morning, I'm seeing the "every 5 second" pattern only on 2 datastores. The only one that's all-SSD is only doing the small writes every once in a while now. I'm not sure why. (I'm observing all of this with all VMs paused or shut down).
 

kdragon75

It could still just be a no-op sort of thing VMware does. You may be able to use esxtop to find the process that is writing to disk.
 

xyzzy

That's what I'm thinking but I haven't quite figured out how to use esxtop to track disk activity on a per process basis. Any tips on how I do that? (I tried Googling for this but was not successful.)
 

kdragon75

I did a little research and found a few write-ups, including this. It sounds just like performance logging. You could try what they did: put your host in maintenance mode and kill hostd to see if it stops.
 

c32767a

Patron, joined Dec 13, 2012, 371 messages
Don't forget that the txg timeout is 5 seconds, so the ESX write activity could be more erratic than that. You'd just need a write in the txg timeout window to trigger a flush at 5 seconds.
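To see why irregular writes can still look perfectly periodic from the pool's side, here is a toy Python model. It assumes the 5-second timeout is the only thing that closes a transaction group, and that txgs close on fixed 5-second boundaries. Real ZFS can also sync early when enough dirty data accumulates, so this is a simplification. The arrival times are hypothetical.

```python
import math

TXG_TIMEOUT_S = 5  # the 5-second txg timeout mentioned above


def commit_time(arrival_s: float, timeout_s: float = TXG_TIMEOUT_S) -> float:
    """Flush time for a write arriving at arrival_s, assuming txgs close
    only at fixed multiples of timeout_s (a simplifying assumption)."""
    return math.floor(arrival_s / timeout_s) * timeout_s + timeout_s


def flush_times(arrivals, timeout_s=TXG_TIMEOUT_S):
    """Distinct times at which the pool actually writes, in ascending order."""
    return sorted({commit_time(t, timeout_s) for t in arrivals})


# Hypothetical, irregular write arrivals from the initiator (seconds).
arrivals = [0.4, 1.7, 3.2, 6.9, 13.1, 14.8, 21.5]
print(flush_times(arrivals))  # [5, 10, 15, 25]
```

Seven writes at uneven times collapse into four flushes, each on a 5-second boundary. There is no flush at 20, because nothing arrived in that window.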

Also, do you have atime enabled on the volumes? If so, I'm not sure whether anything special is in place to stop atime updates on the iSCSI target files; if nothing is, something in a FreeNAS layer might be updating an atime.

In any case, does it really matter? You're using these volumes for iSCSI to an ESX box, so there's a potential for I/O at any time. A small I/O event isn't going to have any measurable effect on your performance.
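Some back-of-the-envelope arithmetic shows how small this is. The Python sketch below assumes a burst of 10 write operations totalling 64 KiB every 5 seconds. It also assumes a spinning disk good for roughly 150 random IOPS. All three numbers are hypothetical placeholders, not measurements from this thread.

```python
def background_load(ops_per_burst, kib_per_burst, interval_s, disk_iops):
    """Average IOPS, KiB/s, and fraction of one disk's random-IOPS budget
    consumed by a periodic burst of small writes."""
    iops = ops_per_burst / interval_s
    kib_s = kib_per_burst / interval_s
    share = iops / disk_iops
    return iops, kib_s, share


iops, kib_s, share = background_load(ops_per_burst=10, kib_per_burst=64,
                                     interval_s=5, disk_iops=150)
print(f"{iops:.1f} IOPS, {kib_s:.1f} KiB/s, {share:.1%} of one disk")
# 2.0 IOPS, 12.8 KiB/s, 1.3% of one disk
```

If your own `zpool iostat` numbers are substituted, the conclusion should hold as long as the burst stays in the tens of operations.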
 

kdragon75

c32767a said:
> In any case, does it really matter? You're using these volumes for iSCSI to an ESX box, so there's a potential for I/O at any time. A small I/O event isn't going to have any measurable effect on your performance.

Yeah, a few KB here and there are not going to be noticeable, but from an academic standpoint it's nice to understand the services running and how they utilise resources.
 

xyzzy

kdragon75 said:
> I did a little research and found a few write-ups, including this. It sounds just like performance logging. You could try what they did: put your host in maintenance mode and kill hostd to see if it stops.

Thanks... that's a neat write-up.

It turns out that esxtop in "u" (disk device) mode, then pressing "e" on a specific device, shows what I need. What I didn't realize was that a "world ID" is essentially a process ID, and that in this mode the Path/World/Partition column shows that world (process) ID.
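Once per-world numbers are visible, spotting the culprit is a matter of totalling writes per world ID across a few refreshes. A minimal Python sketch follows. It assumes the (world ID, writes/s) pairs have been copied by hand from the expanded device view. The world IDs and rates below are invented for illustration.

```python
from collections import defaultdict


def top_writers(rows, n=3):
    """Sum write rates per world ID across samples and return the n busiest worlds."""
    totals = defaultdict(float)
    for world_id, writes_per_s in rows:
        totals[world_id] += writes_per_s
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical (world ID, writes/s) pairs from several esxtop refreshes.
rows = [
    (1000234, 0.0),
    (1000567, 2.4),
    (1000234, 0.2),
    (1000567, 2.6),
    (1000890, 0.0),
]
for world_id, total in top_writers(rows, n=2):
    print(world_id, round(total, 2))
```

Summing across refreshes, rather than looking at a single screen, matters when the writes are bursty. A world that writes only every 5 seconds shows zero on most refreshes.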

Unfortunately (or I guess fortunately) the activity has stopped so I don't have anything to chase down at the moment. It seems very strange.
 

xyzzy

c32767a said:
> Don't forget that the txg timeout is 5 seconds, so the ESX write activity could be more erratic than that. You'd just need a write in the txg timeout window to trigger a flush at 5 seconds.

Very good point. I had forgotten about that.

> Also, do you have atime enabled on the volumes? If so, I'm not sure whether anything special is in place to stop atime updates on the iSCSI target files; if nothing is, something in a FreeNAS layer might be updating an atime.

I've got atime disabled on all my volumes, so that's not in play here.

> In any case, does it really matter? You're using these volumes for iSCSI to an ESX box, so there's a potential for I/O at any time. A small I/O event isn't going to have any measurable effect on your performance.

You're right in that it's certainly not a huge deal. I was primarily curious to see what would be generating those writes when all VMs are off. However, if it turned out to be something that wasn't needed, I was interested in turning it off so it wouldn't impact performance (a possibility with rotational media). Plus, the array is currently close enough that I could hear this distracting rat-tat-tat pattern. Luckily, its final position is further away and more isolated, sound-wise.
 

c32767a

xyzzy said:
> You're right in that it's certainly not a huge deal. I was primarily curious to see what would be generating those writes when all VMs are off.

I admit I would be curious enough to run it down as well... but I had to ask. :)
 