ARC Hit Ratio artificially high?

MikeyG · Apr 20, 2020

I'm trying to understand how ARC hit ratio is calculated, and not getting very far googling. My hit ratio seems to sit at well over 90% most of the time. I just did a 3TB transfer of data from one pool to another, and noticed that during that period of time it went down to about 80%. I've got 192GB of RAM, and this data is never accessed, so there's no way that 80% of it was coming from the cache. If it was, I'd assume that it would be from the MRU cache, which according to arc_summary.py is 30GB.

What does the ARC hit ratio actually mean? How could it remain so high while doing a file transfer where almost none of the data should be in ARC?

Graph during file transfer:

garm · Apr 20, 2020

Don’t quote me on this, but a sequential stream of data is easy to anticipate, so ZFS can load much of the needed data into ARC ahead of the reads. Therefore the hit ratio remains high. Had it been random I/O, you would have seen a bigger drop.

MikeyG · Apr 20, 2020

Thanks @garm. That would make sense to me if prefetch data showed a very high hit, but it doesn't:

Am I looking at the wrong thing and misinterpreting what "prefetch" means here?

MikeyG · Apr 21, 2020

I'm also wondering if metadata hit ratio factors into ARC hit ratio. You can see here metadata hits are near 100%:

If the metadata hit rate is part of the overall ARC hit rate, and most metadata fits in ARC, I could see how all those metadata requests would skew the ARC hit ratio upward, even if none of the actual data is being retrieved from ARC.
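To make the idea concrete, here is a rough sketch of how an overall ratio could hide a poor data hit rate. It assumes the overall figure is simply hits divided by hits plus misses, summed across the four request classes that the ZFS arcstats counters expose (demand data, demand metadata, prefetch data, prefetch metadata). The numbers in `example` are made up for illustration and are not taken from my system.

```python
# Sketch only: per-class vs. overall ARC hit ratio from arcstats-style counters.
# Assumption: overall ratio = total hits / (total hits + total misses).

CLASSES = ["demand_data", "demand_metadata", "prefetch_data", "prefetch_metadata"]


def hit_ratio(hits, misses):
    """Return hits / (hits + misses), or 0.0 when there were no requests."""
    total = hits + misses
    return hits / total if total else 0.0


def arc_breakdown(stats):
    """Return (overall_ratio, {class_name: ratio}) for a dict of counters."""
    per_class = {
        c: hit_ratio(stats[c + "_hits"], stats[c + "_misses"]) for c in CLASSES
    }
    total_hits = sum(stats[c + "_hits"] for c in CLASSES)
    total_misses = sum(stats[c + "_misses"] for c in CLASSES)
    return hit_ratio(total_hits, total_misses), per_class


# Hypothetical counters: metadata always hits, data mostly misses.
example = {
    "demand_data_hits": 100, "demand_data_misses": 100,
    "demand_metadata_hits": 700, "demand_metadata_misses": 0,
    "prefetch_data_hits": 0, "prefetch_data_misses": 100,
    "prefetch_metadata_hits": 0, "prefetch_metadata_misses": 0,
}

overall, per_class = arc_breakdown(example)
print(f"overall: {overall:.0%}")
for name, ratio in per_class.items():
    print(f"{name}: {ratio:.0%}")
```

With these invented counters the overall ratio comes out at 80%, even though only half of the demand data requests and none of the prefetch data requests were hits.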

MikeyG · Apr 21, 2020

I think you were right.

Looking at the combination of metadata and prefetch reads as a share of total ARC hits in netdata, the percentages match up. I'm not sure exactly what the numbers mean, but it seems that when I was seeing, say, an 80% ARC hit ratio, that 80% consisted of metadata that was found in ARC plus prefetch data that was (I think) read ahead from disk by ZFS. I believe actual prefetch hits would be prefetch data read from ARC, which continued to be very low. The remaining 20% would be data requested from ARC that was not there.

My suspicion is that the hit ratio is calculated from the number of requests, not the amount of data requested. So if 10 requests are made to ARC, and 9 of them are for metadata that is held completely in ARC, the hit ratio ends up being 90%, even if the amount of actual data requested dwarfs the amount of metadata read.
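A quick sketch of that suspicion, comparing a ratio counted per request with one weighted by bytes. The request sizes (512 bytes for a metadata lookup, 128 KiB for a data block) are assumptions picked for illustration, and the byte-weighted ratio is a hypothetical comparison, not something I know ZFS to report.

```python
# Sketch only: request-counted vs. byte-weighted hit ratio.
# Sizes are assumed values, not measurements.


def request_hit_ratio(requests):
    """Fraction of requests that were hits, ignoring their size."""
    hits = sum(1 for r in requests if r["hit"])
    return hits / len(requests)


def byte_hit_ratio(requests):
    """Fraction of requested bytes that were served from cache."""
    hit_bytes = sum(r["size"] for r in requests if r["hit"])
    total_bytes = sum(r["size"] for r in requests)
    return hit_bytes / total_bytes


# Nine small metadata lookups that hit, one large data read that misses.
requests = [{"size": 512, "hit": True} for _ in range(9)]
requests.append({"size": 128 * 1024, "hit": False})

print(f"by request: {request_hit_ratio(requests):.1%}")
print(f"by bytes:   {byte_hit_ratio(requests):.1%}")
```

Counted per request this is a 90% hit ratio, while by bytes only a few percent of what was asked for came from cache.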

Of course, this is a guess, so if someone wants to correct me that would be great.
