I'm not trying to get you to change your workflow, but is there something you can do differently upstream so that you're not creating massive duplicate files in the first place?
It seems like a waste of RAM, CPU cycles, and storage. Perhaps there's another way?
I know that the program czkawka (GUI and command-line versions) employs a "partial hash" trick, which not only speeds up the process but avoids unnecessarily loading and hashing an entire file just because it happens to be exactly the same size as another file.
The way this works is that if two or more files have the same exact size, it doesn't immediately load them in their entirety to compute full hashes. Instead, it hashes only the first few kilobytes of each. If those partial hashes differ, it doesn't bother to move on to the next step. Only if they match does it compute the hash for the entire file.
This spares your system from loading and computing the hash for every single "file size match".
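If it helps to picture it, here's a rough, self-contained sketch of that three-pass idea in Python. To be clear, this isn't czkawka's actual code; the 4 KB partial-hash size and the BLAKE2 hash are just assumptions for illustration:

```python
#!/usr/bin/env python3
"""Sketch of the "partial hash" trick: group files by size, hash only the
first few KB of size-matched files, and do a full hash only when needed."""

import hashlib
import os
from collections import defaultdict

PARTIAL_BYTES = 4 * 1024  # how much of each file to hash in the first pass (assumption)

def hash_file(path, limit=None, chunk=1024 * 1024):
    """Return a BLAKE2 hex digest of path, optionally of only the first `limit` bytes."""
    h = hashlib.blake2b()
    remaining = limit
    with open(path, "rb") as f:
        while True:
            to_read = chunk if remaining is None else min(chunk, remaining)
            if to_read == 0:
                break
            data = f.read(to_read)
            if not data:
                break
            h.update(data)
            if remaining is not None:
                remaining -= len(data)
    return h.hexdigest()

def find_duplicates(root):
    # Pass 1: group files by size; a unique size can't have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            if os.path.isfile(p):
                by_size[os.path.getsize(p)].append(p)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Pass 2: hash only the first PARTIAL_BYTES of each size-matched file.
        by_partial = defaultdict(list)
        for p in paths:
            by_partial[hash_file(p, limit=PARTIAL_BYTES)].append(p)
        # Pass 3: full hash only for files whose partial hashes also collide.
        for group in by_partial.values():
            if len(group) < 2:
                continue
            by_full = defaultdict(list)
            for p in group:
                by_full[hash_file(p)].append(p)
            duplicates.extend(g for g in by_full.values() if len(g) > 1)
    return duplicates

if __name__ == "__main__":
    for group in find_duplicates("."):
        print(group)
```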
However, if you install czkawka through a binary package manager, it will pull in many dependencies due to its GUI elements.