One function took 462 seconds to fingerprint. Here is the algorithm that cut that to a couple of seconds, and why it is really an old string-matching idea run backwards.
You Cythonized the hot loop and it got faster. Now the hard question: is it optimal, and how would you even tell?
Cython and IDAPython for super-fast reversing tools