You Cythonized the hot loop and it got faster. Now the hard question: is it optimal, and how would you even tell?