I didn't find it, it found me

Git, your memory lets you down - here's also a thread about f1__spor: h**p://www.reteam.org/board/showthread.php?t=1777&page=3
Indeed, the rce'ed f1_nodongle as well as f1__spor use a lookup table of pre-calculated s=d*d values, but I don't think that's any faster than a simple imul, especially once you take into account the cache miss, which will definitely happen at the start of the first loop.
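To make the comparison concrete, here's a rough C sketch of the two approaches. It is not the actual code from f1_nodongle or f1__spor: the function names, the table size and the assumption that d fits in 0..255 are all made up for illustration.

```c
#include <stdint.h>

/* Assumption for illustration only: d fits in 0..255. */
#define TABLE_SIZE 256

static uint32_t square_table[TABLE_SIZE];

/* Fill the table once with s = d*d for every possible d. */
static void init_square_table(void)
{
    for (uint32_t d = 0; d < TABLE_SIZE; d++)
        square_table[d] = d * d;
}

/* Lookup variant: one memory load, which can miss the cache
   the first time a given cache line of the table is touched. */
static uint32_t square_lookup(uint8_t d)
{
    return square_table[d];
}

/* Direct variant: compiles down to a single multiply, no memory access. */
static uint32_t square_mul(uint8_t d)
{
    return (uint32_t)d * d;
}
```

Both return the same s for every d in range, so the only difference is a load from memory versus one multiply instruction. Which one wins would have to be measured on the actual target.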
Out of curiosity, where did you spot this value 7569?