May I propose a different approach for much faster mining?
Currently, most, if not all, CPU-mineable coins are cripple-mined. The reason is simple: the SIMD nature of the SSE and AVX instruction sets is under-utilized.
SSE and AVX instructions are used in SISD fashion (single instruction, single data) instead of SIMD (single instruction, multiple data), meaning each instruction processes one batch of data rather than 2-4 at once.
Right now, hashing works like this:
The main mining routine passes one input to each hash function, where it goes through a series of SERIAL transformations and permutations; at the end, the hash function returns its output to the miner (sometimes so it can be passed on to the next hash).
This serial process leaves little room for SIMD utilization.
What should be done instead is for the miner program to issue 2-4 hash candidates to the hashing routines at once. The hashing routines should accept 2-4 inputs (instead of 1) and return 2-4 outputs. This way the process is parallelized, and SIMD utilization (packed processing of the same instructions on different data) should result in much faster processing.
Now, this might require a lot of recoding. Alternatively, the C code could be adapted for a special compiler (such as Intel's ISPC) that runs multiple instances of the serial computation side by side and processes them in "packs" with SIMD ("packed") instructions, letting the compiler do all the packing. The performance benefits of such an approach can be seen here:
http://ispc.github.io/perf.html