Ran through timetravel10 today. It looks like with 8 FPGAs (one dedicated to each algo) you might be able to get up into the 1-10 GH/s range. BitCore definitely needs to do something: a small FPGA cluster could 51% them pretty easily.
You've made some interesting optimizations.
My naïve reading is:
a) they have 11 algorithms coded
b) only the first 10 are used
c) the whole hash is always a nesting of 10 sub-hashes
d) chosen without repetition
e) which gives 10! possibilities
f) the choice of permutation is keyed from the block height
g) not stepping through them one by one, but skipping ahead by up to 8! permutations
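To make points (e) to (g) concrete, here is a rough Python sketch of how I read the selection. It is not the coin's actual code: `nth_permutation` and `order_for_height` are names I made up, the lexicographic ordering is my assumption, and so is taking the skip count as the block height modulo 8!.

```python
from math import factorial

ALGO_COUNT = 10                       # (b): only the first 10 algorithms are used
TOTAL_PERMS = factorial(ALGO_COUNT)   # (e): 10! possible orderings
MAX_SKIP = factorial(8)               # (g): skip up to 8! permutations


def nth_permutation(n, index):
    """Return the index-th permutation of range(n) in lexicographic order."""
    items = list(range(n))
    index %= factorial(n)
    order = []
    for remaining in range(n, 0, -1):
        block = factorial(remaining - 1)   # permutations per choice of next item
        pos, index = divmod(index, block)
        order.append(items.pop(pos))       # no repetition, as in (d)
    return order


def order_for_height(height):
    """Hypothetical keying: skip (height mod 8!) permutations from the start."""
    return nth_permutation(ALGO_COUNT, height % MAX_SKIP)


if __name__ == "__main__":
    print(TOTAL_PERMS, MAX_SKIP)
    print(order_for_height(123456))
```

If the lexicographic assumption holds, an index below 8! never moves the first two slots, so only the last 8 positions of the order would actually vary.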
So my naïve implementation (one card dedicated to each sub-hash) would require 10 FPGA cards, not 8.
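For reference, this is the nesting I have in mind, as a minimal Python sketch. The ten `hashlib` digests are stand-ins I picked just to have ten distinct functions; they are not the coin's real sub-hashes, and `chained_hash` is a made-up name. Each stage feeds its digest to the next, which is what one card per sub-hash would mirror in hardware.

```python
import hashlib

# Stand-in sub-hashes: NOT the real algorithms, only 10 distinct stdlib digests.
STAGES = [
    hashlib.sha256, hashlib.sha512, hashlib.blake2b, hashlib.blake2s,
    hashlib.sha3_256, hashlib.sha3_512, hashlib.sha384, hashlib.sha224,
    hashlib.sha1, hashlib.md5,
]


def chained_hash(header: bytes, order) -> bytes:
    """Apply all stages once each, in the given order, feeding each digest forward."""
    if sorted(order) != list(range(len(STAGES))):
        raise ValueError("order must use every stage exactly once")
    data = header
    for stage in order:
        data = STAGES[stage](data).digest()
    return data


if __name__ == "__main__":
    identity = list(range(10))
    print(chained_hash(b"example header", identity).hex())
    print(chained_hash(b"example header", identity[::-1]).hex())
```

The same header under two different orders gives unrelated outputs, so every stage has to be available for every block.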
What is your secret ingredient?
Edit: Link to the source code:
https://github.com/LIMXTEC/BitCore/blob/master/src/crypto/hashblock.h