Why is c11 slower than x11?
I haven't been able to integrate the groestl AES_NI optimizations yet. I'm having
that problem with many algos that use groestl. Only x11 and quark are working.
It could do wonders on some other algos, especially groestl itself. If I can get it
working it could be a 100% boost.
A 4% boost in quark is coming from a fast reinit_groestl. Maybe it will work in
ccminer too. It was simple: just clone init_groestl and remove the constant
initializations. That speeds up the init every time groestl is run. Just make sure to do
a full init the first time.
Edit: it only worked for quark, because groestl runs twice in the quark chain. You only
need to do a reinit before the second run. A full init works but is slower. No init is
even faster but never finds blocks.
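
Here is a rough sketch of the full init vs. reinit idea in C. It is a toy, not the real groestl code: toy_ctx, toy_init, toy_reinit and toy_hash are made-up names, and the "hash" is a trivial mixer that only exists to show which part of the context has to be reset. The assumption is that the context holds constants that never change between runs plus a working state that every run dirties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_WORDS 256
#define STATE_WORDS 8

/* Hypothetical context layout, NOT the real groestl context. */
typedef struct {
    uint32_t table[TABLE_WORDS]; /* constants: identical for every run   */
    uint32_t state[STATE_WORDS]; /* working state: dirtied by every run  */
    uint32_t count;              /* bytes absorbed since the last (re)init */
    int tables_ready;            /* set once the full init has been done */
} toy_ctx;

/* Fast path: reset only what a previous run changed. */
static void toy_reinit(toy_ctx *ctx)
{
    assert(ctx->tables_ready); /* a full init must have happened first */
    for (uint32_t i = 0; i < STATE_WORDS; i++)
        ctx->state[i] = 0x6a09e667u ^ i;
    ctx->count = 0;
}

/* Slow path: build the constants, then do the same reset as the reinit. */
static void toy_init(toy_ctx *ctx)
{
    for (uint32_t i = 0; i < TABLE_WORDS; i++)
        ctx->table[i] = (i * 2654435761u) ^ (i << 7);
    ctx->tables_ready = 1;
    toy_reinit(ctx);
}

/* Toy mixer: reads the constants, updates state and the byte counter. */
static uint64_t toy_hash(toy_ctx *ctx, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t *s = &ctx->state[i % STATE_WORDS];
        *s = ((*s << 5) | (*s >> 27)) ^ ctx->table[data[i]];
        ctx->count++;
    }
    uint32_t mix = 0;
    for (size_t i = 0; i < STATE_WORDS; i++)
        mix ^= ctx->state[i];
    /* The byte counter is part of the result, so a dirty ctx shows up. */
    return ((uint64_t)ctx->count << 32) | mix;
}
```

In this toy, toy_init followed by toy_hash gives the same result as toy_reinit followed by toy_hash on an already used context, while calling toy_hash a second time with no reset at all gives a different (wrong) result. That is the "faster but never finds blocks" case.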
Some of my improvements have come from optimizing the ctx init and avoiding
doing it for nothing. I haven't looked at ccminer but there may be opportunities there.
If it works for you, don't forget where you got the idea.
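
One way to avoid doing the ctx init for nothing, again as a toy sketch in C and not code from cpuminer or ccminer: do the init once outside the nonce loop and copy the ready-made context for each hash. All names here (demo_ctx, demo_init, demo_hash, scan_naive, scan_hoisted) are made up for the example, and demo_init_calls is only there to count how often the init runs.

```c
#include <stdint.h>

/* Hypothetical context, only for illustration. */
typedef struct {
    uint32_t table[256]; /* constants                   */
    uint32_t state[8];   /* working state, dirtied per hash */
} demo_ctx;

static unsigned demo_init_calls = 0; /* counts how often init runs */

static void demo_init(demo_ctx *ctx)
{
    demo_init_calls++;
    for (uint32_t i = 0; i < 256; i++)
        ctx->table[i] = i * 2654435761u;
    for (uint32_t i = 0; i < 8; i++)
        ctx->state[i] = i;
}

/* Toy hash of one nonce; modifies ctx->state. */
static uint32_t demo_hash(demo_ctx *ctx, uint32_t nonce)
{
    uint32_t out = 0;
    for (int r = 0; r < 8; r++) {
        ctx->state[r] ^= ctx->table[(nonce >> (r * 4)) & 0xff];
        out ^= ctx->state[r];
    }
    return out;
}

/* Naive scan: a full init for every nonce. Returns an XOR checksum. */
static uint32_t scan_naive(uint32_t first, uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        demo_ctx ctx;
        demo_init(&ctx);
        sum ^= demo_hash(&ctx, first + i);
    }
    return sum;
}

/* Hoisted scan: one init, then a plain struct copy per nonce. */
static uint32_t scan_hoisted(uint32_t first, uint32_t n)
{
    demo_ctx ready;
    demo_init(&ready);
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        demo_ctx ctx = ready; /* copy of the pristine context */
        sum ^= demo_hash(&ctx, first + i);
    }
    return sum;
}
```

Both scans return the same checksum, but the hoisted one calls demo_init once instead of once per nonce. Whether the copy is actually cheaper than the init depends on the real context, so measure it.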
