You can currently test keccak with the code on GitHub. Use the K kernel (just imagine K stands for Keccak).
The specified blocks/warps config is used for keccak. A good guess would be a larger multiple of your
GPU's SMX count with 32 warps. Fermi GPUs can only run 16 warps.
Autotune is definitely NOT working.
The keccak256 code for maxcoin is currently compiled against compute_10, which means it runs on any GPU.
Performance isn't stellar yet.
cudaminer --algo=keccak -d gtx780 -L 16 -l K192x32 --benchmark
Some 20 MHash/s already... Beats my CPU!
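
To make the K192x32 notation a bit more concrete, here is a rough Python sketch that splits such a string into kernel letter, blocks and warps, and builds a first guess from an SMX count. This is not cudaminer's own parser: the function names, the regular expression and the default multiple of 16 are assumptions made purely for illustration. The only things taken from the post are the "kernel letter, blocks, x, warps" layout of the example and the 32 vs 16 warps hint. On a GTX 780 with 12 SMX units, 192 blocks happens to be 16 blocks per SMX.

```python
import re

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def parse_launch_config(spec):
    """Split a string such as 'K192x32' into (kernel, blocks, warps).

    Hypothetical helper: the accepted format is inferred from the
    example command line, not taken from cudaminer's source.
    """
    m = re.fullmatch(r"([A-Za-z])(\d+)x(\d+)", spec)
    if m is None:
        raise ValueError(f"unrecognised launch config: {spec!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def suggest_config(sm_count, multiple=16, warps=32, kernel="K"):
    """Build a first-guess config: blocks = multiple * SMX count.

    The default multiple of 16 is an assumption; use warps=16 on Fermi.
    """
    return f"{kernel}{sm_count * multiple}x{warps}"


kernel, blocks, warps = parse_launch_config("K192x32")
total_threads = blocks * warps * WARP_SIZE
print(kernel, blocks, warps, total_threads)
print(suggest_config(12))             # 12 SMX units, e.g. a GTX 780
print(suggest_config(14, 8, 16))      # hypothetical Fermi card with 14 SMs
```

Treat the suggested string only as a starting point and try a few multiples by hand, since autotune does not work for keccak yet.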

Only 40% TDP. There's headroom!
More work to be done tomorrow. I need to get rid of the huge scrypt scratchpad buffers. They are not needed for keccak.
Also we may want to have some autotune. I currently use the -L parameter to artificially make the scratchpad smaller,
so I can run more blocks.
Uh, PCI express bandwidth is going to be a bottleneck at these MHash rates. Gotta do the hash verification on the GPU!
Without the device-to-host memory transfers I can hit 80 MHash/s, getting 75% TDP. So this is the way to go!
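
A quick back-of-the-envelope sketch of why the transfers hurt. It assumes that every 32-byte keccak256 digest is copied back to the host for checking; how much data the current code really moves per hash is not stated here, so treat the per-hash size as an assumption. The function name is made up for the example.

```python
HASH_BYTES = 32  # a keccak256 digest is 256 bits


def readback_bytes_per_second(hashes_per_second, bytes_per_hash=HASH_BYTES):
    """Device-to-host traffic if every hash result is copied back.

    Assumption: one full digest per hash crosses the PCI Express bus.
    """
    return hashes_per_second * bytes_per_hash


for rate in (20_000_000, 80_000_000):
    traffic = readback_bytes_per_second(rate)
    print(f"{rate // 1_000_000} MHash/s -> {traffic / 1e9:.2f} GB/s")
```

Under that assumption the bus has to carry 0.64 GB/s at 20 MHash/s and 2.56 GB/s at 80 MHash/s. If the GPU compares each hash against the target itself, only the rare winning nonce has to be sent back, so the traffic becomes negligible.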
One AMD CPU core: 1 MHash/s. lol

Christian