OK, so I just tried to run:
./cudaminer --algo=keccak -d gtx780 -L 16 -l K300x32 --benchmark
Now that's using 2890 MB of my 3 GB of memory, so nearly all of it is in use, and I'm still only getting 34 Mhash/s.
You can increase -L to 128 if need be. Keccak doesn't need any memory, yet cudaminer still allocates the scratchpad as if we were mining scrypt coins. The -L argument shrinks the scratchpad to a sane size.
With a huge -L you can try stuff like -l K2048x32 ;-)
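For a rough feel of why a bigger -L leaves room for a bigger launch config, here is a back-of-the-envelope sketch. It is only an estimate under assumptions that the post does not confirm: that the launch config is blocks x warps per block with 32 threads per warp, that each thread gets the usual 128 KiB scrypt scratchpad (N=1024, r=1), and that -L simply divides that size. The function name scratchpad_mib is made up for illustration.

```python
# Back-of-the-envelope estimate of the scrypt-style scratchpad size.
# ASSUMPTIONS (not confirmed by the post):
#   - "-l K<B>x<W>" means B blocks with W warps each, 32 threads per warp
#   - every thread gets a full scrypt scratchpad (N=1024, r=1 -> 128 KiB)
#   - "-L <gap>" divides the per-thread scratchpad by <gap>

SCRYPT_SCRATCHPAD_BYTES = 128 * 1024  # 128 * r * N bytes with N=1024, r=1
THREADS_PER_WARP = 32                 # CUDA warp size


def scratchpad_mib(blocks, warps_per_block, lookup_gap):
    """Estimated total scratchpad allocation in MiB (hypothetical helper)."""
    threads = blocks * warps_per_block * THREADS_PER_WARP
    total_bytes = threads * SCRYPT_SCRATCHPAD_BYTES // lookup_gap
    return total_bytes / (1024 * 1024)


# The config from the question, the same config with -L 128,
# and the larger K2048x32 config with -L 128.
for blocks, warps, gap in [(300, 32, 16), (300, 32, 128), (2048, 32, 128)]:
    estimate = scratchpad_mib(blocks, warps, gap)
    print(f"-l K{blocks}x{warps} -L {gap}: ~{estimate:.0f} MiB")
```

If those assumptions hold, K300x32 with -L 16 comes out at about 2400 MiB, which is in the same ballpark as the 2890 MB reported above, while -L 128 drops it to about 300 MiB and leaves K2048x32 at about 2048 MiB, still under 3 GB.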
Christian