Getting ~525khash/s with settings: -l Z12x24 on latest commit x64.
Getting ~432khash/s with settings: -l T12x24 on 12-18-2013 x64.
I'd definitely say the nVidia guy did a good job.

That's some good data. Thanks for testing!
Here are my results:
[2014-01-23 00:42:52] GPU #0: GeForce GTX 780 Ti with compute capability 3.5
[2014-01-23 00:42:52] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-01-23 00:42:52] GPU #0: 32 hashes 4.0 MB per warp.
[2014-01-23 00:42:52] GPU #0: using launch configuration Z15x24
[2014-01-23 00:49:17] GPU #0: GeForce GTX 780 Ti, 626.61 khash/s
[2014-01-23 00:49:59] GPU #0: GeForce GTX 780 Ti, 626.99 khash/s
I think the machine delivered 530 kHash/s per card before my scrypt-jane additions to the T kernel.
So the net gain is nearly 100 kHash/s for me.
My 1.6 MHash/s scrypt mining rig just became a 1.88 MHash/s scrypt rig!
I take back what I said about "marginal improvements over David Andersen's work".
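
For anyone who wants to double-check the arithmetic, here is a rough sketch. It assumes a three-card rig, which is not stated anywhere above and is only inferred from 1.6 MHash/s at roughly 530 kHash/s per card. The "after" rate is an approximate average of the two log lines. The names (CARDS, gain) are made up for illustration.

```python
# Rough sanity check of the hash rates quoted in the post.
# ASSUMPTION: the rig has 3 cards. This is not stated in the post; it is
# inferred from 1.6 MHash/s total at about 530 kHash/s per card.
CARDS = 3

before_khs = 530.0  # per-card rate before the kernel changes (from the post)
after_khs = 626.8   # approximate average of the log lines (626.61, 626.99)


def gain(before, after):
    """Return (absolute gain, relative gain in percent)."""
    delta = after - before
    return delta, 100.0 * delta / before


delta_khs, pct = gain(before_khs, after_khs)

# Scale per-card rates up to the whole rig and convert kHash/s to MHash/s.
rig_before_mhs = CARDS * before_khs / 1000.0
rig_after_mhs = CARDS * after_khs / 1000.0

print(f"per card: +{delta_khs:.1f} kHash/s ({pct:.1f}%)")
print(f"rig: {rig_before_mhs:.2f} -> {rig_after_mhs:.2f} MHash/s")
```

Under that assumption the per-card gain comes out just under 100 kHash/s and the rig total lands at about 1.88 MHash/s, which matches the figures above. The same helper applied to the first poster's numbers, gain(432.0, 525.0), gives about 93 kHash/s, or roughly 21.5%.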

I will let it run on middlecoin for the night to see if I can still leech some good coins off the DOGE craze that has been going on.