Tried out the 2013-12-01 version on a 9800GT with Win Vista SP2 x64.
Now I'm getting slightly better results, but the hashrate still varies by up to 2x from one launch of the miner to the next.
When MCU load goes up to 30%, I get ~30.2 kh/s.
When MCU load is only 15%, I get up to 18.2 kh/s. (Edit: with the new version of cudaminer, MCU load stays at 16% regardless of whether the result is better or worse...)
Here are screenshots with the same parameters but different results (sorry for the big images, but I don't think anyone could read the smaller ones):


After trying for about 5 minutes, I can't get a better result than 18 kh/s.

P.S.: I just started the same command line with the x64 version of cudaminer and got ~28 kh/s. After that I got 30 kh/s with the 32-bit version of cudaminer again... but after stopping and restarting the miner, I'm back to 18 kh/s...
