Thanks Bipben for the GPU generator and your hard work!  I've gone ahead and doubled your burst wallet balance for you. 

Wow! Thanks a lot for your support 

I will continue to enhance the plot generator. The next version will be centered on performance.
Some people pointed out that it would be great to be able to split the work between several graphics cards. I will add this feature to the roadmap.
v.2.0.0 seems solid, thanks.... 
Also, could you explain how the stagger size and the hashes number relate to each other? The explanation was a bit vague to me.
There is no relationship between the two.
The <staggerSize> parameter is used to order the scoops in the resulting plot file. The greater the stagger, the more RAM will be required to generate the nonces.
The <hashesSize> parameter is a value between 1 and 8160 used to split the step2 global work into chunks. Those chunks of workload are then queued to your graphics card one after another.
Example : ./gpuPlotGenerator 0 0 <path> <address> 0 100 10 64 400
-> Creating generation buffer ((PLOT_SIZE + 16) * staggerSize)
-> Creating result buffer (PLOT_SIZE * staggerSize)
-----> Generating 10 (staggerSize) nonces (from 0 to 9)
-> Step1: Buffer initialisation spread on <threadsNumber=64> threads
-> Step2:
  -> Computing 400 hashes spread on <threadsNumber=64> threads (from 0 to 399)
  -> Computing 400 hashes spread on <threadsNumber=64> threads (from 400 to 799)
  -> ...
-> Step3: reordering scoops : moving them from the generation buffer to the result buffer
-> Writing nonces to disk
-----> Generating 10 (staggerSize) nonces (from 10 to 19)
... and so on
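To make the walkthrough above easier to follow, here is a rough Python sketch of the same schedule. It is not the generator's code, just an illustration under assumptions: PLOT_SIZE = 262144 bytes and a total of 8192 step2 hashes per pass are not stated in the post, the function names are made up, and truncating the last group or chunk when the sizes do not divide evenly is my own choice.

```python
# Assumption: size of one nonce in bytes (4096 scoops * 64 bytes); not stated in the post.
PLOT_SIZE = 262144
# Assumption: total number of step2 hashes to split into chunks; not stated in the post.
TOTAL_HASHES = 8192


def buffer_sizes(stagger_size, plot_size=PLOT_SIZE):
    """Return (generation buffer, result buffer) sizes in bytes, using the formulas from the trace."""
    generation = (plot_size + 16) * stagger_size
    result = plot_size * stagger_size
    return generation, result


def stagger_groups(start_nonce, nonces_number, stagger_size):
    """Return inclusive (first, last) nonce ranges, one per staggerSize-sized group."""
    end = start_nonce + nonces_number
    return [
        (first, min(first + stagger_size, end) - 1)
        for first in range(start_nonce, end, stagger_size)
    ]


def hash_chunks(total_hashes, hashes_number):
    """Return inclusive (first, last) hash ranges queued one after another in step2."""
    return [
        (first, min(first + hashes_number, total_hashes) - 1)
        for first in range(0, total_hashes, hashes_number)
    ]


if __name__ == "__main__":
    # Values taken from the example command: 100 nonces from 0, stagger 10, 400 hashes per chunk.
    print(buffer_sizes(10))
    print(stagger_groups(0, 100, 10)[:2])
    print(hash_chunks(TOTAL_HASHES, 400)[:2])
```

The point of the sketch is that the two parameters act on different axes: staggerSize decides how many nonces sit in RAM at once, while the hashes number only decides how finely the step2 work for those nonces is sliced before being queued to the card.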
As you can see, without the reordering process we should be able to reduce the amount of memory needed for plot generation (or at least scale it as we want, without any impact on mining) and greatly improve its parallelization. Postponing this step could greatly improve the whole generation process, but it will increase I/O operations. I will make different versions to test some ideas on this.
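For readers wondering what the reordering step actually moves, here is a toy Python sketch. It assumes the generation buffer holds each nonce's scoops contiguously and that the result buffer groups the same scoop index of all staggerSize nonces together; that layout is my reading of "ordering the scoops" and is not spelled out in the post. The function name and the tiny sizes are hypothetical, and the 16 extra bytes per nonce in the generation buffer are ignored.

```python
def reorder_scoops(generation, stagger_size, scoops_per_nonce, scoop_size):
    """Copy scoops from a nonce-major buffer into a scoop-major buffer (assumed layouts)."""
    nonce_size = scoops_per_nonce * scoop_size
    result = bytearray(nonce_size * stagger_size)
    for nonce in range(stagger_size):
        for scoop in range(scoops_per_nonce):
            # Source: scoop `scoop` inside nonce `nonce`.
            src = nonce * nonce_size + scoop * scoop_size
            # Destination: slot `nonce` inside the block reserved for scoop `scoop`.
            dst = (scoop * stagger_size + nonce) * scoop_size
            result[dst:dst + scoop_size] = generation[src:src + scoop_size]
    return bytes(result)


if __name__ == "__main__":
    # Two nonces of three 1-byte scoops each: nonce 0 = "abc", nonce 1 = "def".
    print(reorder_scoops(b"abcdef", stagger_size=2, scoops_per_nonce=3, scoop_size=1))
```

Under these assumptions the result buffer has to hold the whole stagger group before anything can be written, which is why a larger stagger costs more RAM and why skipping or postponing this step would decouple memory use from the stagger.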
Also, I will try to lighten the step2 kernel processing as much as possible to increase the nonces/minute rate.