I committed the mods. Linux users can try them. (Edit: or Windows users who compile;
I updated the project files)
./kangaroo -w save.work -wi 10 in.txt (Save work file every 10 sec)
./kangaroo -w save2.work -wi 10 in.txt (Save work file every 10 sec)
-snip
Thanks for testing
I ran the test with separate work files on GPU, merged them, continued on CPU, transferred the job to another machine with a different GPU, and then came back. I did not save the kangaroos, only the work (so the kangaroos were created again each time).
The test used a 2^80 range (for info: solving it on 4xRTX2080ti takes me only 6-8 minutes).
1) I started the job on 1xRTX2080ti, with 2^41.38 expected operations and about 38 min expected time --> ran it for 22 min to produce work1 (2^40.37 operations)
2) Started the same job from the beginning on the same 1xRTX2080ti and ran it for 2 min to produce work2 (2^36.5 operations)
3) Merged work1 and work2 into work3, and ran work3 for 1 minute
4) Continued the job on CPU (4 threads) - started and stopped it 3 times.
5) Transferred the job to another machine with a Tesla T4 (expected time to finish there was 1:16 hours, with 2^41.26 expected operations). I continued on the Tesla from 2^40.84 operations, saving the work to the same file; after 7 minutes I stopped it and started it again.
6) The Tesla T4 did not finish within the expected time and operations. It actually finished with a total time of 2:22 hours (about 1 hour more than expected) and 2^42.26 total operations - 2 times more than expected.
While the Tesla T4 was trying to finish the job, I copied the current work file from the Tesla T4 machine back to the 2080ti and continued the job on the machine where I started (continued from 2^41.

I tried to continue the job 7 times. The first time, the 2080ti solved the key after 1 extra minute (2^41.9 total operations), but I stopped each of the other 6 attempts when it reached 2^42.3 group operations (2 times more than expected).
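To make the operation counts above easier to compare, here is a minimal Python sketch of the log2 arithmetic. It only uses figures quoted in this post; the helper names `log2_sum` and `log2_ratio` are made up for illustration and are not part of Kangaroo.

```python
import math

def log2_sum(*exponents):
    """Return log2(2^a + 2^b + ...) for op counts given as log2 exponents."""
    return math.log2(sum(2.0 ** e for e in exponents))

def log2_ratio(actual_exp, expected_exp):
    """Return how many times larger 2^actual_exp is than 2^expected_exp."""
    return 2.0 ** (actual_exp - expected_exp)

# Merging work1 (2^40.37) with work2 (2^36.5) adds very little to work1.
merged = log2_sum(40.37, 36.5)
print(f"work1 + work2 = 2^{merged:.2f}")

# Tesla T4: finished at 2^42.26 against an expected 2^41.26.
print(f"T4 overshoot  = {log2_ratio(42.26, 41.26):.2f}x")
```

A difference of 1 in the exponent is a factor of 2 in operations, which is where the "2 times more" figure comes from.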
The suggested DP on the 2080ti was 18, but on the Tesla T4 it was 19; however, the DP size actually used stayed at 18 (as when the job was started). So I do not think that different machines use different distinguished-point patterns.
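For readers unfamiliar with the term: a distinguished point is a point whose x-coordinate matches a fixed bit pattern, so the test depends only on the DP size and not on the hardware. A minimal sketch, assuming the pattern is "the lowest `dp_bits` bits of x are zero" (Kangaroo's actual mask may test different bits, and `is_distinguished` is a hypothetical helper, not a function from the program):

```python
def is_distinguished(x: int, dp_bits: int) -> bool:
    """True if the lowest dp_bits bits of x are all zero (assumed pattern)."""
    mask = (1 << dp_bits) - 1
    return (x & mask) == 0

# With DP size 18, roughly one point in 2^18 qualifies, on any machine.
dp_bits = 18
print(is_distinguished(5 << 18, dp_bits))        # low 18 bits are zero
print(is_distinguished((5 << 18) + 1, dp_bits))  # lowest bit is set
```

As long as the DP size stored with the work stays at 18, every machine applies the same test under this assumption.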
The feature to save work, merge work and continue work is very good. Does it take hardware configuration changes into account? The idea was to have "a pause" button and continue the "movie" later, possibly on another screen.
As you developed this, perhaps you can tell what caused the job to take 2 times longer than expected?
- Just no luck
- Creation of new kangaroos many times (instead of saving them and continuing the full job)
- Configuration change (2080ti, then CPU, and then Tesla T4)
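The second possibility can be sized roughly. With distinguished points, a kangaroo only contributes to the saved work when it lands on a DP, which takes about 2^dp jumps on average, so whatever each kangaroo has walked since its last DP is discarded when the kangaroos are recreated. A back-of-envelope sketch, assuming about 2^dp lost jumps per kangaroo per restart; the kangaroo count and restart count below are made-up placeholders, not values measured in these runs, and `restart_overhead_log2` is a hypothetical helper:

```python
import math

def restart_overhead_log2(nb_kangaroos_log2: float, dp_bits: int, restarts: int) -> float:
    """log2 of jumps discarded over `restarts` restarts, assuming each
    kangaroo loses about 2^dp_bits jumps whenever the herd is recreated."""
    lost_per_restart = 2.0 ** (nb_kangaroos_log2 + dp_bits)
    return math.log2(restarts * lost_per_restart)

# Hypothetical inputs: 2^20 kangaroos (placeholder), DP size 18 as in the
# runs above, and 4 restarts chosen only for illustration.
overhead = restart_overhead_log2(nb_kangaroos_log2=20.0, dp_bits=18, restarts=4)
print(f"discarded jumps ~ 2^{overhead:.2f}")
```

If the real kangaroo count on the GPU is in this range, a handful of restarts could cost a noticeable fraction of the 2^41.38 expected operations, so saving the kangaroos may matter more than the configuration change.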