Ok, yeah, I missed some of those things. But regarding the 4.8 MK/s, what am I missing there? What do you get on a standard public program like keyhunt-cuda or the VS range-limited version? I get 5 MK/s with out-of-the-box keyhunt-cuda for address search on a single core.
It seems like you took Cyclone or some other program and just added functionality to it, rather than changing the math or making some other kind of real speed-up. But it is early/late, so maybe I missed something else too, especially since I am using my RDP to try and get a hint on the exact keyspace for 68.

You're right. I added features to Cyclone itself since I've seen so many posts about it, so implementing my approach there made sense.
I haven't tried keyhunt-cuda yet, but I'll check it out to see what speed I get. There's no doubt that much higher speeds are available in the market.
I don't know why, but I also search 68 mostly in this range.. lol
I know why y'all are: because 67's h160 starting chars were the range in which it was found. 68's starts with E0, so that is probably why.
Lol, you are right. I even remember the post by @mcdouglasx where he mentioned this range in his probability list.

If low-level custom firmware can be developed to give us direct access to the ASIC's SHA-256 hashing function (without its built-in mining logic), then this approach could work. We could offload the SHA-256 part to the ASIC while handling the rest on the GPU/CPU, creating a hybrid GPU + ASIC system for Bitcoin address generation.
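For reference, the stage being split here is the HASH160 step of address generation: SHA-256 of the public key, then RIPEMD-160 of that digest. A minimal Python sketch (the function name and dummy key bytes are just for illustration):

```python
import hashlib

def hash160(pubkey: bytes) -> bytes:
    """Bitcoin's HASH160: RIPEMD-160 of SHA-256 of the public key."""
    # This SHA-256 stage is the part the hybrid idea would push to the ASIC.
    sha = hashlib.sha256(pubkey).digest()
    # The RIPEMD-160 stage would stay on the GPU/CPU side.
    # Note: 'ripemd160' availability depends on your OpenSSL build; some
    # OpenSSL 3.x installs require the legacy provider to be enabled.
    return hashlib.new('ripemd160', sha).digest()

# Example with a 33-byte compressed public key (dummy bytes, not a real key):
pubkey = bytes.fromhex('02' + '11' * 32)
print(hash160(pubkey).hex())
```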
It's easier to simply use two GPUs to increase throughput than to try to solve the technical bottlenecks, because before any hashing even takes place, a GPU can produce far more public keys than it can transfer out. The memory can't keep up.
An ASIC is required to do everything internally, on-chip.
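To put rough numbers on that transfer bottleneck (the 5 GK/s generation rate below is a hypothetical figure for illustration, not a measurement):

```python
# Back-of-the-envelope check of the transfer-out bottleneck.
KEY_BYTES = 33          # a compressed secp256k1 public key is 33 bytes
PCIE4_X16_BPS = 32e9    # approximate usable PCIe 4.0 x16 bandwidth, bytes/s

keys_per_sec = 5e9      # assumed 5 GK/s of on-GPU key generation
required = keys_per_sec * KEY_BYTES
print(f"required: {required/1e9:.0f} GB/s vs available: {PCIE4_X16_BPS/1e9:.0f} GB/s")
# -> required: 165 GB/s vs available: 32 GB/s
# The bus, not the compute, becomes the wall long before hashing starts.
```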
The biggest issue is exactly what you pointed out: before any hashing even happens, a GPU can generate way more public keys than it can transfer out. The memory bandwidth becomes the limiting factor, not just the compute power.
ASICs, on the other hand, are designed to handle everything on-chip, avoiding this bottleneck entirely. So unless an ASIC is specifically designed for this kind of workload, trying to mix ASICs and GPUs effectively might not be worth the effort.
Let's see who will be the first to bring this approach into a practical, working solution. Though, to be honest, it sounds much easier in theory than it will be in practice!
