
Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.1

-ck (OP)

Legendary

Offline

Activity: 4634
Merit: 1701

Ruu \o/

Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6

February 17, 2012, 09:03:58 AM

#4001

Quote from: ckolivas on February 17, 2012, 08:15:37 AM

Quote from: Diapolo on February 17, 2012, 08:09:28 AM

Quote from: ckolivas on February 17, 2012, 08:05:40 AM

Quote from: Diapolo on February 17, 2012, 07:59:49 AM

For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?

global_id == global_work_offset + global_worksize

You're doubling the global id, not the worksize.

Global ID without global offset ranges from 0 to (global worksize - 1), where the global worksize is queried in the kernel via get_global_size(0). So for 2-component vectors we need a global worksize / 2 (as the argument in the enqueue-kernel call), because 2 nonces are processed in each work-item.

Global ID with global offset ranges from global offset to (global worksize + global offset - 1).

Dia

I still think it should be something like:
u nonce = ((uint)get_global_id(0) + get_global_size(0) * (0,1,2,3));
for vectors4 etc.

Anyway I ended up trying it both ways with your nonce code or mine and neither led to any improvement (actually detriment if anything).

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck

blandead

Newbie

Offline

Activity: 46
Merit: 0

February 17, 2012, 09:07:22 AM

#4002

If you're trying to use vectors, then there is a type mismatch: either keep the (u) cast in front of (0, 1, 2, 3) or use (uint4), and the (0, 1, 2, 3) needs its own set of parentheses directly after the cast.

Here is a float4 example:

float4 f = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
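Why the cast matters: in C, and in OpenCL C, which is based on C99, a parenthesized list such as (0,1,2,3) with no vector cast in front is a comma expression that evaluates to its last operand, 3. So get_global_size(0) * (0,1,2,3) multiplies every lane by 3 instead of giving each lane its own offset. The host-side C sketch below emulates the four lanes with a plain array; the function names are hypothetical and only illustrate the difference between a cast vector literal and the uncast expression.

```c
#include <assert.h>
#include <stdint.h>

/* An uncast (0, 1, 2, 3) is a comma expression: the first three operands are
 * evaluated and discarded, so the value is 3. Compilers usually warn here. */
static uint32_t uncast_lane_factor(void)
{
    return (0, 1, 2, 3);
}

/* Emulates (uint)get_global_id(0) + get_global_size(0) * (uint4)(0, 1, 2, 3):
 * each lane gets its own multiplier, as a real vector literal would give. */
static void lanes_with_cast(uint32_t gid, uint32_t gsize, uint32_t out[4])
{
    static const uint32_t factor[4] = {0, 1, 2, 3};
    for (int i = 0; i < 4; i++)
        out[i] = gid + gsize * factor[i];
}

/* Emulates the same line without the cast: the scalar result is broadcast,
 * so every lane ends up holding the same nonce. */
static void lanes_without_cast(uint32_t gid, uint32_t gsize, uint32_t out[4])
{
    uint32_t k = uncast_lane_factor();
    assert(k == 3);
    for (int i = 0; i < 4; i++)
        out[i] = gid + gsize * k;
}
```

With gid = 10 and gsize = 4, the cast version yields 10, 14, 18, 22, while the uncast one puts 22 in all four lanes.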

Also, in the __kernel void search, if you keep (__global uint * output) then you're not really utilizing vectors correctly.

And sorry, I was just trying to provide some general feedback on out-of-order execution; I wasn't trying to offend you. I'm just not sure how to edit cgminer directly.

-ck (OP)

February 17, 2012, 09:27:57 AM

#4003

Quote from: blandead on February 17, 2012, 09:07:22 AM

Thanks.

Are you saying the existing code is losing shares with __global uint * output? 99% of users on cgminer are currently using 2 vectors. Again, I doubt that's the case.

Diapolo

Hero Member

Offline

Activity: 773
Merit: 500

February 17, 2012, 09:29:18 AM

#4004

Quote from: ckolivas on February 17, 2012, 08:15:37 AM

Quote from: Diapolo on February 17, 2012, 08:09:28 AM

Quote from: ckolivas on February 17, 2012, 08:05:40 AM

Quote from: Diapolo on February 17, 2012, 07:59:49 AM

For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?

global_id == global_work_offset + global_worksize

You're doubling the global id, not the worksize.

I still think it should be something like:
u nonce = ((uint)get_global_id(0) + get_global_size(0) * (0,1,2,3));
for vectors4 etc.

Thinking out loud again:

get_global_id(0) ranges from global_offset for the 1st work-item to (global_offset + (global_worksize - 1)) for the last work-item
get_global_size(0) == global_worksize (constant value)

With global_offset == nonce-base, that results in:

nonce.x = nonce-base + global_worksize * 0;
nonce.y = nonce-base + global_worksize * 1;
nonce.z = nonce-base + global_worksize * 2;
nonce.w = nonce-base + global_worksize * 3;

Let's consider 10 as nonce-base and 4 as global_worksize. This leads to the following nonces that get checked during 1 kernel execution:

Work-Item 0:
10 + 4 * 0 = 10
10 + 4 * 1 = 14
10 + 4 * 2 = 18
10 + 4 * 3 = 22

Work-Item 1:
11 + 4 * 0 = 11
11 + 4 * 1 = 15
11 + 4 * 2 = 19
11 + 4 * 3 = 23

Work-Item 2:
12 + 4 * 0 = 12
12 + 4 * 1 = 16
12 + 4 * 2 = 20
12 + 4 * 3 = 24

Work-Item 3:
13 + 4 * 0 = 13
13 + 4 * 1 = 17
13 + 4 * 2 = 21
13 + 4 * 3 = 25

So we have nonces from 10 to 25

Now if we divide the passed global worksize by 4 (because of the 4-component vectors in your example), i.e. use 1 for it, this leads to:

Work-Item 0:
10 + 1 * 0 = 10
10 + 1 * 1 = 11
10 + 1 * 2 = 12
10 + 1 * 3 = 13

So I guess your code works if you divide the global worksize by the vector size before passing that argument to clEnqueueNDRangeKernel.
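The worked example can be checked mechanically. Below is a host-side C sketch, assuming get_global_id(0) returns offset + i for work-item i (the global work offset being the nonce base) and get_global_size(0) returns the worksize. It lists the nonces each work-item's lanes would test under the get_global_id(0) + get_global_size(0) * lane layout and confirms that base 10, worksize 4 and 4 lanes test every nonce from 10 to 25 exactly once. The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Work-item i gets global id gid = offset + i (offset = nonce base) and tests
 * `width` lanes: gid + size * lane. Nonces are written to out[] in work-item
 * order; the return value is how many were written (size * width). */
static size_t enumerate_nonces(uint32_t offset, uint32_t size, unsigned width,
                               uint32_t *out)
{
    size_t n = 0;
    assert(width == 1 || width == 2 || width == 4);
    for (uint32_t i = 0; i < size; i++) {
        uint32_t gid = offset + i;            /* get_global_id(0) */
        for (unsigned lane = 0; lane < width; lane++)
            out[n++] = gid + size * lane;     /* size == get_global_size(0) */
    }
    return n;
}

/* Returns 1 if out[0..n-1] holds each value of [lo, lo + n) exactly once.
 * Small demo only: supports n <= 64. */
static int covers_range_once(const uint32_t *out, size_t n, uint32_t lo)
{
    unsigned char seen[64] = {0};
    if (n > sizeof seen)
        return 0;
    for (size_t k = 0; k < n; k++) {
        uint32_t d = out[k] - lo;
        if (d >= n || seen[d])
            return 0;
        seen[d] = 1;
    }
    return 1;
}
```

Calling enumerate_nonces(10, 4, 4, buf) reproduces the four work-items listed above (10, 14, 18, 22, then 11, 15, 19, 23, and so on), and enumerate_nonces(10, 1, 4, buf) reproduces the divided-worksize case, 10 to 13.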

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo

-ck (OP)

February 17, 2012, 09:40:43 AM

#4005

Quote from: Diapolo on February 17, 2012, 09:29:18 AM

Quote from: ckolivas on February 17, 2012, 08:15:37 AM

Quote from: Diapolo on February 17, 2012, 08:09:28 AM

Quote from: ckolivas on February 17, 2012, 08:05:40 AM

Quote from: Diapolo on February 17, 2012, 07:59:49 AM

For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?

global_id == global_work_offset + global_worksize

You're doubling the global id, not the worksize.

I still think it should be something like:
u nonce = ((uint)get_global_id(0) + get_global_size(0) * (0,1,2,3));
for vectors4 etc.

Thanks. My issue with your code being:

Code:

u nonce = ((uint)get_global_id(0) << 2) + (u)(0, 1, 2, 3);

As I said, it really won't be testing the nonce range we're asking it to test. If "base" is 2^31 and worksize is 2^24 (intensity 9), then get_global_id(0) will return 2^31 for the very first thread. Then if we shift it << 2 it's going to be undefined, and in most implementations it will just be zero again. That means we'll be repeating 2^24 operations on nonces 0 - 2^24, which we would already have done initially on getting that work item.
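The wraparound is easy to reproduce on the host. One precision point: for a 32-bit unsigned int, C (and OpenCL C, which follows C99 here) defines a left shift as discarding the bits pushed past the top, i.e. the result is reduced modulo 2^32. So the outcome is predictable rather than undefined, and it is exactly the "zero again" described above. The sketch below is a hypothetical host-side emulation of one lane of the VECTORS4 line, with gid standing in for get_global_id(0) when base is passed as the global work offset.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of: u nonce = ((uint)get_global_id(0) << 2) + (u)(0, 1, 2, 3);
 * gid plays the role of get_global_id(0) (base + thread index when base is the
 * global work offset). On a 32-bit unsigned value the shift drops the top
 * two bits, so the result wraps modulo 2^32. */
static uint32_t shifted_lane_nonce(uint32_t gid, uint32_t lane)
{
    assert(lane < 4);
    return (gid << 2) + lane;
}
```

With base = 2^31, shifted_lane_nonce(base, 0) is 0 and shifted_lane_nonce(base + 1, 0) is 4, so the first threads retest nonces from the bottom of the nonce space instead of the range starting at base.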

Diapolo

February 17, 2012, 09:55:10 AM
Last edit: February 17, 2012, 10:17:36 AM by Diapolo

#4006

Quote from: ckolivas

Thanks. My issue with your code being:

Code:

u nonce = ((uint)get_global_id(0) << 2) + (u)(0, 1, 2, 3);

get_global_id(0) for the very first thread is simply base, if it is passed as the global_work_offset parameter. So the range is from "base" to "base + (2^24 - 1)". If base is 2^31 and we shift left by 2 for Vec4, you are right and we are undefined here. The code is easy for no vectors, but wrong for vectors in its current form.

Edit: Would this work for Vec2 (base: 10, global worksize: 4)?

Code:

u nonce = (uint)get_global_id(0) + (u)(0, (uint)get_global_size(0));

With base 10, the nonces would be: 10, 14, 11, 15, 12, 16, 13, 17
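To double-check that ordering, here is a small host-side C sketch of the Vec2 line, assuming as above that get_global_id(0) returns base + i and get_global_size(0) returns the worksize. The function name is hypothetical; it writes the (x, y) pair of each work-item in turn.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates u nonce = (uint)get_global_id(0) + (u)(0, (uint)get_global_size(0));
 * for every work-item i: x = base + i, y = base + i + size.
 * out[] must hold 2 * size values; pairs are stored in work-item order. */
static void vec2_nonces(uint32_t base, uint32_t size, uint32_t *out)
{
    assert(size > 0);
    for (uint32_t i = 0; i < size; i++) {
        uint32_t gid = base + i;          /* get_global_id(0) */
        out[2 * i] = gid;                 /* nonce.x */
        out[2 * i + 1] = gid + size;      /* nonce.y */
    }
}
```

vec2_nonces(10, 4, buf) fills buf with 10, 14, 11, 15, 12, 16, 13, 17: the same eight nonces, 10 to 17, that a contiguous scan would test, only visited in interleaved order.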

Dia

-ck (OP)

February 17, 2012, 10:00:14 AM

#4007

Code:

#if defined VECTORS4
#ifdef GOFFSET
u nonce = (uint)get_global_id(0) + (u)(0, get_global_size(0), get_global_size(0) << 1, get_global_size(0) * 3);
#else
u nonce = ((uint)get_group_id(0) * (uint)get_local_size(0) << 2) + ((uint)get_local_id(0) << 2) + base;
#endif
#elif defined VECTORS2
#ifdef GOFFSET
u nonce = (uint)get_global_id(0) + (u)(0, get_global_size(0));
#else
u nonce = ((uint)get_group_id(0) * (uint)get_local_size(0) << 1) + ((uint)get_local_id(0) << 1) + base;
#endif
#else

should do it

And cgminer already takes vectors into account when advancing the nonce value it passes as base on the next pass. This doesn't change that. cgminer effectively sends twice as much work when vectors go from 1 to 2, so the same intensity effectively means a different amount of work at different vector widths.
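The bookkeeping behind that last point can be sketched as arithmetic. Assuming, as the 2^24 / intensity 9 figure earlier in the thread implies, that a (non-negative) intensity I launches 2^(15 + I) work-items, and that each work-item tests one nonce per vector component, a single pass covers 2^(15 + I) times vectors nonces, and the base for the next pass has to move forward by that amount. The helpers below are hypothetical and are not cgminer's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Nonces tested in one kernel pass: 2^(15 + intensity) work-items,
 * each handling `vectors` nonces (assumed mapping, non-negative intensity). */
static uint64_t nonces_per_pass(unsigned intensity, unsigned vectors)
{
    assert(vectors == 1 || vectors == 2 || vectors == 4);
    assert(intensity <= 16);            /* keep the shift well inside 64 bits */
    return ((uint64_t)1 << (15 + intensity)) * vectors;
}

/* Base for the following pass; the nonce is a 32-bit value, so it wraps. */
static uint32_t next_base(uint32_t base, unsigned intensity, unsigned vectors)
{
    return (uint32_t)(base + nonces_per_pass(intensity, vectors));
}
```

At intensity 9, one vector gives 2^24 nonces per pass and two vectors give 2^25, which is the "twice as much work" described above.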

-ck (OP)

February 17, 2012, 10:05:02 AM

#4008

Btw, the performance of it is pretty average, after all that discussion...

Diapolo

February 17, 2012, 10:20:02 AM

#4009

Quote from: ckolivas on February 17, 2012, 10:05:02 AM

Btw, the performance of it is pretty average, after all that discussion...

Perhaps the changes needed to make it work ate up the small benefits the solution offers... but I had to LOL when I saw we came up with the same solution ^^. I posted first and read your version after that, and they look identical for VEC2 :D

Dia

-ck (OP)

February 17, 2012, 10:48:19 AM

#4010

Quote from: Diapolo on February 17, 2012, 10:20:02 AM

Quote from: ckolivas on February 17, 2012, 10:05:02 AM

Btw, the performance of it is pretty average, after all that discussion...

I'd say you're right. Oh well, always other things to try.

-ck (OP)

February 17, 2012, 11:23:20 AM

#4011

Quote from: ckolivas on February 17, 2012, 10:48:19 AM

Quote from: Diapolo on February 17, 2012, 10:20:02 AM

Quote from: ckolivas on February 17, 2012, 10:05:02 AM

Btw, the performance of it is pretty average, after all that discussion...

I'd say you're right. Oh well, always other things to try.

You know, I could make cgminer "skip" nonce ranges when it's using goffset so that the code can work with fewer ops. This would drop efficiency, though, since it would decrease the amount of work a device gets before it needs new work.

-ck (OP)

February 17, 2012, 11:32:44 AM

#4012

Meh, it ended up being of no advantage for unnecessary complexity.

* ckolivas forgets all about goffset for now.

Proofer

Member

Offline

Activity: 266
Merit: 36

February 17, 2012, 02:09:17 PM

#4013

Quote from: ckolivas on February 17, 2012, 09:27:57 AM

... 99% of users on cgminer are currently using 2 vectors. ...

We p2pool users were advised to use -v 1; are we the 1%?

-ck (OP)

February 17, 2012, 02:12:34 PM

#4014

Quote from: Proofer on February 17, 2012, 02:09:17 PM

Quote from: ckolivas on February 17, 2012, 09:27:57 AM

... 99% of users on cgminer are currently using 2 vectors. ...

We p2pool users were advised to use -v 1; are we the 1%?

No, you were advised to use -g 1.
I don't recall saying to use one vector for p2pool, and if anyone did say that, they're wrong.

-ck (OP)

February 17, 2012, 03:14:54 PM

#4015

More AMD breakage coming up. As Diapolo hinted earlier, there is a new AMD driver, 12.2, with an SDK that claims to be SDK 2.6 but comes up with the version number 898.1. It breaks cgminer completely, making it unable to build any binaries. >:(

I have yet to investigate why, but please do not upgrade unless you already have .bin files that work. I'm going to start a collection of .bin files that people may be able to download; they'll be housed here:

http://ck.kolivas.org/apps/cgminer/bins/

Notably, there are Tahiti (7970) .bins for 32-bit (long4) and 64-bit (long8), as these depend on SDK 2.6, and people may well get a nasty surprise if they try to get it working now with the latest drivers.

Diapolo

February 17, 2012, 03:21:28 PM

#4016

Quote from: ckolivas on February 17, 2012, 03:14:54 PM

As I said, bad stuff incoming ... here are some version strings from Windows:

platform version: OpenCL 1.1 AMD-APP (898.1)

device infos (verified to be equal on Tahiti and BeaverCreek):
OpenCL software driver version: CAL 1.4.1703 (VM)
supported OpenCL version (FULL_PROFILE): OpenCL 1.1 AMD-APP (898.1)
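If you want a program to notice this runtime before trying to build kernels, the build number in parentheses is the distinguishing part of the version string. A minimal C sketch, assuming the string has already been fetched (for example with clGetPlatformInfo and CL_PLATFORM_VERSION) and that AMD keeps the "AMD-APP (major.minor)" pattern shown above; the helper name is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extracts the AMD-APP build from a string like "OpenCL 1.1 AMD-APP (898.1)".
 * Returns 1 and fills *major / *minor on success, 0 if the pattern is absent. */
static int amd_app_build(const char *version, unsigned *major, unsigned *minor)
{
    static const char tag[] = "AMD-APP (";
    const char *p;

    assert(version != NULL);
    p = strstr(version, tag);
    if (p == NULL)
        return 0;
    /* Skip the tag (sizeof includes the terminating NUL, hence - 1). */
    return sscanf(p + sizeof tag - 1, "%u.%u", major, minor) == 2;
}
```

On the strings quoted above this yields 898.1; the 12.1 runtime mentioned later in the thread reports 854.1, so a check such as major > 854 would flag the new one.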

Dia

DeathAndTaxes

Donator
Legendary

Offline

Activity: 1218
Merit: 1194

Gerald Davis

February 17, 2012, 03:28:05 PM

#4017

Quote from: ckolivas on February 17, 2012, 03:14:54 PM

Alternatively, you could install the upgrade but (in Windows) select custom install and UNCHECK the SDK. Not sure if 12.2 has any notable changes compared to 12.1, but if it does, that is the way to get the "improved" (??? with AMD, deproved) driver and keep the existing SDK installation.

Diapolo

February 17, 2012, 03:38:22 PM

#4018

Quote from: DeathAndTaxes on February 17, 2012, 03:28:05 PM

Quote from: ckolivas on February 17, 2012, 03:14:54 PM

with AMD deproved) and keep existing SDK installation.

Yes, for now one should uncheck the OpenCL Runtime during the Catalyst upgrade until CGMINER is fixed.
My first look made me scream about another fact: they did heavy work on their OpenCL compiler, which once again behaves very differently from former versions... seems like more work in the future :D

(It looks like they prefer vector GPRs over scalar GPRs with the new runtime, as there are massive shifts in GPR usage.)

Dia

JWU42

Legendary

Offline

Activity: 1666
Merit: 1000

February 17, 2012, 03:50:28 PM

#4019

The latest 12.2 pre has 898.1 SDK (2/16/12 release). I had to manually remove the .dll files and run the 12.1 installer to get the older 854.1 SDK.

GPG PubKey | THREEMA | OTC | HeatWare

dlasher

Sr. Member

Offline

Activity: 467
Merit: 250

February 17, 2012, 03:57:51 PM

#4020

Quote from: DeathAndTaxes on February 17, 2012, 02:41:50 AM

Quote from: dlasher on February 17, 2012, 02:31:16 AM

Quote from: ckolivas on February 16, 2012, 03:34:19 AM

cgminer sets clocks all back to default on exit... if it exits cleanly, and of course on windows it's a miracle when it does.

Sadly this is not the case.

Windows machine, pair of 6950s, set to 850/1300 for normal operation; in cgminer they are set to 700-880/300... when cgminer exits it leaves the cards at 880/300.

Even updated to 12.1 drivers, both 2.4 and 2.6 SDK.

Is it exiting or crashing?

I run cgminer from a .bat file in Windows, and when I hit the "Q" button, it gives me the 2-page summary of stats, then sits for about 3 seconds and then closes the DOS window. I'm not getting any complaints from Windows about it crashing/hanging/etc.

ckolivas: I know you have the impossible task of trying to make an app play nice with Windows. I'm HAPPY to run any sort of debug load/options you might have to give better/more-detailed information.
