c++ - CUDA: access value from thread based on a warp


I'm working on a domain-specific language (DSL) embedded in C++. It supports a GPGPU backend via CUDA C++.

Let me illustrate what I'm trying to do:

    double *out = new double[100];
    double *a = new double[100];
    reducer r;
    // init values
    out <<= reduce(a, r); // syntax provided by the DSL

This performs the following:

    for (i = 0; i < 100; i++) {
        // reducer r provides the mapping for a given i, in terms of a custom iterator
        for (j = r.start(i); j != r.end(i); j = r.next(i)) {
            out(i) = out(i) + a(j);
        }
    }
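To make the semantics of the loop above concrete, here is a minimal CPU-side sketch in plain C++. The `WindowReducer` type and the `reduce_reference` function are hypothetical names invented for illustration; the DSL's real reducer is not shown here. The sketch assumes a stateless iterator whose `next` takes the current position `j`, which differs from the `r.next(i)` call in the loop above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical reducer (an assumption, not the DSL's type):
// maps output index i to the input window [i, i + width), clipped to n.
struct WindowReducer {
    std::size_t width;  // number of inputs folded into each output
    std::size_t n;      // size of the input array

    std::size_t start(std::size_t i) const { return i; }
    std::size_t end(std::size_t i) const { return std::min(i + width, n); }
    // Assumption: advance from the current position j (stateless iterator).
    std::size_t next(std::size_t j) const { return j + 1; }
};

// CPU reference for `out <<= reduce(a, r)`:
// out[i] accumulates a[j] for every j that r maps to i.
template <typename Reducer>
void reduce_reference(std::vector<double>& out,
                      const std::vector<double>& a,
                      const Reducer& r) {
    for (std::size_t i = 0; i < out.size(); ++i) {
        for (std::size_t j = r.start(i); j != r.end(i); j = r.next(j)) {
            out[i] += a[j];
        }
    }
}
```

With `a[j] = j` and a window of width 3 over 100 elements, `out[0]` becomes 0 + 1 + 2 and the last outputs shrink as the window is clipped at the end of the array.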

This is straightforward to achieve on the CUDA backend. Now assume I have a syntax keyword called mapped_value() that provides a value based on whether the reducer r included a particular j in the mapping, or a function that helps validate a given (i, j) pair.

    out <<= reduce(a * mapped_value(), r); // syntax provided by the DSL

This doesn't seem straightforward to achieve on the CUDA backend without storing O(n^2) elements, one for every (i, j) pair. CUDA executes threads in warps of 32. Is there a way I can access the values held by these 32 threads, thereby storing only 32 elements to access mapped_value()?

Please let me know if I did not explain what I'm trying to do clearly. I will try to elaborate further.
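The access pattern in question, one value per lane with every lane able to read any other lane's value, can be emulated on the CPU as a rough sketch. The names `WarpRegisters`, `read_lane`, and `windowed_sum` are hypothetical. On real hardware, CUDA's warp shuffle intrinsic `__shfl_sync(mask, var, srcLane)` performs this kind of lane read; whether it fits the DSL's mapped_value() is not settled here.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;  // CUDA warps are 32 threads wide

// One value per lane: 32 elements per warp instead of an n x n table.
using WarpRegisters = std::array<double, kWarpSize>;

// Emulates reading the value held by another lane of the same warp.
// Like __shfl_sync, an out-of-range source lane wraps modulo the warp width.
inline double read_lane(const WarpRegisters& regs, std::size_t src_lane) {
    return regs[src_lane % kWarpSize];
}

// Each lane sums the values held by lanes [lane, lane + width), clipped to
// the warp. On a GPU the outer loop would be the 32 lanes running in lockstep.
inline WarpRegisters windowed_sum(const WarpRegisters& vals, std::size_t width) {
    WarpRegisters out{};  // zero-initialized accumulators, one per lane
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        for (std::size_t k = 0; k < width && lane + k < kWarpSize; ++k) {
            out[lane] += read_lane(vals, lane + k);
        }
    }
    return out;
}
```

This only covers mappings that stay inside a single warp; a (i, j) pair that crosses warps would still need shared or global memory.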

