CUDA: accessing values from other threads in a warp
I'm working on a domain-specific language (DSL) embedded in C++ that supports a GPGPU backend via CUDA C++.
Let me illustrate what I'm trying to do:
    double *out = new double[100];
    double *a   = new double[100];
    reducer r;
    // init values
    out <<= reduce(a, r); // syntax provided by the DSL
This performs the following:
    for (int i = 0; i < 100; i++) {
        // the reducer r provides a mapping for a given i, in terms of a custom iterator
        for (j = r.start(i); j != r.end(i); j = r.next(i)) {
            out(i) = out(i) + a(j);
        }
    }
This is straightforward to achieve on the CUDA backend.
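For reference, this is roughly how I picture that first form mapping onto a CUDA kernel: one thread per output element i, each walking the reducer's iterator. The reducer below is only a hypothetical stand-in (a contiguous-range mapping) to make the sketch self-contained; the real start/end/next protocol is whatever the DSL's reducer provides (I've written next(j) here because the stand-in just advances j, so the signature may differ from mine above).

    #include <cuda_runtime.h>

    // Hypothetical stand-in for the DSL's reducer, just so the sketch
    // compiles on its own: output i sums the contiguous range [i*K, (i+1)*K).
    struct reducer {
        int K;
        __device__ int start(int i) const { return i * K; }
        __device__ int end(int i)   const { return (i + 1) * K; }
        __device__ int next(int j)  const { return j + 1; }
    };

    // One thread per output element i; `out` and `a` are device pointers.
    __global__ void reduce_kernel(double *out, const double *a, reducer r, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        double acc = out[i];
        for (int j = r.start(i); j != r.end(i); j = r.next(j)) {
            acc += a[j];
        }
        out[i] = acc;
    }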
Now assume I have a syntax keyword called mapped_value() that provides a value based on whether the reducer r included a particular j in the mapping, or a function that helps validate a given (i, j) pair:
    out <<= reduce(a * mapped_value(), r); // syntax provided by the DSL
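As I understand my own construct, the sequential equivalent of this second form is something like the loop below, where mapped_value is conceptually a weight attached to each (i, j) pair the reducer visits (the explicit (i, j) arguments are just my way of writing the dependence down, not actual DSL syntax):

    for (int i = 0; i < 100; i++) {
        for (j = r.start(i); j != r.end(i); j = r.next(i)) {
            out(i) = out(i) + a(j) * mapped_value(i, j);
        }
    }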
This doesn't seem straightforward to achieve on the CUDA backend without storing O(n^2) elements, one for every (i, j) pair. CUDA executes warps of 32 threads. Is there a way I can access values across these 32 threads, so that only 32 elements need to be stored in order to access mapped_value()?
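To make the question concrete: my understanding is that the __shfl_sync intrinsic lets each of the 32 threads in a warp read a register value held by another lane, so a warp could in principle keep 32 mapped values in registers instead of materializing them in memory. A minimal sketch of that mechanism (not backend code, and assuming full, fully active warps):

    #include <cuda_runtime.h>

    // Each lane of a warp holds one value in a register; any lane can read
    // another lane's value via a warp shuffle, without going through memory.
    // Assumes the grid exactly covers the array and every warp is full,
    // so the full-warp mask 0xffffffff is valid.
    __global__ void warp_read_sketch(const double *a, double *out)
    {
        int idx  = blockIdx.x * blockDim.x + threadIdx.x;
        int lane = threadIdx.x & 31;   // lane id within the warp (warpSize == 32)

        double my_val = a[idx];        // one element held per lane

        // Read the value currently held by the next lane in the same warp.
        double next_lane_val = __shfl_sync(0xffffffffu, my_val, (lane + 1) & 31);

        out[idx] = my_val + next_lane_val;
    }

Is something along these lines the right way to think about it for the backend, or is there a better pattern?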
Please let me know if I'm not explaining what I'm trying to do clearly; I'll try to elaborate further.