Memory organization for OpenCL local memory on Nvidia GPUs
I am trying to optimize an OpenCL kernel that uses local memory, for use on Nvidia GPUs. I have read about warps, how they can access local memory banks efficiently, and how bank conflicts happen. The one thing I cannot find is an example of how memory is allocated when there are multiple local memory declarations.
For example, in this OpenCL kernel:
    __kernel void computeexample(__global float* input,
                                 __global float* output,
                                 __local float* multiplier,
                                 __local float* offsets)
    {
        uint localid = get_local_id(0);
        multiplier[localid] = localid * 2.0f;
        offsets[localid] = localid + 2.0f;
        // compute here
    }
This is only an illustration. I want to know, when I declare two or more local memory variables, how they are organized in memory on Nvidia cards. Are they allocated end to end, so that one begins at the end of the previous one? Or does each local variable start at the first memory bank, leaving possible padding between variables so that each starts on a 128-byte boundary (32 banks x 4 bytes per bank)? Does the order of declaration in the kernel determine the order in which they reside in memory?
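To make the bank arithmetic behind this question concrete, here is a minimal host-side sketch in plain C. It assumes the simple model stated above (32 banks, 4 bytes per bank, consecutive 4-byte words in consecutive banks). The function name `bank_of_float` and the `base_bytes` parameter are hypothetical; the real base address of each local variable is chosen by the implementation and is exactly what I am asking about.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model from the question: 32 banks, each 4 bytes wide. */
enum { NUM_BANKS = 32, BANK_WIDTH_BYTES = 4, FLOAT_BYTES = 4 };

/* Bank touched by element `index` of a float array whose first element
   sits `base_bytes` bytes into local memory. `base_bytes` is a
   hypothetical value; the actual placement is up to the implementation. */
static unsigned bank_of_float(size_t base_bytes, size_t index)
{
    /* A float array is expected to start on a 4-byte boundary. */
    assert(base_bytes % FLOAT_BYTES == 0);
    size_t byte_addr = base_bytes + index * FLOAT_BYTES;
    return (unsigned)((byte_addr / BANK_WIDTH_BYTES) % NUM_BANKS);
}
```

Under this model, 32 consecutive floats always land in 32 different banks, whatever the base offset is. A base that is not a multiple of 128 bytes only rotates which work-item hits which bank, so for unit-stride access like `multiplier[localid]` the alignment of the array start should not by itself cause conflicts.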
What I want is to optimize local memory size, avoid bank conflicts, and take advantage of any possible coalesced access.
I understand that this may vary from device to device, and that there may be no guarantees across different Nvidia GPUs, but any ideas or tips on the best way to organize local data would be helpful.
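One way to avoid depending on how the implementation places separate local variables would be to pass a single local buffer and carve it up by hand. The following is a rough host-side sketch in plain C of how the byte offsets could be planned, assuming each sub-array should start on a 128-byte boundary (32 banks x 4 bytes). The names `align_up`, `local_layout` and `plan_local_layout` are hypothetical helpers, not part of any OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `alignment`.
   `alignment` must be a non-zero power of two. */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical layout of two float arrays inside one local allocation. */
typedef struct {
    size_t multiplier_offset; /* byte offset of the first array        */
    size_t offsets_offset;    /* byte offset of the second array       */
    size_t total_bytes;       /* size to request for the single buffer */
} local_layout;

/* Plan the layout for one float per work-item in each array. */
static local_layout plan_local_layout(size_t work_group_size)
{
    const size_t row_bytes = 32 * 4;                /* assumed: 32 banks x 4 bytes */
    const size_t array_bytes = work_group_size * 4; /* OpenCL float is 4 bytes     */
    local_layout layout;
    layout.multiplier_offset = 0;
    /* Pad so the second array starts on a 128-byte boundary. */
    layout.offsets_offset = align_up(array_bytes, row_bytes);
    layout.total_bytes = layout.offsets_offset + array_bytes;
    return layout;
}
```

The host would then request `total_bytes` for one `__local float*` argument by calling `clSetKernelArg` with that size and a NULL value pointer, and the kernel would derive the second array as the base pointer plus `offsets_offset / 4` floats. For a work-group size of 48 this gives offsets 0 and 256 and a total of 448 bytes, so the padding costs 64 bytes. Whether that padding is worth it depends on the access pattern.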
Thank you,
Scott