Memory organization for OpenCL local memory on Nvidia GPUs


I am trying to optimize an OpenCL kernel that uses local memory, for use on Nvidia GPUs. I have read about warps, how they can access local memory banks efficiently, and how bank conflicts happen. One thing I could not find is an example of how memory is allocated for multiple local memory declarations.

For example, in this OpenCL kernel:

__kernel void computeexample(__global float* input,
                             __global float* output,
                             __local float* multiplier,
                             __local float* offsets) {
    uint localid = get_local_id(0);

    multiplier[localid] = localid * 2.0f;
    offsets[localid] = localid + 2.0f;

    // compute here
}

This is just an illustration. What I want to know is, when I declare two or more local memory variables, how they are organized in memory on Nvidia cards. Are they allocated end to end, so that one begins at the end of the previous one? Or does each local variable start at the first memory bank, leaving possible padding between variables so that each starts on a 128-byte boundary (32 banks x 4 bytes per bank)? Does the order of declaration in the kernel determine the order in which they reside in memory?
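To make the two layouts in the question concrete, here is a minimal host-side sketch in plain C of the bank arithmetic. It assumes the geometry stated above (32 banks, 4 bytes per bank, successive 4-byte words in successive banks). The helper names `bank_of` and `align_to_bank_row` are hypothetical, not part of OpenCL, and the sketch only models the arithmetic; it does not reveal what a given driver actually does.

```c
#include <stddef.h>

/* Assumed geometry, taken from the question: 32 banks, 4 bytes per bank. */
enum { NUM_BANKS = 32, BANK_WIDTH_BYTES = 4 };

/* Bank holding the 4-byte word at this byte offset into local memory
   (assumes successive words map to successive banks). */
unsigned bank_of(size_t byte_offset) {
    return (unsigned)((byte_offset / BANK_WIDTH_BYTES) % NUM_BANKS);
}

/* Round a byte offset up to the next 128-byte boundary, i.e. to bank 0. */
size_t align_to_bank_row(size_t byte_offset) {
    const size_t row = (size_t)NUM_BANKS * BANK_WIDTH_BYTES; /* 128 bytes */
    return (byte_offset + row - 1) / row * row;
}
```

For instance, if the first buffer held 50 floats (200 bytes), an end-to-end layout would start the second buffer at byte 200, which is bank 18 under this model, while a padded layout would start it at byte 256, which is bank 0.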

What I want is to optimize the local memory size, avoid bank conflicts, and take advantage of coalesced access where possible.
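As a rough way to reason about bank conflicts, the following C sketch counts how many of the 32 work-items in a warp land in the same bank for a strided access pattern. It assumes the same 32-bank, 4-byte-word model as in the question, and that all work-items reading the same word are served by a broadcast. The function name `conflict_degree` is hypothetical, and this is a simplified model, not a description of any specific Nvidia architecture.

```c
enum { WARP_SIZE = 32, BANK_COUNT = 32 };

/* Work-item i accesses 4-byte word (base_word + i * stride_words).
   Returns the largest number of distinct words that fall into one bank;
   1 means the access is conflict-free under this model. */
unsigned conflict_degree(unsigned base_word, unsigned stride_words) {
    unsigned hits[BANK_COUNT] = {0};
    unsigned worst = 0;

    /* Stride 0: every work-item reads the same word (assumed broadcast). */
    if (stride_words == 0) {
        return 1;
    }
    for (unsigned i = 0; i < WARP_SIZE; ++i) {
        unsigned bank = (base_word + i * stride_words) % BANK_COUNT;
        hits[bank] += 1;
        if (hits[bank] > worst) {
            worst = hits[bank];
        }
    }
    return worst;
}
```

Under this model a stride of 1 word gives degree 1, a stride of 2 gives 2, a stride of 32 gives 32, and padding the stride to 33 brings it back to 1. The result does not depend on `base_word`, which suggests that for a pattern like `multiplier[localid]` the starting bank of each buffer matters less than the stride between neighbouring work-items.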

I understand this may vary from device to device, and that there may be no guarantees even across different Nvidia GPUs, but any ideas or tips on the best way to organize local data would be helpful.

Thank you,
Scott

