Memory organization for OpenCL local memory on Nvidia GPUs

September 15, 2013

I am trying to optimize an OpenCL kernel using local memory, for use on Nvidia GPUs. I have read about warps, how they can access local memory banks efficiently, and how bank conflicts happen. One thing I cannot find is an example of how memory is allocated for multiple local memory declarations.

For example, in this OpenCL kernel:

__kernel void computeexample(__global float* input,
                             __global float* output,
                             __local float* multiplier,
                             __local float* offsets)
{
    uint localid = get_local_id(0);

    multiplier[localid] = localid * 2.0f;
    offsets[localid] = localid + 2.0f;

    // compute here
}

This is just for illustration. I want to know, when I declare two or more local memory variables, how they are organized in memory on Nvidia cards. Are they allocated end to end, so that one begins at the end of the previous? Or does each local variable start at the first memory bank, leaving possible padding between variables so that each starts on a 128-byte boundary (32 banks x 4 bytes per bank)? Does the order of declaration in the kernel determine the order in which they reside in memory?
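To make the two candidate layouts concrete, here is a minimal sketch in plain C of the arithmetic involved. It assumes the bank model stated above (32 banks, 4 bytes per bank, consecutive 32-bit words in consecutive banks). The helper names (`bank_of`, `align_up`, `second_offset_packed`, `second_offset_padded`) are hypothetical and exist only for this illustration; the sketch does not say which layout an Nvidia driver actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Figures taken from the question: 32 banks, each 4 bytes wide. */
#define NUM_BANKS 32u
#define BANK_WIDTH_BYTES 4u

/* Bank that a byte offset into local memory maps to, assuming consecutive
   32-bit words are assigned to consecutive banks. */
static unsigned bank_of(size_t byte_offset) {
    return (unsigned)((byte_offset / BANK_WIDTH_BYTES) % NUM_BANKS);
}

/* Round an offset up to the next multiple of `alignment`,
   which must be a power of two. */
static size_t align_up(size_t offset, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Hypothesis A: the second array starts right where the first one ends. */
static size_t second_offset_packed(size_t first_bytes) {
    return first_bytes;
}

/* Hypothesis B: the second array starts on the next 128-byte boundary,
   so its first element always falls in bank 0. */
static size_t second_offset_padded(size_t first_bytes) {
    return align_up(first_bytes, NUM_BANKS * BANK_WIDTH_BYTES);
}
```

For a first array of 100 floats (400 bytes), the packed layout would put the second array at byte 400, which this model maps to bank 4, while the padded layout would put it at byte 512 in bank 0, at a cost of 112 bytes of padding.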

What I want is to optimize local memory size, avoid bank conflicts, and take advantage of any possible coalesced accessing.
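As a rough way to reason about bank conflicts for a given access pattern, the sketch below counts how many work-items of one warp land in the same bank. It assumes a 32-wide warp and 32 banks of 32-bit words, and it ignores the broadcast case where several work-items read the same word. `conflict_degree` is a hypothetical helper written for this illustration, not part of any OpenCL or Nvidia API.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BANKS 32u
#define WARP_SIZE 32u

/* Worst-case number of work-items in one warp that hit the same bank when
   work-item t accesses the 32-bit word at index base_word + t * stride_words.
   A result of 1 means the access is conflict-free under this model. */
static unsigned conflict_degree(size_t base_word, size_t stride_words) {
    unsigned hits[NUM_BANKS] = {0};
    unsigned worst = 0;

    /* A zero stride is a same-word access, which this model does not cover. */
    assert(stride_words != 0);

    for (unsigned t = 0; t < WARP_SIZE; ++t) {
        unsigned bank = (unsigned)((base_word + t * stride_words) % NUM_BANKS);
        hits[bank]++;
        if (hits[bank] > worst) {
            worst = hits[bank];
        }
    }
    return worst;
}
```

Under this model a unit-stride access such as `multiplier[localid]` is conflict-free whatever word the array starts at, because a different base only rotates which bank each work-item uses. A stride of 2 words gives a 2-way conflict, and a stride of 32 words puts the whole warp in one bank.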

I understand this may vary from device to device, and possibly there are no guarantees across different Nvidia GPUs, but any ideas or tips on the best way to organize local data would be helpful information.

Thank you, Scott
