optimization - Optimizing a TensorFlow custom operation


I implemented LCNN: https://arxiv.org/abs/1611.06473

It uses a sparse convolutional filter matrix for a speed-up.

Here is the GitHub repo: https://github.com/ildoonet/tf-lcnn

Below is the code of the custom operation.

    // input is laid out as [batch, height, width, channel].
    for (int batch_idx = 0; batch_idx < input.shape().dim_size(0); batch_idx++) {
        for (int sparse_idx = 0; sparse_idx < weight_indices.shape().dim_size(0); sparse_idx++) {
            // Each sparse entry stores (output channel, flattened filter position).
            int sparse_oc = index_tensor(sparse_idx, 0);
            int sparse_p = index_tensor(sparse_idx, 1);
            // Unpack the flattened position into input channel and kernel offsets.
            int sparse_ic = sparse_p / (dense_shape_[1] * dense_shape_[2]);
            int sparse_ix = (sparse_p % (dense_shape_[1] * dense_shape_[2])) / dense_shape_[1];
            int sparse_iy = (sparse_p % (dense_shape_[1] * dense_shape_[2])) % dense_shape_[1];

            // auto keeps the weight's element type instead of truncating it to int.
            auto sparse_v = values_tensor(sparse_idx);

            // Height is dim 1 and width is dim 2 of the input (dim 0 is the batch).
            for (int row = 0; row < input.shape().dim_size(1); row += strides_[0]) {
                int out_row = row / strides_[0];
                for (int col = 0; col < input.shape().dim_size(2); col += strides_[1]) {
                    int out_col = col / strides_[1];

                    output_tensor(batch_idx, out_row, out_col, sparse_oc) +=
                        input_tensor(batch_idx, row + sparse_ix, col + sparse_iy, sparse_ic) * sparse_v;
                }
            }
        }
    }

As you can see here, this is a naive way to convolve and generate the output tensor.
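The index arithmetic at the top of the loop unpacks a flattened filter position into an input channel and two kernel offsets. Here is a minimal standalone sketch of that step, assuming the flat position is laid out as p = (ic * k_h + ky) * k_w + kx. The names FilterPos and decode_position are made up for illustration and are not part of tf-lcnn. For a square kernel this gives the same numbers as the arithmetic in the code above.

```cpp
#include <cassert>

// Hypothetical helper type: where one weight sits inside a dense filter.
struct FilterPos {
    int ic;  // input channel
    int ky;  // row offset inside the kernel window
    int kx;  // column offset inside the kernel window
};

// Unpack a flat position p, assuming p = (ic * k_h + ky) * k_w + kx.
FilterPos decode_position(int p, int k_h, int k_w) {
    assert(p >= 0 && k_h > 0 && k_w > 0);
    const int plane = k_h * k_w;  // number of weights per input channel
    const int rem = p % plane;    // position inside that channel's plane
    return FilterPos{p / plane, rem / k_w, rem % k_w};
}
```

For example, with a 3x3 kernel, position 14 falls in input channel 1 (14 / 9), and the remainder 5 maps to kernel row 1, column 2.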

For every 3D input tensor of shape (height, width, channel) in the batch, I go through the sparse matrix elements and convolve each one with the 3D tensor using the provided strides.
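To make that loop structure easy to test outside TensorFlow, here is a rough standalone sketch on flat arrays. It assumes an NHWC layout, no padding, and weights that are already unpacked into (oc, ic, ky, kx, value). SparseWeight and sparse_conv_nhwc are illustrative names, not the actual op from the repo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one non-zero filter weight.
struct SparseWeight {
    int oc;   // output channel
    int ic;   // input channel
    int ky;   // row offset inside the kernel window
    int kx;   // column offset inside the kernel window
    float v;  // weight value
};

// Naive sparse convolution, NHWC layout, no padding.
// in is [batch, in_h, in_w, in_c] flattened row-major;
// the result is [batch, out_h, out_w, out_c] flattened row-major.
std::vector<float> sparse_conv_nhwc(const std::vector<float>& in,
                                    int batch, int in_h, int in_w, int in_c,
                                    const std::vector<SparseWeight>& weights,
                                    int k_h, int k_w, int out_c,
                                    int stride_h, int stride_w) {
    assert(in.size() == static_cast<std::size_t>(batch) * in_h * in_w * in_c);
    const int out_h = (in_h - k_h) / stride_h + 1;
    const int out_w = (in_w - k_w) / stride_w + 1;
    std::vector<float> out(static_cast<std::size_t>(batch) * out_h * out_w * out_c, 0.0f);

    for (int b = 0; b < batch; ++b) {
        // One pass over the whole image per non-zero weight.
        for (const SparseWeight& w : weights) {
            for (int oy = 0; oy < out_h; ++oy) {
                const int iy = oy * stride_h + w.ky;
                for (int ox = 0; ox < out_w; ++ox) {
                    const int ix = ox * stride_w + w.kx;
                    const std::size_t in_idx =
                        ((static_cast<std::size_t>(b) * in_h + iy) * in_w + ix) * in_c + w.ic;
                    const std::size_t out_idx =
                        ((static_cast<std::size_t>(b) * out_h + oy) * out_w + ox) * out_c + w.oc;
                    out[out_idx] += in[in_idx] * w.v;
                }
            }
        }
    }
    return out;
}
```

Looping over output positions rather than input positions keeps every read inside the input bounds, since the largest row touched is (out_h - 1) * stride_h + k_h - 1, which is at most in_h - 1.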

I guess there are plenty of ways to improve this code with respect to memory access, CPU optimization, etc.
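One direction I can imagine, purely as an untested idea: precompute each weight's offset into the input once, then walk the output pixel by pixel so that each output location is written while it is still hot in cache. The sketch below assumes NHWC, no padding, and a zero-initialized output buffer. PackedWeight, pack_weight and sparse_conv_packed are hypothetical names, and whether this is actually faster would have to be measured on the target device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical precomputed form of one non-zero weight.
struct PackedWeight {
    std::size_t in_offset;  // (ky * in_w + kx) * in_c + ic, relative to the window origin
    int oc;                 // output channel
    float v;                // weight value
};

// Fold the kernel offsets and input channel into a single input offset.
PackedWeight pack_weight(int oc, int ic, int ky, int kx, float v, int in_w, int in_c) {
    return PackedWeight{(static_cast<std::size_t>(ky) * in_w + kx) * in_c + ic, oc, v};
}

// out must hold batch * out_h * out_w * out_c floats, all set to zero by the caller.
void sparse_conv_packed(const float* in, float* out,
                        int batch, int in_h, int in_w, int in_c,
                        const std::vector<PackedWeight>& weights,
                        int k_h, int k_w, int out_c,
                        int stride_h, int stride_w) {
    assert(in != nullptr && out != nullptr);
    const int out_h = (in_h - k_h) / stride_h + 1;
    const int out_w = (in_w - k_w) / stride_w + 1;
    for (int b = 0; b < batch; ++b) {
        for (int oy = 0; oy < out_h; ++oy) {
            for (int ox = 0; ox < out_w; ++ox) {
                // Top-left corner of the input window for this output pixel.
                const float* in_px = in +
                    ((static_cast<std::size_t>(b) * in_h + oy * stride_h) * in_w + ox * stride_w) * in_c;
                float* out_px = out +
                    ((static_cast<std::size_t>(b) * out_h + oy) * out_w + ox) * out_c;
                // Inner loop: one multiply-add per non-zero weight, no index math.
                for (const PackedWeight& w : weights) {
                    out_px[w.oc] += in_px[w.in_offset] * w.v;
                }
            }
        }
    }
}
```

Sorting the packed weights by output channel (or by input offset) before the loop might help locality further, but that is also a guess.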

This code is for an embedded system, so it will run on a single-core CPU.

Any suggestions?

Thanks.

