Optimizing a TensorFlow custom operation
I implemented LCNN (https://arxiv.org/abs/1611.06473), which uses a sparse convolutional filter matrix to speed up computation.
Here is the GitHub repo: https://github.com/ildoonet/tf-lcnn
Below is the code of the custom operation.
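    // Naive sparse convolution: for every batch and every nonzero filter weight,
    // scan the input with the given strides and accumulate into the output.
    // Assumes NHWC input, VALID padding, and that dense_shape_[1] and
    // dense_shape_[2] are the kernel height and width.
    const int in_height = input.shape().dim_size(1);
    const int in_width  = input.shape().dim_size(2);
    for (int batch_idx = 0; batch_idx < input.shape().dim_size(0); batch_idx++) {
      for (int sparse_idx = 0; sparse_idx < weight_indices.shape().dim_size(0); sparse_idx++) {
        int sparse_oc = index_tensor(sparse_idx, 0);
        int sparse_p  = index_tensor(sparse_idx, 1);
        // Decode the flattened kernel position (ix/iy split is only exact for square kernels).
        int sparse_ic = sparse_p / (dense_shape_[1] * dense_shape_[2]);
        int sparse_ix = (sparse_p % (dense_shape_[1] * dense_shape_[2])) / dense_shape_[1];
        int sparse_iy = (sparse_p % (dense_shape_[1] * dense_shape_[2])) % dense_shape_[1];
        float sparse_v = values_tensor(sparse_idx);  // float, not int: int truncated the weight
        // Rows/cols span height (dim 1) and width (dim 2), not batch (dim 0),
        // and stop where the kernel window still fits inside the input.
        for (int row = 0; row + dense_shape_[1] <= in_height; row += strides_[0]) {
          int out_row = row / strides_[0];
          for (int col = 0; col + dense_shape_[2] <= in_width; col += strides_[1]) {
            int out_col = col / strides_[1];
            output_tensor(batch_idx, out_row, out_col, sparse_oc) +=
                input_tensor(batch_idx, row + sparse_ix, col + sparse_iy, sparse_ic) * sparse_v;
          }
        }
      }
    }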
As you can see, it convolves in a naive way to generate the output tensor: for every batch of the 3D input tensor (height, width, channel), it reads each sparse matrix element and convolves it over the 3D tensor with the given strides.
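The loop structure described above can be restated outside TensorFlow as a small standalone function, which makes it easy to check any optimized version against a known answer. This is a sketch under assumptions that are not in the original: plain std::vector buffers in NHWC order instead of tensors, VALID padding, and a hypothetical SparseWeight struct whose flattened position is p = ic*kh*kw + ix*kw + iy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One nonzero filter weight in COO form (hypothetical struct that mirrors one
// row of index_tensor plus the matching entry of values_tensor).
struct SparseWeight {
  int oc;       // output channel
  int p;        // flattened position, assumed p = ic * kh * kw + ix * kw + iy
  float value;  // weight value
};

// Naive sparse convolution with the same loop order as the op:
// batch -> nonzero weight -> output row -> output column.
// input: [batch, in_h, in_w, in_c]; returns [batch, out_h, out_w, out_c].
std::vector<float> sparse_conv_naive(const std::vector<float>& input,
                                     int batch, int in_h, int in_w, int in_c,
                                     const std::vector<SparseWeight>& weights,
                                     int out_c, int kh, int kw,
                                     int stride_h, int stride_w) {
  assert(in_h >= kh && in_w >= kw && stride_h > 0 && stride_w > 0);
  const int out_h = (in_h - kh) / stride_h + 1;  // VALID padding (assumed)
  const int out_w = (in_w - kw) / stride_w + 1;
  std::vector<float> output(static_cast<std::size_t>(batch) * out_h * out_w * out_c, 0.0f);
  for (int b = 0; b < batch; ++b) {
    for (const SparseWeight& w : weights) {
      // Decoded again for every batch, as in the original op.
      const int ic = w.p / (kh * kw);
      const int ix = (w.p % (kh * kw)) / kw;
      const int iy = (w.p % (kh * kw)) % kw;
      for (int orow = 0; orow < out_h; ++orow) {
        for (int ocol = 0; ocol < out_w; ++ocol) {
          const int row = orow * stride_h + ix;
          const int col = ocol * stride_w + iy;
          const float x =
              input[((static_cast<std::size_t>(b) * in_h + row) * in_w + col) * in_c + ic];
          output[((static_cast<std::size_t>(b) * out_h + orow) * out_w + ocol) * out_c + w.oc] +=
              x * w.value;
        }
      }
    }
  }
  return output;
}
```

For example, a 3x3 single-channel input holding 1..9 with a 2x2 kernel that has two nonzeros (2.0 at the top-left, 1.0 at the bottom-right) produces the 2x2 output 7, 10, 16, 19.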
I guess there are plenty of ways to improve this code with respect to memory access, CPU optimization, etc.
This code is for an embedded system, so it is implemented for a single-core CPU.
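One possible direction for a single core, sketched under the same assumptions as a plain-array version (NHWC, VALID padding, hypothetical struct and function names): decode each nonzero's (ic, ix, iy) into a flat input offset once, outside the batch loop, so the hot loop does no division or modulo; then iterate over output pixels and apply all nonzeros to that pixel's out_c accumulators, which sit next to each other in memory. The offsets depend on the input width and channel count, so they would be recomputed only when the input shape changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same hypothetical COO layout: p = ic * kh * kw + ix * kw + iy.
struct SparseWeight { int oc; int p; float value; };

// Pre-decoded nonzero: input offset relative to the top-left element of the
// receptive field, so the inner loop only does one multiply-add.
struct DecodedWeight {
  std::ptrdiff_t in_offset;  // (ix * in_w + iy) * in_c + ic
  int oc;
  float value;
};

// Run once per input shape, not once per batch.
std::vector<DecodedWeight> decode_weights(const std::vector<SparseWeight>& weights,
                                          int in_w, int in_c, int kh, int kw) {
  std::vector<DecodedWeight> decoded;
  decoded.reserve(weights.size());
  for (const SparseWeight& w : weights) {
    const int ic = w.p / (kh * kw);
    const int ix = (w.p % (kh * kw)) / kw;
    const int iy = (w.p % (kh * kw)) % kw;
    decoded.push_back({(static_cast<std::ptrdiff_t>(ix) * in_w + iy) * in_c + ic, w.oc, w.value});
  }
  return decoded;
}

// Output-stationary loop: all nonzeros are applied to one output pixel before
// moving to the next, so writes stay within one contiguous run of out_c floats.
std::vector<float> sparse_conv_decoded(const std::vector<float>& input,
                                       int batch, int in_h, int in_w, int in_c,
                                       const std::vector<DecodedWeight>& decoded,
                                       int out_c, int kh, int kw,
                                       int stride_h, int stride_w) {
  assert(in_h >= kh && in_w >= kw && stride_h > 0 && stride_w > 0);
  const int out_h = (in_h - kh) / stride_h + 1;  // VALID padding (assumed)
  const int out_w = (in_w - kw) / stride_w + 1;
  std::vector<float> output(static_cast<std::size_t>(batch) * out_h * out_w * out_c, 0.0f);
  const float* in = input.data();
  float* out = output.data();
  for (int b = 0; b < batch; ++b) {
    for (int orow = 0; orow < out_h; ++orow) {
      for (int ocol = 0; ocol < out_w; ++ocol) {
        // Top-left input element of this output pixel's receptive field.
        const float* base =
            in + ((static_cast<std::size_t>(b) * in_h + orow * stride_h) * in_w + ocol * stride_w) * in_c;
        float* acc = out + ((static_cast<std::size_t>(b) * out_h + orow) * out_w + ocol) * out_c;
        for (const DecodedWeight& d : decoded) {
          acc[d.oc] += base[d.in_offset] * d.value;
        }
      }
    }
  }
  return output;
}
```

The result should match the naive loop exactly for the same inputs; whether it is actually faster on a particular embedded CPU depends on cache sizes and the sparsity pattern, so it is worth timing both on the target.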
Any suggestions?
Thanks.