python - Using Backward Propagation in fmin_cg
I'm trying to build an ANN in Python. So far I've been able to do the forward pass, but the problem comes when I try to add backward propagation. In the function nncostfunction, the gradient grad is defined as:
grad = tr(c_[theta1_grad.swapaxes(1,0).reshape(1,-1), theta2_grad.swapaxes(1,0).reshape(1,-1)])
But this is a problem, because I'm using scipy.optimize.fmin_cg to calculate nn_params and the cost, and fmin_cg only accepts a single return value (the J value from the forward pass), so it cannot accept grad:
nn_params, cost = op.fmin_cg(lambda t: nncostfunction(t, input_layer_size, hidden_layer_size, num_labels, x, y, lam), initial_nn_params, gtol = 0.001, maxiter = 40, full_output=1)[0, 1]
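Looking at the fmin_cg documentation, it seems like the gradient is supposed to be supplied through the separate fprime argument rather than as a second return value, so I suspect something like the sketch below (splitting nncostfunction into a cost-only wrapper and a gradient-only wrapper) is closer to what it expects, but I'm not sure this is the right approach:

# possible fmin_cg version: cost and gradient passed as two separate callables
def cost_only(t):
    # return only j from the (j, grad) tuple
    return nncostfunction(t, input_layer_size, hidden_layer_size, num_labels, x, y, lam)[0]

def grad_only(t):
    # return only grad, flattened to a 1-d array the same length as t
    return nncostfunction(t, input_layer_size, hidden_layer_size, num_labels, x, y, lam)[1].flatten()

res = op.fmin_cg(cost_only, initial_nn_params, fprime=grad_only,
                 gtol=0.001, maxiter=40, full_output=1)
nn_params, cost = res[0], res[1]

This runs the forward and backward pass twice per evaluation, though, which seems wasteful, so I don't know if it is the intended way.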
Is there a way to fix this so that I can include backward propagation in the network? I know there is also a scipy.optimize.minimize function, but I'm having difficulty understanding how to use it and get the results I need (my best guess is sketched below). Does anyone know what needs to be done?
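My understanding from the scipy.optimize.minimize docs is that passing jac=True tells it the objective function returns both the cost and the gradient, so maybe something like this sketch (keeping nncostfunction returning (j, grad) as it does now) is what I should be doing, but I'm not confident about it:

# possible minimize version: jac=True means the objective returns (cost, grad)
res = op.minimize(
    lambda t: nncostfunction(t, input_layer_size, hidden_layer_size, num_labels, x, y, lam),
    initial_nn_params,
    method='CG',
    jac=True,   # use the second return value of nncostfunction as the gradient
    options={'gtol': 0.001, 'maxiter': 40})
nn_params, cost = res.x, res.fun

I think the gradient returned to minimize also has to be a flat 1-d array, so grad would probably need a .flatten() inside nncostfunction for this to work.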
Your help is appreciated, thanks. The full code is below:
import numpy as np
from numpy import reshape, eye, ones, c_, log, unique, transpose as tr   # tr is my shorthand for transpose
import scipy.optimize as op

# sigmoid, sigmoidgradient, and randinitializeweights are helper functions defined elsewhere

def nncostfunction(nn_params, input_layer_size, hidden_layer_size, num_labels, x, y, lam):
    '''
    Given the NN parameters, layer sizes, number of labels, data, and learning rate,
    return the cost of traversing the NN.
    '''
    theta1 = reshape(nn_params[:(hidden_layer_size*(input_layer_size+1))], (hidden_layer_size, (input_layer_size+1)))
    theta2 = reshape(nn_params[(hidden_layer_size*(input_layer_size+1)):], (num_labels, (hidden_layer_size+1)))
    m = x.shape[0]
    n = x.shape[1]

    # forward pass
    y_eye = eye(num_labels)
    y_new = np.zeros((y.shape[0], num_labels))
    for z in range(y.shape[0]):
        y_new[z, :] = y_eye[int(y[z])-1]
    y = y_new

    a_1 = c_[ones((m, 1)), x]
    z_2 = tr(theta1.dot(tr(a_1)))
    a_2 = tr(sigmoid(theta1.dot(tr(a_1))))
    a_2 = c_[ones((a_2.shape[0], 1)), a_2]
    a_3 = tr(sigmoid(theta2.dot(tr(a_2))))

    j_reg = lam/(2.*m) * (sum(sum(theta1[:, 1:]**2)) + sum(sum(theta2[:, 1:]**2)))
    j = (1./m) * sum(sum(-y*log(a_3) - (1-y)*log(1-a_3))) + j_reg

    # backprop
    d_3 = a_3 - y
    d_2 = d_3.dot(theta2[:, 1:]) * sigmoidgradient(z_2)
    theta1_grad = 1./m * tr(d_2).dot(a_1)
    theta2_grad = 1./m * tr(d_3).dot(a_2)

    # add regularization
    theta1_grad[:, 1:] = theta1_grad[:, 1:] + lam*1.0/m*theta1[:, 1:]
    theta2_grad[:, 1:] = theta2_grad[:, 1:] + lam*1.0/m*theta2[:, 1:]

    # unroll gradients
    grad = tr(c_[theta1_grad.swapaxes(1, 0).reshape(1, -1), theta2_grad.swapaxes(1, 0).reshape(1, -1)])

    return j, grad

def nn_train(x, y, lam=1.0, hidden_layer_size=10):
    '''
    Train the neural network given the feature and class arrays, the learning rate,
    and the size of the hidden layer. Return the parameters theta1, theta2.
    '''
    # NN input and output layer sizes
    input_layer_size = x.shape[1]
    num_labels = unique(y).shape[0]  # output layer

    # initialize NN parameters
    initial_theta1 = randinitializeweights(input_layer_size, hidden_layer_size)
    initial_theta2 = randinitializeweights(hidden_layer_size, num_labels)

    # unroll parameters
    initial_nn_params = np.append(initial_theta1.flatten(1), initial_theta2.flatten(1))
    initial_nn_params = reshape(initial_nn_params, (len(initial_nn_params),))  # flatten to a 1-d array

    # find and print the initial cost:
    j_init = nncostfunction(initial_nn_params, input_layer_size, hidden_layer_size, num_labels, x, y, lam)[0]
    grad_init = nncostfunction(initial_nn_params, input_layer_size, hidden_layer_size, num_labels, x, y, lam)[1]
    print 'initial j cost: ' + str(j_init)
    print 'initial grad cost: ' + str(grad_init)

    # implement backprop and train the network, run fmin
    print 'training neural network...'
    print 'fmin results:'
    nn_params, cost = op.fmin_cg(lambda t: nncostfunction(t, input_layer_size, hidden_layer_size, num_labels, x, y, lam),
                                 initial_nn_params, gtol=0.001, maxiter=40, full_output=1)[0, 1]

    theta1 = reshape(nn_params[:(hidden_layer_size*(input_layer_size+1))], (hidden_layer_size, (input_layer_size+1)))
    theta2 = reshape(nn_params[(hidden_layer_size*(input_layer_size+1)):], (num_labels, (hidden_layer_size+1)))

    return theta1, theta2