deep learning - How L2 Regularization changes backpropagation formulas - Cross Validated
I am going through an online deep learning book and trying to recreate the neural network written there, with a slightly different class design. However, I've run into a problem: when using L2 regularization, I can't see its impact on the backpropagation formulas. Here's what I mean. The formula in backpropagation that uses the loss function derivative is the one defining the error in the output layer, which is defined as follows:
error = C'(a) * a'(z)
where C'(a) is the derivative of the loss function with respect to the activation, and a'(z) is the derivative of the activation function with respect to the weighted input. I don't see how any part of this equation changes when adding L2 regularization. I believe that, if anything, the derivative of the loss function with respect to the activation should change, but the squared weights we're adding should disappear when calculating that derivative (since it is taken with respect to the activation, not the weights). Something in my logic must be wrong; please tell me what it is.
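To make the question concrete, here is a minimal NumPy sketch of that output-layer error, assuming a sigmoid activation and a quadratic loss C = ½(a − y)². The function names (`sigmoid`, `sigmoid_prime`, `output_error`) are hypothetical, not taken from the book. The point it illustrates is that the error needs only the weighted input z and the target y; the weights never enter it.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation a = sigma(z)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the logistic activation with respect to z."""
    s = sigmoid(z)
    return s * (1.0 - s)

def output_error(z, y):
    """Output-layer error C'(a) * a'(z), assuming quadratic loss and sigmoid.

    Only z (weighted input) and y (target) are used; no weights appear here.
    """
    a = sigmoid(z)
    dC_da = a - y                      # C'(a) for C = 0.5 * (a - y)**2
    return dC_da * sigmoid_prime(z)    # element-wise product

z = np.array([0.0, 2.0])
y = np.array([1.0, 0.0])
delta = output_error(z, y)             # first entry: -0.5 * 0.25 = -0.125
```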
Edit: to be more specific, suppose I use the quadratic loss function with L2 regularization. Is the following true, and if not, why?
C'(a) = a - y
where a is the activation and y is the desired output.
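A quick way to check the claimed derivative is a central finite difference. This sketch assumes the common per-output form C = ½(a − y)² (the ½ is what makes the derivative exactly a − y); the helper names are hypothetical.

```python
def quadratic_loss(a, y):
    """Quadratic loss for a single output: C = 0.5 * (a - y)**2."""
    return 0.5 * (a - y) ** 2

def quadratic_loss_grad(a, y):
    """Analytic derivative dC/da = a - y."""
    return a - y

def numerical_grad(f, a, eps=1e-6):
    """Central finite-difference estimate of df/da."""
    return (f(a + eps) - f(a - eps)) / (2 * eps)

a, y = 0.7, 1.0
analytic = quadratic_loss_grad(a, y)
numeric = numerical_grad(lambda t: quadratic_loss(t, y), a)
# Both are approximately -0.3; the L2 penalty would not change this value,
# because the penalty does not depend on a.
```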
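For the cost function, if you use L2 regularization, then besides the regular loss you need to add an additional loss caused by large weights: a penalty built from the sum of the squared weights, scaled by λ and by the number of instances m. λ (lambda) is a hyperparameter that controls the strength of the L2 regularization; when it equals 0, there is no regularization at all. m is the number of instances.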
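Written out, one common form of the regularized cost and its derivatives is shown below. This assumes the λ/(2m) scaling, where C₀ is the unregularized loss averaged over the m instances; some texts drop the ½ or the division by m, which only rescales λ. The penalty does not depend on the activation a, so its derivative with respect to a is zero:

```latex
\begin{aligned}
C &= C_0 + \frac{\lambda}{2m} \sum_{w} w^2 \\
\frac{\partial C}{\partial a} &= \frac{\partial C_0}{\partial a} \\
\frac{\partial C}{\partial w} &= \frac{\partial C_0}{\partial w} + \frac{\lambda}{m}\, w
\end{aligned}
```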
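Now, when you do backpropagation and calculate the derivatives, you need to calculate the derivative of this additional cost too. Because the penalty depends only on the weights, its derivative with respect to the activation is zero, so the output error C'(a) * a'(z) is indeed unchanged (for the quadratic loss, C'(a) = a - y still holds). What changes is the gradient with respect to each weight, which gains an extra term proportional to that weight.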
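A minimal sketch of this for a single sigmoid layer trained on m instances, assuming the λ/(2m) penalty on the weights only (biases are left unregularized, a common but not universal choice). The functions `cost` and `backprop` are hypothetical names. The output error `delta` is computed exactly as without regularization; only the weight gradient gets the extra (λ/m)·W term, and a finite-difference check confirms the result.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, b, X, Y, lam):
    """Quadratic loss averaged over m instances plus the L2 penalty on W."""
    m = X.shape[0]
    A = sigmoid(X @ W.T + b)
    c0 = 0.5 * np.sum((A - Y) ** 2) / m
    penalty = lam / (2 * m) * np.sum(W ** 2)   # biases are not regularized
    return c0 + penalty

def backprop(W, b, X, Y, lam):
    """Gradients of cost() with respect to W and b."""
    m = X.shape[0]
    A = sigmoid(X @ W.T + b)
    # Output error: identical with or without regularization.
    delta = (A - Y) * A * (1.0 - A)
    grad_W = delta.T @ X / m + (lam / m) * W   # extra weight-decay term
    grad_b = delta.sum(axis=0) / m             # unchanged by the penalty
    return grad_W, grad_b

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))    # m = 5 instances, 3 inputs
Y = rng.uniform(size=(5, 2))   # 2 outputs per instance
W = rng.normal(size=(2, 3))
b = rng.normal(size=2)
lam = 0.1

grad_W, grad_b = backprop(W, b, X, Y, lam)

# Central finite-difference estimate of dC/dW for comparison.
eps = 1e-6
num_grad_W = np.zeros_like(W)
for idx in np.ndindex(W.shape):
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += eps
    Wm[idx] -= eps
    num_grad_W[idx] = (cost(Wp, b, X, Y, lam) - cost(Wm, b, X, Y, lam)) / (2 * eps)
```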
When you update the weights, you need to subtract the learning rate times this additional derivative as well. This pushes each weight toward zero, which is why L2 regularization is also called weight decay.
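The "weight decay" name becomes clear when the update is rearranged. This sketch again assumes the λ/(2m) penalty; the two hypothetical functions compute the same step, the second written as "shrink the weights by a factor (1 − ηλ/m), then apply the ordinary gradient step".

```python
import numpy as np

def sgd_step_l2(W, grad_C0, lr, lam, m):
    """Gradient step on C = C0 + lam/(2m) * sum(W**2)."""
    return W - lr * (grad_C0 + (lam / m) * W)

def sgd_step_decay(W, grad_C0, lr, lam, m):
    """Same step written as weight decay: shrink W, then the usual update."""
    return (1.0 - lr * lam / m) * W - lr * grad_C0

W = np.array([[0.5, -2.0], [1.0, 3.0]])
grad_C0 = np.zeros_like(W)   # pretend the data gradient is zero
W_next = sgd_step_l2(W, grad_C0, lr=0.5, lam=1.0, m=10)
# With a zero data gradient, each weight is scaled by 1 - 0.5 * 1.0 / 10 = 0.95.
```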