Linear regression gradient descent algorithms in R produce varying results
I am trying to implement linear regression in R from scratch, without using any packages or libraries, using the following data:
UCI Machine Learning Repository: Bike-Sharing-Dataset
The linear regression itself is easy enough; here is the code:
data <- read.csv("bike-sharing-dataset/hour.csv")

# select the usable features
data1 <- data[, c("season", "mnth", "hr", "holiday", "weekday", "workingday",
                  "weathersit", "temp", "atemp", "hum", "windspeed", "cnt")]

# split the data
trainingobs <- sample(nrow(data1), 0.70 * nrow(data1), replace = FALSE)

# create the training dataset
trainingds <- data1[trainingobs, ]

# create the test dataset
testds <- data1[-trainingobs, ]

x0 <- rep(1, nrow(trainingds))  # column of 1's
x1 <- trainingds[, c("season", "mnth", "hr", "holiday", "weekday", "workingday",
                     "weathersit", "temp", "atemp", "hum", "windspeed")]

# create the x matrix of explanatory variables
x <- as.matrix(cbind(x0, x1))

# create the y matrix of the dependent variable
y <- as.matrix(trainingds$cnt)
m <- nrow(y)

solve(t(x) %*% x) %*% t(x) %*% y
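As a quick sanity check, the closed-form estimates above should agree with R's built-in lm() (used here only for verification, since the goal is a from-scratch implementation):

# sanity check (my addition): lm() on the same training data
# should reproduce the normal-equation coefficients
fit <- lm(cnt ~ ., data = trainingds)
coef(fit)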
The next step is to implement batch-update gradient descent, and this is where I am running into problems. I don't know where the errors are coming from or how to fix them, but the code does run. The problem is that the values being produced differ radically from the results of the regression, and I am unsure why.
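For reference, the batch update that both versions below are meant to implement is, in matrix form (one step, written as an R expression):

# one batch gradient-descent step for MSE; x includes the intercept column,
# n is the number of training rows, learn_rate is the step size
theta <- theta - learn_rate * (1 / n) * t(x) %*% (x %*% theta - y)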
The two versions of batch-update gradient descent that I have implemented are as follows (the results of the two algorithms differ from one another and from the results of the regression):
# gradient descent 1
gradientdesc <- function(x, y, learn_rate, conv_threshold, n, max_iter) {
  plot(x, y, col = "blue", pch = 20)
  m <- runif(1, 0, 1)
  c <- runif(1, 0, 1)
  yhat <- m * x + c
  mse <- sum((y - yhat) ^ 2) / n
  converged = FALSE
  iterations = 0
  while (converged == FALSE) {
    ## implement the gradient descent algorithm
    m_new <- m - learn_rate * ((1 / n) * (sum((yhat - y) * x)))
    c_new <- c - learn_rate * ((1 / n) * (sum(yhat - y)))
    m <- m_new
    c <- c_new
    yhat <- m * x + c
    mse_new <- sum((y - yhat) ^ 2) / n
    if (mse - mse_new <= conv_threshold) {
      abline(c, m)
      converged = TRUE
      return(paste("optimal intercept:", c, "optimal slope:", m))
    }
    iterations = iterations + 1
    if (iterations > max_iter) {
      abline(c, m)
      converged = TRUE
      return(paste("optimal intercept:", c, "optimal slope:", m))
    }
  }
  return(paste("mse =", mse))
}
And:
grad <- function(x, y, theta) {
  # note: for readability, theta is redefined as a column vector;
  # m here is the global nrow(y) from the first code chunk
  gradient <- 1 / m * t(x) %*% (x %*% theta - y)
  return(gradient)
}

grad.descent <- function(x, maxit, alpha) {
  theta <- matrix(rep(0, length = ncol(x)), ncol = 1)
  for (i in 1:maxit) {
    theta <- theta - alpha * grad(x, y, theta)  # y is taken from the global environment
  }
  return(theta)
}
If someone could explain why these two functions are producing different results, I would appreciate it. I want to make sure that I am in fact implementing gradient descent correctly.
Lastly, how can I plot the results of the descent with varying learning rates and superimpose this data on the results of the regression itself?
EDIT: Here are the results of running the two algorithms with alpha = 0.005 and 10,000 iterations:
1)
> gradientdesc(trainingds, y, 0.005, 0.001, 32, 10000)
[1] "optimal intercept: 2183458.95872599 optimal slope: 62417773.0184353"
2)
> print(grad.descent(x, 10000, .005))
                  [,1]
x0           8.3681113
season      19.8399837
mnth        -0.3515479
hr           8.0269388
holiday    -16.2429750
weekday      1.9615369
workingday   7.6063719
weathersit -12.0611266
temp       157.5315413
atemp      138.8019732
hum       -162.7948299
windspeed   31.5442471
The first function fits only a scalar slope m and intercept c, treating x as a single predictor, so it cannot reproduce a multivariate regression; the second implements the matrix form. To give an example of how to write these functions in a better way, consider the following:
gradientdesc <- function(x, y, learn_rate, conv_threshold, max_iter) {
  n <- nrow(x)
  m <- runif(ncol(x), 0, 1)  # m is a vector of dimension ncol(x) x 1
  yhat <- x %*% m            # since x contains a constant, no need to add 1
  mse <- sum((y - yhat) ^ 2) / n
  converged = FALSE
  iterations = 0
  while (converged == FALSE) {
    m <- m - learn_rate * (1 / n * t(x) %*% (yhat - y))
    yhat <- x %*% m
    mse_new <- sum((y - yhat) ^ 2) / n
    if (abs(mse - mse_new) <= conv_threshold) {
      converged = TRUE
    }
    iterations = iterations + 1
    mse <- mse_new
    if (iterations >= max_iter) break
  }
  return(list(converged = converged,
              num_iterations = iterations,
              mse = mse_new,
              coefs = m))
}
For comparison:
ols <- solve(t(x) %*% x) %*% t(x) %*% y
Now,
out <- gradientdesc(x, y, 0.005, 1e-7, 200000)
data.frame(ols, out$coefs)

                    ols   out.coefs
x0           33.0663095  35.2995589
season       18.5603565  18.5779534
mnth         -0.1441603  -0.1458521
hr            7.4374031   7.4420685
holiday     -21.0608520 -21.3284449
weekday       1.5115838   1.4813259
workingday    5.9953383   5.9643950
weathersit   -0.2990723  -0.4073493
temp        100.0719903 147.1157262
atemp       226.9828394 174.0260534
hum        -225.7411524 -225.2686640
windspeed    12.3671942   9.5792498
Here, x refers to the x defined in the first code chunk. Note the similarity between the two sets of coefficients. The largest gaps are in temp and atemp, which are highly correlated, so the descent converges slowly in that direction. However, note that
out$converged
[1] FALSE
So you can increase the accuracy by increasing the number of iterations or by playing around with the step size. It might also help to scale the variables first.
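A minimal sketch of that last suggestion, assuming the gradientdesc() defined above (the exact scaling choices here are my own):

# standardize the predictors, leaving the intercept column of 1's alone,
# then rerun the descent on the scaled design matrix
x_scaled <- x
x_scaled[, -1] <- scale(x[, -1])
out_scaled <- gradientdesc(x_scaled, y, 0.005, 1e-7, 200000)
out_scaled$converged

Note that the resulting coefficients are then on the standardized scale and need to be transformed back before comparing them with ols.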
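As for the last question in the post (plotting the descent for varying learning rates and superimposing the regression itself), one option is to record the MSE at each iteration and overlay the MSE of the closed-form fit. The mse_path() helper and the learning rates below are a hypothetical sketch, not part of the original code:

# hypothetical helper: run batch gradient descent from zero
# and record the training MSE after every iteration
mse_path <- function(x, y, learn_rate, max_iter) {
  n <- nrow(x)
  theta <- rep(0, ncol(x))
  path <- numeric(max_iter)
  for (i in 1:max_iter) {
    theta <- theta - learn_rate * (1 / n * t(x) %*% (x %*% theta - y))
    path[i] <- sum((y - x %*% theta) ^ 2) / n
  }
  path
}

rates <- c(0.0005, 0.001, 0.005)               # illustrative learning rates
paths <- sapply(rates, function(a) mse_path(x, y, a, 5000))
ols_mse <- sum((y - x %*% ols) ^ 2) / nrow(x)  # MSE of the regression itself

matplot(paths, type = "l", lty = 1, xlab = "iteration", ylab = "MSE")
abline(h = ols_mse, lty = 2)                   # superimposed OLS benchmark
legend("topright", legend = paste("alpha =", rates),
       col = 1:length(rates), lty = 1)

Each curve is one learning rate; the dashed line is the benchmark from the closed-form solution, so you can see directly how quickly each rate approaches it.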