R - Performing a for loop on a matrix instead of a data frame


I am performing a rather complicated linear regression that involves conditionally creating dummy variables in new columns with a for loop. So far I've been doing this in a couple of data frames, converting them to matrices, converting those to sparse matrices, and then joining them; however, I've reached my computer's limit. Sorry if this gets confusing - I've tried to simplify the process as much as I can.

Edit - added numeric examples to the original question.

Here is the source data with numeric values:

df <- data.frame(matrix(nrow = 9, ncol = 4))
df$X1 <- c(5, 1, 2, 0, 4, 8, 7, 6, 0)
df$X2 <- c(10001, 10001, 10001, 10003, 10003, 10003, 10002, 10002, 10002)
df$X3 <- c(10002, 10002, 10002, 10001, 10001, 10001, 10003, 10003, 10003)
df$X4 <- c(10001, 10001, 10001, 10003, 10003, 10003, 10002, 10002, 10002)
names(df) <- c("response", "group_1", "group_2", "exclude")

What this looks like:

  response group_1 group_2 exclude
1        5   10001   10002   10001
2        1   10001   10002   10001
3        2   10001   10002   10001
4        0   10003   10001   10003
5        4   10003   10001   10003
6        8   10003   10001   10003
7        7   10002   10003   10002
8        6   10002   10003   10002
9        0   10002   10003   10002

Source data (please see the above edit):

df <- data.frame(matrix(nrow = 9, ncol = 4))
df$X1 <- c(5, 1, 2, 0, 4, 8, 7, 6, 0)
df$X2 <- c("blue", "blue", "blue", "yellow", "yellow", "yellow", "green", "green", "green")
df$X3 <- c("green", "green", "green", "blue", "blue", "blue", "yellow", "yellow", "yellow")
df$X4 <- c("blue", "blue", "blue", "yellow", "yellow", "yellow", "green", "green", "green")
names(df) <- c("response", "group_1", "group_2", "exclude")

This simplified version of the data looks like:

  response group_1 group_2 exclude
1        5    blue   green    blue
2        1    blue   green    blue
3        2    blue   green    blue
4        0  yellow    blue  yellow
5        4  yellow    blue  yellow
6        8  yellow    blue  yellow
7        7   green  yellow   green
8        6   green  yellow   green
9        0   green  yellow   green

From the above data, I find the unique values in "group_1" and "group_2" using the following function:

fun_names <- function(x) {
  row1 <- unique(x$group_1)
  row2 <- unique(x$group_2)
  mat <- data.frame(matrix(nrow = length(row1) + length(row2), ncol = 1))
  mat[1] <- c(row1, row2)
  mat_unique <- data.frame(mat[!duplicated(mat[,1]), ])
  names(mat_unique) <- c("id")
  return(mat_unique)
}
df_unique <- fun_names(df)

This returns the following data frame:

      id
1   blue
2 yellow
3  green
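For what it's worth, the same lookup of unique ids can probably be written more compactly. This is only a sketch, assuming the character version of the data; `ids` is an illustrative name, and the data frame is rebuilt here so the snippet runs on its own:

```r
# Character version of the example data, built directly by column name
df <- data.frame(
  response = c(5, 1, 2, 0, 4, 8, 7, 6, 0),
  group_1  = rep(c("blue", "yellow", "green"), each = 3),
  group_2  = rep(c("green", "blue", "yellow"), each = 3),
  exclude  = rep(c("blue", "yellow", "green"), each = 3),
  stringsAsFactors = FALSE
)

# Concatenate both group columns and keep the first occurrence of each value
ids <- unique(c(df$group_1, df$group_2))

# Same shape as the output of fun_names(): a one-column data frame named "id"
df_unique <- data.frame(id = ids, stringsAsFactors = FALSE)
print(df_unique)
```

The order of `ids` follows first appearance, so it should match what `fun_names()` returns for this data.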

Then, for each color ("id"), I create a new column with a value of 1 if the color is in the row (in "group_1" or "group_2") and the color does not match the "exclude" column value. The loop looks like this:

for(name in df_unique$id) {
  df[paste(name)] <-
    ifelse(df$group_1 == name & df$exclude != name |
           df$group_2 == name & df$exclude != name, 1, 0)
}

Running the loop returns a final data.frame that looks like this:

Edit: here is the numeric data for the final df:

  response group_1 group_2 exclude 10001 10003 10002
1        5   10001   10002   10001     0     0     1
2        1   10001   10002   10001     0     0     1
3        2   10001   10002   10001     0     0     1
4        0   10003   10001   10003     1     0     0
5        4   10003   10001   10003     1     0     0
6        8   10003   10001   10003     1     0     0
7        7   10002   10003   10002     0     1     0
8        6   10002   10003   10002     0     1     0
9        0   10002   10003   10002     0     1     0

Here is the original data:

  response group_1 group_2 exclude blue yellow green
1        5    blue   green    blue    0      0     1
2        1    blue   green    blue    0      0     1
3        2    blue   green    blue    0      0     1
4        0  yellow    blue  yellow    1      0     0
5        4  yellow    blue  yellow    1      0     0
6        8  yellow    blue  yellow    1      0     0
7        7   green  yellow   green    0      1     0
8        6   green  yellow   green    0      1     0
9        0   green  yellow   green    0      1     0

So, my question: how do I perform this loop if the original data is a matrix (instead of a data frame)? Since the loop modifies a data frame, I then need to convert the data frame to a matrix in order to convert it to a sparse matrix, and the data.frame to data.matrix conversion is too intensive for my machine.

I have converted everything in my code up until the above for loop to matrix notation, but I can't figure out how to create the new columns in the same manner while modifying a matrix in R (instead of a data frame). Basically, I'm hoping someone can help me modify the for loop to work on a matrix. Does anyone have any suggestions?
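To make the goal concrete, here is a rough sketch of what the loop might look like on a plain numeric matrix. It assumes the indexed numeric version of the data; `ids`, `dummies`, `hit` and `m_out` are illustrative names. The dummy block is still dense here, so this only removes the data frame step and does not save memory on its own:

```r
# Numeric example data built directly as a matrix (no data frame involved)
m <- cbind(
  response = c(5, 1, 2, 0, 4, 8, 7, 6, 0),
  group_1  = rep(c(10001, 10003, 10002), each = 3),
  group_2  = rep(c(10002, 10001, 10003), each = 3),
  exclude  = rep(c(10001, 10003, 10002), each = 3)
)

# Unique ids across both group columns, in order of first appearance
ids <- unique(c(m[, "group_1"], m[, "group_2"]))

# Preallocate all dummy columns at once instead of growing m inside the loop
dummies <- matrix(0, nrow = nrow(m), ncol = length(ids),
                  dimnames = list(NULL, as.character(ids)))

for (k in seq_along(ids)) {
  id <- ids[k]
  # Same condition as the data frame loop: id is in either group, but not excluded
  hit <- (m[, "group_1"] == id | m[, "group_2"] == id) & m[, "exclude"] != id
  dummies[hit, k] <- 1
}

# Bind the dummy block onto the original columns in one step
m_out <- cbind(m, dummies)
print(m_out)
```

Preallocating `dummies` and binding once avoids copying the whole matrix on every iteration, which repeated `cbind()` calls inside the loop would do.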

Edit: I forgot to mention that the source data needs to retain its grouping - group_by(response, group_1, group_2, exclude). Also, the df object needs to start as a matrix, to remove the data.frame to data.matrix conversion.

Edit 2: I did not mention this, but the data is indexed and converted to numeric values before I run the entire process. The df object in the example would be numbers.

Use a sparse matrix for the dummy encoding:

m <- as.matrix(df)

# unique ids across all "group" columns
groups <- unique(as.vector(m[, grep("group", colnames(m))]))
# for each id, the row indices where it is in a group but not excluded
tmp <- lapply(groups, function(x, m)
  which((m[, "group_1"] == x | m[, "group_2"] == x) & m[, "exclude"] != x),
  m = m)

# column index (one per id) and row index of every non-zero entry
j = rep(seq_along(tmp), lengths(tmp))
i = unlist(tmp)

library(Matrix)
dummies <- sparseMatrix(i, j, dims = c(nrow(m), length(groups)))
colnames(dummies) <- groups

M <- Matrix(as.matrix(df))
cbind(M, dummies)
#9 x 7 Matrix of class "dgeMatrix"
#     response group_1 group_2 exclude 10001 10003 10002
#[1,]        5   10001   10002   10001     0     0     1
#[2,]        1   10001   10002   10001     0     0     1
#[3,]        2   10001   10002   10001     0     0     1
#[4,]        0   10003   10001   10003     1     0     0
#[5,]        4   10003   10001   10003     1     0     0
#[6,]        8   10003   10001   10003     1     0     0
#[7,]        7   10002   10003   10002     0     1     0
#[8,]        6   10002   10003   10002     0     1     0
#[9,]        0   10002   10003   10002     0     1     0
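One possible variation on the code above (my assumption, not something the answer states): passing `x = 1` to `sparseMatrix()` stores the dummies as numeric values, which gives a "dgCMatrix" rather than a pattern matrix and may be more convenient if the columns feed into a regression. A minimal sketch that starts from a numeric matrix with no data frame step; `hits` is an illustrative name:

```r
library(Matrix)

# Numeric example data as a plain matrix
m <- cbind(
  response = c(5, 1, 2, 0, 4, 8, 7, 6, 0),
  group_1  = rep(c(10001, 10003, 10002), each = 3),
  group_2  = rep(c(10002, 10001, 10003), each = 3),
  exclude  = rep(c(10001, 10003, 10002), each = 3)
)

groups <- unique(as.vector(m[, c("group_1", "group_2")]))

# Row indices of the non-zero entries, one vector per id
hits <- lapply(groups, function(g)
  which((m[, "group_1"] == g | m[, "group_2"] == g) & m[, "exclude"] != g))

# x = 1 makes the stored entries numeric instead of pattern-only
dummies <- sparseMatrix(
  i = unlist(hits),
  j = rep(seq_along(hits), lengths(hits)),
  x = 1,
  dims = c(nrow(m), length(groups)),
  dimnames = list(NULL, as.character(groups))
)
print(dummies)
```

Only the positions of the ones are stored, so the memory cost grows with the number of matches rather than with rows times ids.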
