R - Performing a for loop on a matrix instead of a data frame
I am performing a rather complicated linear regression that involves conditionally creating dummy variables in new columns with a loop. So far I've been doing this in a couple of data frames, converting them to matrices, converting those to sparse matrices, and then joining; however, I've reached my computer's limit. Sorry if this gets confusing - I've tried to simplify the process as much as I can.
Edit - I added numeric examples to the original question.
Here is the source data with numeric values:
    # data.frame(matrix(...)) names its columns X1..X4 (capital X)
    df <- data.frame(matrix(nrow = 9, ncol = 4))
    df$X1 <- c(5, 1, 2, 0, 4, 8, 7, 6, 0)
    df$X2 <- c(10001, 10001, 10001, 10003, 10003, 10003, 10002, 10002, 10002)
    df$X3 <- c(10002, 10002, 10002, 10001, 10001, 10001, 10003, 10003, 10003)
    df$X4 <- c(10001, 10001, 10001, 10003, 10003, 10003, 10002, 10002, 10002)
    names(df) <- c("response", "group_1", "group_2", "exclude")
Here is what it looks like:
      response group_1 group_2 exclude
    1        5   10001   10002   10001
    2        1   10001   10002   10001
    3        2   10001   10002   10001
    4        0   10003   10001   10003
    5        4   10003   10001   10003
    6        8   10003   10001   10003
    7        7   10002   10003   10002
    8        6   10002   10003   10002
    9        0   10002   10003   10002
Source data (please see the edit above):
    # data.frame(matrix(...)) names its columns X1..X4 (capital X)
    df <- data.frame(matrix(nrow = 9, ncol = 4))
    df$X1 <- c(5, 1, 2, 0, 4, 8, 7, 6, 0)
    df$X2 <- c("blue", "blue", "blue", "yellow", "yellow", "yellow", "green", "green", "green")
    df$X3 <- c("green", "green", "green", "blue", "blue", "blue", "yellow", "yellow", "yellow")
    df$X4 <- c("blue", "blue", "blue", "yellow", "yellow", "yellow", "green", "green", "green")
    names(df) <- c("response", "group_1", "group_2", "exclude")
This simplified version of the data looks like:
      response group_1 group_2 exclude
    1        5    blue   green    blue
    2        1    blue   green    blue
    3        2    blue   green    blue
    4        0  yellow    blue  yellow
    5        4  yellow    blue  yellow
    6        8  yellow    blue  yellow
    7        7   green  yellow   green
    8        6   green  yellow   green
    9        0   green  yellow   green
From the above data, I find the unique values in "group_1" and "group_2" using the following function:
    fun_names <- function(x) {
      row1 <- unique(x$group_1)
      row2 <- unique(x$group_2)
      mat <- data.frame(matrix(nrow = length(row1) + length(row2), ncol = 1))
      mat[1] <- c(row1, row2)
      mat_unique <- data.frame(mat[!duplicated(mat[,1]), ])
      names(mat_unique) <- c("id")
      return(mat_unique)
    }
    df_unique <- fun_names(df)
This returns the following data frame:
          id
    1   blue
    2 yellow
    3  green
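For reference, the same set of ids can be obtained in one step. This is only a minimal sketch, assuming the two group columns are plain character (or numeric) vectors rather than factors; the small `df` below is a cut-down copy of the example data, and `ids` is a name introduced here for illustration.

```r
# Cut-down copy of the example data (only the two group columns are needed).
df <- data.frame(
  group_1 = rep(c("blue", "yellow", "green"), each = 3),
  group_2 = rep(c("green", "blue", "yellow"), each = 3),
  stringsAsFactors = FALSE
)

# Concatenate both columns and keep the first occurrence of each value.
ids <- unique(c(df$group_1, df$group_2))

# Same shape as the result of fun_names(): one column named "id".
df_unique <- data.frame(id = ids, stringsAsFactors = FALSE)
```

Because `unique()` keeps values in order of first appearance, the ids come out in the same order as in the data frame shown above.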
Then, for each color ("id"), I create a new column with a value of 1 if the color appears in that row (in "group_1" or "group_2") and the color does not match the "exclude" column value. The loop looks like this:
    for(name in df_unique$id) {
      df[paste(name)] <- ifelse(df$group_1 == name & df$exclude != name |
                                df$group_2 == name & df$exclude != name, 1, 0)
    }
Running the loop returns the final data.frame, which looks like this.
Edit: here is the numeric data for the final df:
      response group_1 group_2 exclude 10001 10003 10002
    1        5   10001   10002   10001     0     0     1
    2        1   10001   10002   10001     0     0     1
    3        2   10001   10002   10001     0     0     1
    4        0   10003   10001   10003     1     0     0
    5        4   10003   10001   10003     1     0     0
    6        8   10003   10001   10003     1     0     0
    7        7   10002   10003   10002     0     1     0
    8        6   10002   10003   10002     0     1     0
    9        0   10002   10003   10002     0     1     0
Here is the original data:
      response group_1 group_2 exclude blue yellow green
    1        5    blue   green    blue    0      0     1
    2        1    blue   green    blue    0      0     1
    3        2    blue   green    blue    0      0     1
    4        0  yellow    blue  yellow    1      0     0
    5        4  yellow    blue  yellow    1      0     0
    6        8  yellow    blue  yellow    1      0     0
    7        7   green  yellow   green    0      1     0
    8        6   green  yellow   green    0      1     0
    9        0   green  yellow   green    0      1     0
So, my question: how do I perform this loop if the original data is a matrix (instead of a data frame)? Since the loop is modifying a data frame, I need to convert the data frame to a matrix in order to convert it to a sparse matrix, and the data.frame to data.matrix conversion is intensive on my machine.
I have converted everything in my code up to the above for loop into matrix notation, but I can't figure out how to create the new columns in the same manner while modifying a matrix in R (instead of a data frame). Basically, I'm hoping someone can help me modify the for loop to work on a matrix. Does anyone have suggestions?
Edit: I forgot to mention that the source data needs to retain its grouping - group_by(response, group_1, group_2, exclude). Also, the df object needs to start as a matrix, to remove the data.frame to data.matrix conversion.
Edit 2: I did not mention this, but the data is indexed and converted to numeric values before I run the entire process. The df object in the example would contain only numbers.
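To make the goal concrete, here is a minimal sketch of a direct translation of the loop to a plain numeric matrix. It assumes the data is already numeric and has the four column names from the example; `m`, `ids`, `dummies`, and `result` are names chosen for illustration. The dummy columns are preallocated once and filled in place, so the matrix is not regrown on every iteration.

```r
# Numeric source data built as a matrix from the start (no data frame involved).
m <- cbind(
  response = c(5, 1, 2, 0, 4, 8, 7, 6, 0),
  group_1  = rep(c(10001, 10003, 10002), each = 3),
  group_2  = rep(c(10002, 10001, 10003), each = 3),
  exclude  = rep(c(10001, 10003, 10002), each = 3)
)

# Unique ids across both group columns, in order of first appearance.
ids <- unique(c(m[, "group_1"], m[, "group_2"]))

# Preallocate one zero-filled dummy column per id.
dummies <- matrix(0, nrow = nrow(m), ncol = length(ids),
                  dimnames = list(NULL, as.character(ids)))

for (k in seq_along(ids)) {
  id <- ids[k]
  # 1 where the id appears in either group column and is not the excluded id.
  hit <- (m[, "group_1"] == id | m[, "group_2"] == id) & m[, "exclude"] != id
  dummies[, k] <- as.numeric(hit)
}

# Bind the dummy columns to the source matrix in a single step.
result <- cbind(m, dummies)
```

This produces a dense matrix, so it avoids the data frame conversion but not the memory cost of storing all the zeros.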
Use a sparse matrix for the dummy encoding:
    m <- as.matrix(df)
    # unique ids across all "group" columns
    groups <- unique(as.vector(m[, grep("group", colnames(m))]))
    # for each id, the rows where it appears in a group column and is not excluded
    tmp <- lapply(groups, function(x, m)
      which((m[, "group_1"] == x | m[, "group_2"] == x) & m[, "exclude"] != x), m = m)
    # column index (one per id) and row index of every non-zero entry
    j <- rep(seq_along(tmp), lengths(tmp))
    i <- unlist(tmp)
    library(Matrix)
    dummies <- sparseMatrix(i, j, dims = c(nrow(m), length(groups)))
    colnames(dummies) <- groups
    m <- Matrix(as.matrix(df))
    cbind(m, dummies)
    #9 x 7 Matrix of class "dgeMatrix"
    #     response group_1 group_2 exclude 10001 10003 10002
    #[1,]        5   10001   10002   10001     0     0     1
    #[2,]        1   10001   10002   10001     0     0     1
    #[3,]        2   10001   10002   10001     0     0     1
    #[4,]        0   10003   10001   10003     1     0     0
    #[5,]        4   10003   10001   10003     1     0     0
    #[6,]        8   10003   10001   10003     1     0     0
    #[7,]        7   10002   10003   10002     0     1     0
    #[8,]        6   10002   10003   10002     0     1     0
    #[9,]        0   10002   10003   10002     0     1     0
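If the data already lives in a numeric matrix, the row and column indices can also be computed without a per-id `lapply`. The following is a sketch of that variant, not part of the answer above: it assumes the Matrix package is installed, that there are exactly the two group columns from the example, and that there are no missing values. The names `keep1`, `keep2`, and `rows` are introduced here for illustration.

```r
library(Matrix)

# Numeric source data as a matrix, as in the example.
m <- cbind(
  response = c(5, 1, 2, 0, 4, 8, 7, 6, 0),
  group_1  = rep(c(10001, 10003, 10002), each = 3),
  group_2  = rep(c(10002, 10001, 10003), each = 3),
  exclude  = rep(c(10001, 10003, 10002), each = 3)
)

groups <- unique(c(m[, "group_1"], m[, "group_2"]))
rows <- seq_len(nrow(m))

# Rows whose group_1 id is not the excluded id.
keep1 <- m[, "group_1"] != m[, "exclude"]
# Same for group_2; rows where group_2 repeats group_1 are skipped so that
# no (row, column) pair is listed twice.
keep2 <- m[, "group_2"] != m[, "exclude"] & m[, "group_2"] != m[, "group_1"]

# Row index and column index (position of the id in groups) of every 1.
i <- c(rows[keep1], rows[keep2])
j <- c(match(m[keep1, "group_1"], groups), match(m[keep2, "group_2"], groups))

dummies <- sparseMatrix(i = i, j = j, x = rep(1, length(i)),
                        dims = c(nrow(m), length(groups)),
                        dimnames = list(NULL, as.character(groups)))
```

Only the positions of the ones are stored, so memory use grows with the number of non-zero entries rather than with rows times ids.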