dplyr - Making custom bins in a data frame inside a function in R
I want to create bins for the variable numbers, per category name, inside a function. I am having difficulties using a category name that is provided as a parameter inside the function.
In the dplyr approach below, each observation gets its bin in a new column. I would like to pass the colgroup parameter inside mutate instead of writing the name of the column directly, and to establish the lower and upper bin limits for each group (colgroup), as in the pseudocode. Maybe a data.table approach would be better.
    library(dplyr)

    set.seed(10)
    b <- rnorm(10, sd = 1, mean = 10)
    y <- runif(3)
    pr <- y / sum(y)
    names <- unlist(lapply(mapply(rep, letters[1:3], 1:3),
                           function(x) paste0(x, collapse = "")))
    x <- sample(names, 10, replace = TRUE, prob = pr)
    df <- data.frame(name = x, numbers = b)
    df

    # working without bin limits per category (not desired)
    # and using "numbers" in cut (not desired)
    binfunction1 <- function(df, colgroup1, varcount, binsize) {
      new <- df %>%
        group_by_(colgroup1) %>%
        mutate(bin = cut(numbers,
                         breaks <- c(seq(7, 15, by = binsize)),  # limits per colgroup not implemented
                         labels = 1:(length(breaks) - 1)))
      return(new)
    }

    binfunction1(df, "name", "numbers", 0.5)

         name   numbers    bin
       <fctr>     <dbl> <fctr>
    1      bb 10.018746      7
    2       a  9.815747      6
    3     ccc  8.628669      4
    4     ccc  9.400832      5
    5      bb 10.294545      7
    6     ccc 10.389794      7
    7       a  8.791924      4
    8       a  9.636324      6
    9       a  8.373327      3
    10      a  9.743522      6

    # pseudocode: use varcount instead of numbers in cut,
    # and limits per category instead of 7 and 15
    binfunction2 <- function(df, colgroup1, varcount, binsize) {
      new <- df %>%
        group_by_(colgroup1) %>%
        mutate(bin = cut(varcount,
                         breaks <- c(seq(min(varcount), max(varcount), by = binsize)),
                         labels = 1:(length(breaks) - 1)))
      return(new)
    }
Not an elegant solution, but is this the outcome you are after? (I didn't quite understand the question.)
    binfunction3 <- function(x, colgroup1, varcount, binsize) {
      # split the data frame into one piece per category
      tmp <- split(x, x[[colgroup1]], drop = TRUE)
      tp <- lapply(tmp, function(k) {
        # per-group limits, padded by 10% below the minimum and above the maximum
        breaks <- c(seq(min(k[[varcount]]) * 0.9, max(k[[varcount]]) * 1.1, by = binsize))
        cbind(k, data.frame(bin = cut(k[[varcount]], breaks,
                                      labels = 1:(length(breaks) - 1))))
      })
      tp <- do.call(rbind, tp)
      # strip the "<group>." prefix that rbind adds to the row names
      rownames(tp) <- gsub("[[:alpha:]]*\\.", "", rownames(tp))
      # restore the original row order
      return(tp[rownames(x), ])
    }

    binfunction3(df, "name", "numbers", 0.5)
    #    name   numbers bin
    # 1     a 10.018746   5
    # 2   ccc  9.815747   5
    # 3   ccc  8.628669   2
    # 4    bb  9.400832   2
    # 5     a 10.294545   6
    # 6    bb 10.389794   4
    # 7     a  8.791924   3
    # 8   ccc  9.636324   4
    # 9     a  8.373327   2
    # 10    a  9.743522   5
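For comparison, here is a rough sketch of how the same idea might look if you stay in dplyr, assuming a reasonably recent version (1.0 or later), where group_by_() is deprecated. The helper name bin_by_group and the toy data are made up for illustration and do not come from the posts above. It also differs from binfunction3 in two ways: the breaks start exactly at each group's minimum instead of being padded by 10%, and the bins come back as integer codes rather than a factor. The column names are passed as strings and resolved with across(all_of()) and the .data pronoun.

```r
library(dplyr)

# Hypothetical helper (not from the original posts): bins the column named by
# `varcount` within each level of the column named by `colgroup1`.
bin_by_group <- function(df, colgroup1, varcount, binsize) {
  df %>%
    group_by(across(all_of(colgroup1))) %>%
    mutate(bin = {
      v <- .data[[varcount]]
      # Breaks start at the group minimum and run one step past the group
      # maximum, so there are always at least two breaks.
      breaks <- seq(min(v), max(v) + binsize, by = binsize)
      # labels = FALSE returns integer bin codes; right = FALSE makes the
      # intervals [lower, upper), so the group minimum lands in bin 1.
      cut(v, breaks = breaks, labels = FALSE, right = FALSE)
    }) %>%
    ungroup()
}

# Made-up toy data, for illustration only
toy <- data.frame(
  name    = c("a", "a", "bb", "bb", "bb", "ccc"),
  numbers = c(9.8, 10.5, 8.6, 9.4, 10.4, 8.4)
)
res <- bin_by_group(toy, "name", "numbers", 0.5)
print(res)
```

Returning integer codes sidesteps the question of how factors with different numbers of levels get combined across groups. If you need a factor, you could convert the bin column afterwards.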