dplyr - making custom bins in dataframe inside function in R

June 15, 2015

I want to create bins for the variable numbers per category name inside a function. I am having difficulties with using the category name, provided as a parameter, inside the function.

In the dplyr approach, each observation gets a bin in a new column. I want to pass colgroup as a parameter inside mutate instead of writing the name of the column directly, and to establish the lower and upper bin limits for each group (colgroup), as in the pseudocode. Maybe a data.table approach would be better.

library(dplyr)

set.seed(10)
b <- rnorm(10, sd = 1, mean = 10)
y <- runif(3)
pr <- y / sum(y)
names <- unlist(lapply(mapply(rep, letters[1:3], 1:3),
                       function(x) paste0(x, collapse = "")))
x <- sample(names, 10, replace = TRUE, prob = pr)
df <- data.frame(name = x, numbers = b)
df

# working without bin limits per category (not desired)
# and using "numbers" in cut (not desired)
binfunction1 <- function(df, colgroup1, varcount, binsize) {
  new <- df %>%
    group_by_(colgroup1) %>%
    mutate(bin = cut(numbers, breaks <- c(seq(7, 15, by = binsize)), # limits per colgroup not implemented
                     labels = 1:(length(breaks) - 1)))
  return(new)
}
binfunction1(df, "name", "numbers", 0.5)

#      name   numbers    bin
#    <fctr>     <dbl> <fctr>
# 1      bb 10.018746      7
# 2       a  9.815747      6
# 3     ccc  8.628669      4
# 4     ccc  9.400832      5
# 5      bb 10.294545      7
# 6     ccc 10.389794      7
# 7       a  8.791924      4
# 8       a  9.636324      6
# 9       a  8.373327      3
# 10      a  9.743522      6

# pseudocode, use varcount instead of numbers in cut
# limits per category instead of 7 and 15
binfunction2 <- function(df, colgroup1, varcount, binsize) {
  new <- df %>%
    group_by_(colgroup1) %>%
    mutate(bin = cut(varcount, breaks <- c(seq(min(varcount), max(varcount), by = binsize)),
                     labels = 1:(length(breaks) - 1)))
  return(new)
}

Not the most elegant solution, but is this the outcome you are after? (I didn't quite understand the question.)

binfunction3 <- function(x, colgroup1, varcount, binsize) {
  # split the data frame into one piece per category
  tmp <- split(x, x[[colgroup1]], drop = TRUE)
  tp <- lapply(tmp, function(k) {
    # bin limits are derived from each group's own range
    breaks <- c(seq(min(k[[varcount]]) * 0.9, max(k[[varcount]]) * 1.1, by = binsize))
    cbind(k, data.frame(bin = cut(k[[varcount]], breaks, labels = 1:(length(breaks) - 1))))
  })
  tp <- do.call(rbind, tp)
  # strip the "<group>." prefix that rbind adds, then restore the original row order
  rownames(tp) <- gsub("[[:alpha:]]*\\.", "", rownames(tp))
  return(tp[rownames(x), ])
}

binfunction3(df, "name", "numbers", 0.5)

#    name   numbers bin
# 1     a 10.018746   5
# 2   ccc  9.815747   5
# 3   ccc  8.628669   2
# 4    bb  9.400832   2
# 5     a 10.294545   6
# 6    bb 10.389794   4
# 7     a  8.791924   3
# 8   ccc  9.636324   4
# 9     a  8.373327   2
# 10    a  9.743522   5
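For comparison, here is a minimal dplyr sketch of what the question's pseudocode seems to be after. It is not from the original thread and rests on some assumptions: it needs dplyr 1.0.0 or later (for across() and all_of()), the function name bin_by_group is made up for illustration, and the bins start at each group's minimum rather than at 0.9 times the minimum as in binfunction3. The column names are passed as strings and looked up with all_of() and the .data pronoun.

```r
library(dplyr)

# Hypothetical helper (name is an assumption, not from the thread):
# assigns an integer bin to every row, with bin limits computed per group.
bin_by_group <- function(df, colgroup1, varcount, binsize) {
  df %>%
    group_by(across(all_of(colgroup1))) %>%   # group by the column named in colgroup1
    mutate(bin = {
      v <- .data[[varcount]]                  # values of this group only
      # breaks start at the group minimum and extend past the group maximum
      breaks <- seq(min(v), max(v) + binsize, by = binsize)
      # labels = FALSE returns integer bin codes instead of a factor
      cut(v, breaks = breaks, labels = FALSE,
          include.lowest = TRUE, right = FALSE)
    }) %>%
    ungroup()
}
```

With the data frame from the question, this would be called as bin_by_group(df, "name", "numbers", 0.5). Returning integer codes sidesteps the question of how factor levels from differently sized groups should be combined; if labelled factors are needed, that choice has to be made explicitly.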
