data.table - na.locf with seq in large column in R


I'm working with a large data.table that has groups based on two reference columns. It also has a distance column that is defined only in the first row of each group and should increase by 2 units with each subsequent row.

Making a small reproducible example, I have:

    reference1 <- c("ref1", "ref1", "ref1", "ref2", "ref2", "ref2", "ref2", "ref3", "ref3", "ref3")
    reference2 <- c("fer1", "fer1", "fer1", "fer1", "fer1", "fer1", "fer1", "fer2", "fer2", "fer2")
    firstdist <- c(2, NA, NA, 5, NA, NA, NA, 8, NA, NA)

    df <- data.frame(ref1 = reference1,
                     ref2 = reference2,
                     dist = firstdist)

which equates to

       ref1 ref2 dist
    1  ref1 fer1    2
    2  ref1 fer1   NA
    3  ref1 fer1   NA
    4  ref2 fer1    5
    5  ref2 fer1   NA
    6  ref2 fer1   NA
    7  ref2 fer1   NA
    8  ref3 fer2    8
    9  ref3 fer2   NA
    10 ref3 fer2   NA

I'd like to fill down the column by taking the last observation and carrying it forward plus 2, and I assume I want to use na.locf from the zoo package for this. Searching around, I haven't found a way to carry forward whilst adding a constant integer.

An example of the output I'd like:

       ref1 ref2 dist
    1  ref1 fer1    2
    2  ref1 fer1    4
    3  ref1 fer1    6
    4  ref2 fer1    5
    5  ref2 fer1    7
    6  ref2 fer1    9
    7  ref2 fer1   11
    8  ref3 fer2    8
    9  ref3 fer2   10
    10 ref3 fer2   12

E.g. something like this (wishful syntax; na.locf has no such argument):

    df$dist <- na.locf(df$dist, by = 2)

I'm not 100% sure na.locf is the best way to do it. data.table solutions are welcome: the table will have millions of rows, so efficiency is important.

Thank you,

I would try the following:

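    library(data.table)
    setDT(df)  # convert the data.frame to a data.table by reference

    # Within each (ref1, ref2) group, build a sequence that starts at the
    # group's first dist value and increases by 2, one value per row (.N).
    df[, dist := seq(first(dist), by = 2, length.out = .N), by = .(ref1, ref2)]

    # > df
    #     ref1 ref2 dist
    #  1: ref1 fer1    2
    #  2: ref1 fer1    4
    #  3: ref1 fer1    6
    #  4: ref2 fer1    5
    #  5: ref2 fer1    7
    #  6: ref2 fer1    9
    #  7: ref2 fer1   11
    #  8: ref3 fer2    8
    #  9: ref3 fer2   10
    # 10: ref3 fer2   12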

Here, .N is the number of rows in each group (grouped by ref1 and ref2).
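
Since the question stresses efficiency on millions of rows, here is a possible variant that avoids calling a function once per group. It is only a sketch, not benchmarked, and it rests on assumptions that are not stated in the original post: that the first row of every group is non-NA, that every other row in the group is NA, and that the installed data.table is recent enough to provide nafill(). The names dt and step are made up for the illustration.

```r
library(data.table)

# Rebuild the example data from the question.
dt <- data.table(
  ref1 = c("ref1", "ref1", "ref1", "ref2", "ref2", "ref2", "ref2", "ref3", "ref3", "ref3"),
  ref2 = c("fer1", "fer1", "fer1", "fer1", "fer1", "fer1", "fer1", "fer2", "fer2", "fer2"),
  dist = c(2, NA, NA, 5, NA, NA, NA, 8, NA, NA)
)

step <- 2  # assumed constant increment per row (hypothetical name)

# Carry each group's starting value forward over the NAs, then add
# step * (position within the group - 1).
# Assumption: the first row of every group is non-NA, so the carried-forward
# value never leaks across a group boundary.
dt[, dist := nafill(dist, type = "locf") + step * (rowid(ref1, ref2) - 1L)]

print(dt)
```

The idea is the same as the seq() approach: the starting value plus a multiple of the row's position in its group. Here rowid() supplies that position for the whole column at once, so no by = grouping is needed. If the first row of some group could be NA, the grouped seq() version above is the safer choice.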

