data.table - na.locf with seq in a large column in R
I'm working with a large data.table that has groups based on 2 reference columns. It also has a distance column that is defined only for the first row in each group and should then jump by 2 units each row.
Making a small reproducible example, I have:
    reference1 <- c("ref1", "ref1", "ref1", "ref2", "ref2", "ref2", "ref2", "ref3", "ref3", "ref3")
    reference2 <- c("fer1", "fer1", "fer1", "fer1", "fer1", "fer1", "fer1", "fer2", "fer2", "fer2")
    firstdist <- c(2, NA, NA, 5, NA, NA, NA, 8, NA, NA)
    df <- data.frame(ref1 = reference1, ref2 = reference2, dist = firstdist)
which equates to
       ref1 ref2 dist
    1  ref1 fer1    2
    2  ref1 fer1   NA
    3  ref1 fer1   NA
    4  ref2 fer1    5
    5  ref2 fer1   NA
    6  ref2 fer1   NA
    7  ref2 fer1   NA
    8  ref3 fer2    8
    9  ref3 fer2   NA
    10 ref3 fer2   NA
I'd like to fill down the column by taking the last observation and carrying it forward with +2 added each time. I assume I want to use na.locf from the zoo package for this, but searching around I haven't found a way to carry forward whilst adding a constant integer.
An example of the output I'd like:
       ref1 ref2 dist
    1  ref1 fer1    2
    2  ref1 fer1    4
    3  ref1 fer1    6
    4  ref2 fer1    5
    5  ref2 fer1    7
    6  ref2 fer1    9
    7  ref2 fer1   11
    8  ref3 fer2    8
    9  ref3 fer2   10
    10 ref3 fer2   12
e.g. something like this (wished-for syntax, not a working call):
    df$dist <- na.locf(df$dist, by = 2)
I'm not 100% sure na.locf is the best way to do it, so data.table solutions are welcome. The table has millions of rows, so efficiency is important.
Thank you,
I would try the following:
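    library(data.table)
    setDT(df)  # convert the data.frame to a data.table by reference
    # Within each (ref1, ref2) group, build a sequence that starts at the
    # first dist value, steps by 2, and has one element per row.
    df[, dist := seq(first(dist), by = 2, length.out = .N), by = .(ref1, ref2)]
    # > df
    #     ref1 ref2 dist
    #  1: ref1 fer1    2
    #  2: ref1 fer1    4
    #  3: ref1 fer1    6
    #  4: ref2 fer1    5
    #  5: ref2 fer1    7
    #  6: ref2 fer1    9
    #  7: ref2 fer1   11
    #  8: ref3 fer2    8
    #  9: ref3 fer2   10
    # 10: ref3 fer2   12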
Here, .N is the number of rows in each group (grouped by ref1 and ref2).
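The same per-group fill can also be written as plain arithmetic instead of a call to seq. The following is only a sketch on the toy data from the question, and it assumes (as in the example) that the first row of every group holds the known starting distance; the table name dt is made up for illustration.

```r
library(data.table)

# Same toy data as in the question (NA marks the rows to fill).
dt <- data.table(
  ref1 = c("ref1", "ref1", "ref1", "ref2", "ref2", "ref2", "ref2", "ref3", "ref3", "ref3"),
  ref2 = c("fer1", "fer1", "fer1", "fer1", "fer1", "fer1", "fer1", "fer2", "fer2", "fer2"),
  dist = c(2, NA, NA, 5, NA, NA, NA, 8, NA, NA)
)

# seq_len(.N) - 1 gives the row offsets 0, 1, 2, ... within each group,
# so every row becomes: first value of the group + 2 * offset.
dt[, dist := dist[1L] + 2 * (seq_len(.N) - 1), by = .(ref1, ref2)]

print(dt)
```

Both forms compute the same values; which one is faster on millions of rows would need to be benchmarked on the real table.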