parallel processing - R - parallelizing ldply and replicate functions
I have tried for quite some time to parallelize this code, to no avail. Either I get errors or nothing works. Do you have any ideas?
    # LDA(), perplexity() and logLik() are assumed to come from the topicmodels package
    library(topicmodels)

    cal_ops <- function(n, dtm, ratio = 0.1) {
      print(n)
      # Hold out a random fraction of the documents for evaluation
      selvect <- sample(nrow(dtm), nrow(dtm) * ratio)
      holdout <- dtm[selvect, ]
      training <- dtm[-selvect, ]
      topmodel <- LDA(training, n, control = list(estimate.alpha = FALSE))
      return(c(n, perplexity(topmodel, holdout), as.numeric(logLik(topmodel))))
    }

    require(plyr)
    replication <- 1000
    sequ <- seq(5, 100, 5)
    perplex <- ldply(sequ, function(x, dtm) {
      t(replicate(replication, cal_ops(x, dtm)))
    }, dtm = dtm_to_use)

It takes a long time to run as is. Thank you in advance.
I've tried using this example of a parallel version of replicate, but I had many errors: https://stackoverflow.com/a/19281611/8598566
Your example is not reproducible, e.g. dtm_to_use is not defined, which makes it hard to give anything other than a "the-following-should-work" suggestion:
The plyr::ldply(x) function takes an argument .parallel = TRUE, which will process x in chunks distributed over whatever number of workers you have. It uses the foreach framework internally for parallel processing, so you can use any of the "do" packages. Here is an example using future backends:
    library("doFuture")
    registerDoFuture()

    ## Utilize all cores available to the R session
    ## (newer releases of the future package deprecate 'multiprocess'; use plan(multisession) there)
    plan(multiprocess)

    replication <- 1000
    sequ <- seq(from = 5, to = 100, by = 5)
    perplex <- plyr::ldply(sequ, function(x) {
      t(replicate(replication, c(a = x, b = sqrt(x))))
    }, .parallel = TRUE)

    str(perplex)
    'data.frame': 20000 obs. of 2 variables:
     $ a: num 5 5 5 5 5 5 5 5 5 5 ...
     $ b: num 2.24 2.24 2.24 2.24 2.24 ...

Since you mentioned HPC as a target: if you have an ad-hoc cluster without a job scheduler where you can SSH to each node, you can use:
    plan(cluster, workers = c("node1", "node2", "node2", "node3"))

to run one core each on node1 and node3, and two cores on node2. If you have a real job scheduler, e.g. SGE, you can use:
    library("future.batchtools")
    plan(batchtools_sge)

and each element in sequ will be processed as an individual job on the queue (which corresponds to having an infinite number of workers). If you want to chunk them up, you can limit the number of workers (= jobs), e.g.
    plan(batchtools_sge, workers = 200)

Your script will be identical regardless of the backend used.
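
For comparison, here is a minimal sketch of the same split-over-sequ idea using only the base parallel package instead of the future backends above. This is a swapped-in technique, not what the answer uses. Since dtm_to_use is not available, cal_ops_stub() and data_to_use are hypothetical stand-ins (a toy matrix and a cheap summary instead of the LDA fit), and the replication count is reduced so the sketch runs quickly.

```r
library(parallel)

# Hypothetical stand-in for cal_ops(): same hold-out split, but it returns
# simple means instead of fitting a topic model. Swap in the real body.
cal_ops_stub <- function(n, data, ratio = 0.1) {
  selvect <- sample(nrow(data), nrow(data) * ratio)
  holdout <- data[selvect, , drop = FALSE]
  training <- data[-selvect, , drop = FALSE]
  c(n = n, holdout_mean = mean(holdout), training_mean = mean(training))
}

data_to_use <- matrix(rnorm(200 * 5), nrow = 200)  # toy data, not a real DTM
replication <- 10                                   # kept small for the sketch
sequ <- seq(from = 5, to = 100, by = 5)

cl <- makeCluster(2)                 # two local workers; adjust as needed
clusterSetRNGStream(cl, iseed = 42)  # independent, reproducible RNG streams
# Workers start with an empty workspace, so ship what the function needs
clusterExport(cl, c("cal_ops_stub", "data_to_use", "replication"))

# One task per element of sequ; each task runs its replicates serially
res <- parLapply(cl, sequ, function(x) {
  t(replicate(replication, cal_ops_stub(x, data_to_use)))
})
stopCluster(cl)

# Stack the per-element matrices into one data frame, as ldply would
perplex <- as.data.frame(do.call(rbind, res))
str(perplex)
```

The result should have length(sequ) * replication rows and three columns. The explicit clusterExport() and clusterSetRNGStream() calls are the main differences from the future-based version, which identifies the needed global variables automatically.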