How to improve speed when reading a binary file in R
I have to read in quite big binary files in R (to process, transform, and convert them to other formats). The approach works in general but, unfortunately, the script runs forever (>12 h for the first part of the data, which is less than 2% of the whole).
I assume the problem is not genuinely due to the size of the data (at least, that is not the full explanation) but due to inefficient code. I am looking for a way to speed up the runtime and would be grateful for any help!
My approach is based on this tutorial: https://stats.idre.ucla.edu/r/faq/how-can-i-read-binary-data-into-r/
In the code below I include only 2 variables instead of thousands. In total, the data are ~100 GB; as said above, processing the first part (<2%) already takes >12 h.
The data are split into smaller files that I process separately (for every part, 1 script and 1 dataset).
My code:
library(data.table)

newdata <- file(paste0(getwd(), "/file.dat"), "rb")

# Here, only the first 2 variables
dataset <- data.table(id = integer(), v1 = integer())

# 327639 is the number of cases (data on people)
for (i in 1:327639) {
  bla <- readBin(con = newdata, integer(), size = 2, n = 2000, endian = "big")
  id <- i
  v1 <- bla[1]
  dataset <- rbind(dataset, list(id, v1))
}

save(dataset, file = paste0(getwd(), "/output/", "part_a.RData"))
close(newdata)
Thanks for your help!
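The likely bottleneck is the rbind() inside the loop: every iteration copies the entire data.table built so far, so the total cost grows roughly quadratically with the number of cases, and readBin() is called 327,639 times with a small buffer on top of that. Below is a minimal, self-contained timing sketch of just that effect, with a made-up row count and made-up values (nothing here touches the actual data file):

library(data.table)

n <- 5000L  # hypothetical row count, kept small so the slow variant finishes

# Slow pattern: grow the table with rbind(), copying all existing rows each time
grow <- system.time({
  dt <- data.table(id = integer(), v1 = integer())
  for (i in seq_len(n)) dt <- rbind(dt, list(i, i))
})

# Fast pattern: preallocate once, then fill rows in place with set()
fill <- system.time({
  dt <- data.table(id = integer(n), v1 = integer(n))
  for (i in seq_len(n)) set(dt, i, c("id", "v1"), list(i, i))
})

rbind(grow = grow, fill = fill)

Preallocating the table and reading the file in larger chunks avoids both problems.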
Maybe I am wrong; if so, please excuse the noise.
The idea in the comment is implemented as follows. (Untested, as I have no data file.)
# numbytes <- file.size(newdata)
numbytes <- 327639L
dataset <- data.table(id = integer(numbytes), v1 = integer(numbytes))

chunk <- 2^15
passes <- numbytes %/% chunk
remainder <- numbytes %% chunk

i <- 1L
for (j in seq_len(passes)) {
  bla <- readBin(con = newdata, integer(), n = chunk, size = 2, endian = "big")
  dataset$id[i:(i + chunk - 1L)] <- i:(i + chunk - 1L)
  dataset$v1[i:(i + chunk - 1L)] <- bla
  i <- i + chunk
}

bla <- readBin(con = newdata, integer(), n = remainder, size = 2, endian = "big")
dataset$id[i:numbytes] <- i:numbytes
dataset$v1[i:numbytes] <- bla
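If a whole part fits comfortably in memory, a further simplification is to drop the loop entirely: read everything with a single readBin() call and reshape. This is only a sketch and rests on assumptions not stated in the question — that each case really consists of 2000 two-byte integers (taken from n = 2000 in the original loop) and that /file.dat holds exactly 327639 such records:

library(data.table)

n_cases <- 327639L
n_vars  <- 2000L   # assumed record width, taken from n = 2000 in the question

con <- file(paste0(getwd(), "/file.dat"), "rb")
values <- readBin(con, integer(), n = n_cases * n_vars, size = 2, endian = "big")
close(con)

# One row per case; keep only the columns that are actually needed.
m <- matrix(values, nrow = n_cases, ncol = n_vars, byrow = TRUE)
dataset <- data.table(id = seq_len(n_cases), v1 = m[, 1])

This trades memory for speed (the integer vector alone is roughly 2.6 GB here); when a part does not fit in RAM, the chunked loop above is the better compromise.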