na - How to leave out rows with missing values when the total number of values crosses a given value in R


I have a dataset that contains 45% missing values.

I would like to remove the rows that have NA values for a given period. For example, if rows have missing values continuously, say for an hour or more (50 or more values missing in a row), I want to remove those rows alone. However, I don't want to leave out rows where fewer than 15 or 25 consecutive values are missing.

In short: 1) I don't want to remove every row that has an NA value. 2) I want to remove only the rows that continuously have NA values in a column.

Example data: pic

Discard column-wise contiguous NAs

Try this. It uses rle(is.na(...)) to determine runs of NAs within each column; rows that fall inside a run longer than num_runs are discarded (data at bottom).

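    library(dplyr)
    library(purrr)

    # x: one column of df (a vector); num_runs: longest NA run that is still kept.
    # Returns the row indices that fall inside NA runs longer than num_runs.
    myfun <- function(x, num_runs) {
      r <- rle(is.na(x))
      runs <- cumsum(r$lengths)               # end index of every run
      vals <- r$values                        # TRUE for runs of NA
      start <- dplyr::lag(runs) + 1           # start index of every run
      start <- replace(start, is.na(start), 1)
      m <- rbind(start[vals], runs[vals])     # one column per NA run: start, end
      seqruns <- apply(m, 2, function(x) if ((x[2] - x[1] + 1) > num_runs) seq(x[1], x[2]))
      unlist(seqruns)
    }

    num_runs <- 4
    discard <- unlist(map(seq_len(ncol(df)), ~ myfun(df[, .x], num_runs)))
    # Guard: df[-discard, ] returns zero rows when discard is empty
    if (length(discard) > 0) df[-discard, ] else df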
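To make the run detection concrete, here is a minimal base R sketch on a made-up vector. The vector x, the threshold value and the helper names (starts, ends, long_na, flagged) are illustrative assumptions, not part of the original answer; it only shows what rle(is.na(x)) reports and how run lengths turn into row indices.

```r
# Made-up column with NA runs of length 1, 3 and 2
x <- c(5, NA, 7, NA, NA, NA, 2, 9, NA, NA)

r <- rle(is.na(x))
r$lengths   # 1 1 1 3 2 2
r$values    # FALSE TRUE FALSE TRUE FALSE TRUE

ends   <- cumsum(r$lengths)        # last index of each run
starts <- ends - r$lengths + 1     # first index of each run

num_runs <- 2                      # NA runs longer than this are flagged
long_na <- r$values & r$lengths > num_runs
flagged <- unlist(Map(seq, starts[long_na], ends[long_na]))
flagged     # 4 5 6
```

Only the run of three NAs (positions 4 to 6) exceeds the threshold, so the isolated NA at position 2 and the pair at positions 9 and 10 are left alone.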

Output

                       mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0   NA    1
    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0   NA    2
    Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0   NA    1
    Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0   NA    4
    Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Fiat 128          32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic       30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Discard row-wise contiguous NAs

Try this. It uses rle(is.na(...)) to determine runs of NAs across each row; if any run is longer than num_runs, the row is discarded (data at bottom).

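    library(purrr)

    num_runs <- 1    # longest run of contiguous NAs allowed within a row

    keep <- map_lgl(seq_len(nrow(df)), function(i) {
      r <- rle(is.na(unlist(df[i, ])))       # runs of NA / non-NA across the row
      !any(r$lengths[r$values] > num_runs)   # TRUE when no NA run exceeds num_runs
    })
    df[keep, ]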

Output

                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0   NA    1
    Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0   NA    2
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0   NA    1
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0   NA    4
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0   NA    4
    Lincoln Continental 10.4   8 460.0 215   NA 5.424 17.82  0  0   NA    4
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0   NA    4
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0   NA    1
    Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0   NA    2
    AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0   NA    2
    Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0   NA    4
    Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0   NA    2
    Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
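The row-wise rule is easier to check on a small case. The sketch below is a base R variant, not the answer's purrr code; the toy data frame and the helper longest_na_run are hypothetical names introduced only for illustration.

```r
# Hypothetical toy data: row 2 has two adjacent NAs, row 3 has two separated NAs
toy <- data.frame(a = c(1, NA, NA),
                  b = c(2, NA, 5),
                  c = c(3, 6, NA))

# Longest run of TRUE values in a logical vector (0 when there are none)
longest_na_run <- function(is_na) {
  r <- rle(is_na)
  max(c(0L, r$lengths[r$values]))
}

num_runs <- 1
longest <- apply(is.na(toy), 1, longest_na_run)
longest                            # 0 2 1

kept <- toy[longest <= num_runs, ] # rows 1 and 3 remain
kept
```

Row 3 survives even though it holds two NAs, because they are not adjacent; only row 2, with a run of two, is dropped.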

Data

    library(dplyr)

    # mtcars with every value equal to 3 replaced by NA
    df <- mtcars %>% replace(. == 3, NA)
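For readers without dplyr loaded, the same test data can presumably be built in base R. This is a sketch that assumes the intent is simply "set every cell equal to 3 to NA"; the per-column NA count is added only as a quick sanity check.

```r
# Base R construction of the test data (assumed equivalent to the pipe version)
df <- mtcars
df[df == 3] <- NA

# NA count per column; gear carries most of the NAs
colSums(is.na(df))
```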
