scikit-learn - Missing data in a DataFrame using Python


[dataframe]

Hi,

For the attached data, can anyone please help me handle the missing data in the "outlet_size" column, so that I can use the complete data for preparing data science models?

Thanks,

This is one of the major challenges of data mining (or machine learning) problems. How you handle missing data is decided largely by experience. You mustn't look at data science as a black box that follows a fixed series of steps to be successful at it!

Some guidelines for missing data:

a. If more than 40% of the data in a column is missing, drop it! (Again, the 40% depends on the type of problem you're working with, for example on whether the data is super crucial or trivial enough that you can ignore it.)

b. Check if there is some way you can impute the missing data from the internet. You're looking at item weight! If there is any way you could know which product you're dealing with, instead of the hashed item_identifier, then you can literally Google it and figure it out.

c. Missing data can be classified into two types:

MCAR: missing completely at random. This is the desirable scenario in the case of missing data.

MNAR: missing not at random. Missing not at random data is a more serious issue, and in this case it might be wise to check the data gathering process further and try to understand why the information is missing. For instance, if most of the people in a survey did not answer a certain question, why did they do that? Was the question unclear?

Assuming the data is MCAR, too much missing data can be a problem too. Usually a safe maximum threshold is 5% of the total for large datasets. If the missing data for a certain feature or sample is more than 5%, you probably should leave that feature or sample out. Therefore, check for features (columns) and samples (rows) where more than 5% of the data is missing using a simple function.
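The check described above could be sketched in pandas roughly as follows. The helper name missing_fraction, the toy data, and the column names other than outlet_size are assumptions made for illustration; only the 5% threshold comes from the text.

```python
import numpy as np
import pandas as pd


def missing_fraction(df, axis=0):
    """Fraction of missing values per column (axis=0) or per row (axis=1)."""
    return df.isna().mean(axis=axis)


# Toy data standing in for the real dataset (values are made up).
df = pd.DataFrame({
    "item_weight": [9.3, np.nan, 17.5, 19.2],
    "outlet_size": ["Medium", None, None, "Small"],
    "item_mrp": [249.8, 48.3, 141.6, 182.1],
})

threshold = 0.05  # the 5% rule of thumb from the text

col_missing = missing_fraction(df, axis=0)
row_missing = missing_fraction(df, axis=1)

# Labels of the columns and rows that exceed the threshold.
cols_over = col_missing[col_missing > threshold].index.tolist()
rows_over = row_missing[row_missing > threshold].index.tolist()

print(col_missing)
print("columns over threshold:", cols_over)
print("rows over threshold:", rows_over)
```

On a dataset this small almost everything exceeds 5%, so the threshold only becomes meaningful on large datasets, and it should be tuned to the problem.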

d. As posted in the comments, you can drop the rows using df.dropna(), or fill them with infinity, or fill them with the mean using df["value"] = df.groupby("name")["value"].transform(lambda x: x.fillna(x.mean())). This groups the column value of the dataframe df by the category name, finds the mean in each category, and fills each missing value in value with the corresponding mean of its category!
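A minimal sketch of the drop and group-mean options from point d, using the placeholder column names name and value from the text and made-up numbers:

```python
import numpy as np
import pandas as pd

# Toy frame: "name" is the category, "value" has gaps (values are made up).
df = pd.DataFrame({
    "name": ["a", "a", "a", "b", "b"],
    "value": [1.0, np.nan, 3.0, 10.0, np.nan],
})

# Option 1: drop every row where "value" is missing.
dropped = df.dropna(subset=["value"])

# Option 2: fill each missing "value" with the mean of its "name" group.
filled = df.copy()
filled["value"] = (
    filled.groupby("name")["value"]
    .transform(lambda x: x.fillna(x.mean()))
)

print(dropped)
print(filled)
```

A mean only makes sense for numeric columns. If outlet_size holds categories, which is an assumption about the attached data, the most frequent value within each group could be used in the same pattern instead.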

e. Apart from either dropping the missing values or replacing them with the mean or median, there are other advanced regression techniques you can use that give you a way to predict the missing values and fill them in, e.g. MICE (Multivariate Imputation by Chained Equations). You should browse and read more about where advanced imputation techniques are helpful.
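As one possible illustration of point e, scikit-learn ships IterativeImputer, an experimental estimator inspired by MICE that models each feature with missing values as a function of the other features. The sketch below uses synthetic numeric data, and the column names x and y are assumptions. It is not a full MICE implementation (it returns a single imputation), and a categorical column would need encoding first.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic data: y depends linearly on x plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({"x": x, "y": 2 * x + rng.normal(scale=0.1, size=200)})

# Knock out every tenth y value to simulate missing data.
df_missing = df.copy()
missing_idx = df_missing.index[::10]
df_missing.loc[missing_idx, "y"] = np.nan

# Fit the imputer and fill the gaps with values predicted from x.
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed = pd.DataFrame(
    imputer.fit_transform(df_missing),
    columns=df_missing.columns,
    index=df_missing.index,
)

# Compare the imputed values with the true values that were removed.
errors = (imputed.loc[missing_idx, "y"] - df.loc[missing_idx, "y"]).abs()
print(errors.describe())
```

Because the synthetic y is almost a linear function of x, the imputed values land close to the removed ones. On real data the quality depends on how well the other columns predict the missing one.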

