python - Removing rows with NaN in MultiIndex with duplicates
Updated: the dataframe below reproduces the exact issue.
I have an issue with NaN appearing in the index, which leads to non-unique rows (since NaN != NaN). I need to drop every row where NaN occurs in the index. My previous question used an example dataframe with a single NaN row, and the original solution did not resolve my issue because it did not meet this poorly advertised requirement:
(Note: in the actual data I have thousands of such rows, including duplicate rows, since NaN != NaN is permissible on the index.)
(from original post)
The issue
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[1,1,"a"],[1,2,"b"],[1,3,"c"],
...                    [1,np.nan,"x"],[1,np.nan,"x"],[1,np.nan,"x"],
...                    [2,1,"d"],[2,2,"e"],
...                    [np.nan,1,"x"],[np.nan,2,"x"],[np.nan,1,"x"]],
...                   columns=["a","b","c"]).set_index(["a","b"])
>>> df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
    NaN  x
    NaN  x
    NaN  x
2.0 1.0  d
    2.0  e
NaN 1.0  x
    2.0  x
    1.0  x

Note the duplicate rows: (1.0, NaN) and (NaN, 1.0).
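As a quick check of the note above, here is a minimal sketch that rebuilds the example and counts the index entries containing NaN. It assumes the columns "a" and "b" are moved into the index with set_index, as the printed output implies; the names levels and nan_in_index are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the example above, with "a" and "b" moved into the index.
df = pd.DataFrame(
    [[1, 1, "a"], [1, 2, "b"], [1, 3, "c"],
     [1, np.nan, "x"], [1, np.nan, "x"], [1, np.nan, "x"],
     [2, 1, "d"], [2, 2, "e"],
     [np.nan, 1, "x"], [np.nan, 2, "x"], [np.nan, 1, "x"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

# Turn the index levels into ordinary columns so isna() can be applied.
levels = df.index.to_frame(index=False)
nan_in_index = levels.isna().any(axis=1)

print("index is unique:", df.index.is_unique)
print("rows with NaN in the index:", int(nan_in_index.sum()))
```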
Failed solutions:
I've tried something simple like:
>>> df = df[pd.notnull(df.index)]
but this fails because notnull is not implemented for MultiIndex.
Also, one of the answers suggested:
>>> df = df.reindex(df.index.dropna())
However, this failed with the error:
Exception: cannot handle non-unique multi-index!

Desired output:
>>> df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
2.0 1.0  d
    2.0  e
(All rows with NaN in the index are dropped, eliminating the non-unique rows.)
Option 1
reset_index, dropna, and set_index once more.

    c = df.index.names
    df = df.reset_index().dropna().set_index(c)
    df
             c
    a   b
    1.0 1.0  a
        2.0  b
        3.0  c
    2.0 1.0  d
        2.0  e

If the MultiIndex is unique, you can use...
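One caveat with Option 1: a plain dropna() also removes rows that have NaN in an ordinary column, not only in the index. A minimal sketch of a variant that restricts the check to the index levels follows. The helper name drop_nan_index_rows and the small demo frame are hypothetical, and the sketch assumes every index level has a name that does not clash with a column name.

```python
import numpy as np
import pandas as pd


def drop_nan_index_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with NaN in any index level (hypothetical helper)."""
    levels = list(frame.index.names)
    # subset=levels limits the NaN check to the former index columns,
    # so a missing value in a data column does not remove the row.
    return frame.reset_index().dropna(subset=levels).set_index(levels)


# Illustrative data: one row has a missing value in the data column "c".
demo = pd.DataFrame(
    [[1, 1, "a"], [1, np.nan, "x"], [2, 1, None], [np.nan, 2, "x"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

result = drop_nan_index_rows(demo)
print(result)
```

The row indexed by (2.0, 1.0) is kept even though its "c" value is missing, because only the index levels are checked.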
Option 2
df.index.dropna and df.reindex
df = df.reindex(df.index.dropna())
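Since Option 2 only applies to a unique MultiIndex, it may help to guard it with a uniqueness check. The sketch below does that and, for the non-unique case, falls back to a boolean mask built from the index levels. The mask fallback is not part of the original answer; it is swapped in here as one possible alternative, and the helper name drop_nan_index is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_nan_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose index has NaN in any level (hypothetical helper)."""
    if frame.index.is_unique:
        # Option 2: only safe when every index entry is unique.
        return frame.reindex(frame.index.dropna())
    # Assumed fallback for non-unique indexes: keep the rows whose
    # index levels are all non-null, using a plain boolean mask.
    keep = frame.index.to_frame(index=False).notna().all(axis=1).to_numpy()
    return frame[keep]


# Illustrative frames: one with a unique index, one with duplicates.
unique = pd.DataFrame(
    [[1, 1, "a"], [1, np.nan, "x"], [2, 1, "d"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])
dupes = pd.DataFrame(
    [[1, 1, "a"], [1, np.nan, "x"], [1, np.nan, "x"], [2, 1, "d"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

clean_unique = drop_nan_index(unique)
clean_dupes = drop_nan_index(dupes)
print(clean_unique)
print(clean_dupes)
```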