python - Removing rows with NaN in MultiIndex with duplicates
Updated: the DataFrame below now reproduces my exact issue.
I have an issue with NaN appearing in indexes, leading to non-unique rows (since NaN != NaN). I need to drop all rows where NaN occurs in the index. My previous question had an example DataFrame with a single NaN row, but the original solution did not resolve my issue because it did not meet this poorly advertised requirement:
(Note that in my actual data I have thousands of such rows, including duplicate rows, since NaN != NaN is permissible on the index.)
(From the original post.)
The issue
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[1,1,"a"],[1,2,"b"],[1,3,"c"],[1,np.nan,"x"],[1,np.nan,"x"],[1,np.nan,"x"],[2,1,"d"],[2,2,"e"],[np.nan,1,"x"],[np.nan,2,"x"],[np.nan,1,"x"]], columns=["a","b","c"]).set_index(["a","b"])
>>> df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
    NaN  x
    NaN  x
    NaN  x
2.0 1.0  d
    2.0  e
NaN 1.0  x
    2.0  x
    1.0  x
Note the duplicate rows: (1.0, NaN) and (NaN, 1.0).
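One way to confirm the duplicates is to ask the index directly. This is only a small sketch on the sample data from the question; `is_unique` and `duplicated()` are standard pandas index members, while the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rows = [[1, 1, "a"], [1, 2, "b"], [1, 3, "c"],
        [1, np.nan, "x"], [1, np.nan, "x"], [1, np.nan, "x"],
        [2, 1, "d"], [2, 2, "e"],
        [np.nan, 1, "x"], [np.nan, 2, "x"], [np.nan, 1, "x"]]
df = pd.DataFrame(rows, columns=["a", "b", "c"]).set_index(["a", "b"])

# NaN never compares equal to itself as a plain float ...
nan_equal = np.nan == np.nan

# ... but the MultiIndex still contains repeated entries such as (1.0, NaN).
index_is_unique = df.index.is_unique

# Count index entries that repeat an earlier entry (keep="first" is the default).
n_repeats = int(df.index.duplicated().sum())
```

Checking `df.index.is_unique` up front tells you whether a `reindex`-based approach can be used at all.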
Failed solutions:
I've tried something simple like:
>>> df = df[pd.notnull(df.index)]
but this fails because notnull is not implemented for MultiIndex.
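A per-level mask is one possible way around that limitation. The sketch below is an assumption on my part, not something from the original answers: it checks each level separately with `get_level_values(...).notna()` and keeps only rows where every level is present. The helper name `drop_nan_index_rows` is hypothetical.

```python
import numpy as np
import pandas as pd

def drop_nan_index_rows(frame):
    """Keep rows whose index has no NaN in any level (hypothetical helper)."""
    idx = frame.index
    # One boolean array per level: True where that level's value is present.
    per_level = [idx.get_level_values(i).notna() for i in range(idx.nlevels)]
    # A row survives only if every level is non-null.
    keep = np.logical_and.reduce(per_level)
    return frame[keep]

rows = [[1, 1, "a"], [1, 2, "b"], [1, 3, "c"],
        [1, np.nan, "x"], [1, np.nan, "x"], [1, np.nan, "x"],
        [2, 1, "d"], [2, 2, "e"],
        [np.nan, 1, "x"], [np.nan, 2, "x"], [np.nan, 1, "x"]]
df = pd.DataFrame(rows, columns=["a", "b", "c"]).set_index(["a", "b"])

cleaned = drop_nan_index_rows(df)
```

Because this filters with a boolean array instead of looking labels up, it should not depend on the index being unique.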
Also, one of the answers suggested:
>>> df = df.reindex(df.index.dropna())
However, this failed with the error:
Exception: cannot handle a non-unique multi-index!
Desired output:
>>> df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
2.0 1.0  d
    2.0  e
(All rows with NaN in the index are dropped, eliminating the non-unique rows.)
Option 1
reset_index, dropna, then set_index once more.
c = df.index.names
df = df.reset_index().dropna().set_index(c)
df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
2.0 1.0  d
    2.0  e
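Option 1 can be wrapped in a small function, sketched here with one deliberate variation: `dropna` is limited to the former index columns via `subset`, so NaN values in ordinary data columns are left alone. The function name `drop_nan_index_via_reset` is hypothetical, and the sketch assumes every index level has a name.

```python
import numpy as np
import pandas as pd

def drop_nan_index_via_reset(frame):
    """reset_index -> dropna on the index columns -> set_index (hypothetical helper)."""
    # Assumes all index levels are named; unnamed levels would need other handling.
    names = list(frame.index.names)
    # Move the index levels into columns so dropna can see them.
    flat = frame.reset_index()
    # Variation on Option 1: only NaN in the former index columns drops a row.
    flat = flat.dropna(subset=names)
    # Rebuild the MultiIndex from the surviving rows.
    return flat.set_index(names)

rows = [[1, 1, "a"], [1, 2, "b"], [1, 3, "c"],
        [1, np.nan, "x"], [1, np.nan, "x"], [1, np.nan, "x"],
        [2, 1, "d"], [2, 2, "e"],
        [np.nan, 1, "x"], [np.nan, 2, "x"], [np.nan, 1, "x"]]
df = pd.DataFrame(rows, columns=["a", "b", "c"]).set_index(["a", "b"])

cleaned = drop_nan_index_via_reset(df)
```

If you also want rows with NaN in data columns removed, a plain `dropna()` as in Option 1 does that.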
If the MultiIndex is unique, you can use...
Option 2
df.index.dropna, then df.reindex
df = df.reindex(df.index.dropna())
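To make the uniqueness condition concrete, here is a minimal sketch on a smaller frame whose MultiIndex contains NaN but no repeated entries. The frame `unique_df` is made up for illustration, and the fallback branch simply reuses Option 1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN appears in the index, but no index entry repeats.
unique_df = pd.DataFrame(
    [[1, 1, "a"], [1, np.nan, "x"], [2, 1, "d"], [np.nan, 2, "x"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

if unique_df.index.is_unique:
    # Option 2: reindex onto the index with NaN-containing entries removed.
    result = unique_df.reindex(unique_df.index.dropna())
else:
    # Fall back to Option 1 when the index has duplicates.
    names = list(unique_df.index.names)
    result = unique_df.reset_index().dropna().set_index(names)
```

On the DataFrame from the question the `else` branch would run, since that index is not unique.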