python - Removing rows with NaN in MultiIndex with duplicates

September 15, 2013

Updated: the DataFrame below now reproduces the exact issue.

I have an issue with NaN appearing in my indexes, leading to non-unique rows (since NaN != NaN). I need to drop all rows where NaN occurs in the index. My previous question had an example DataFrame with only a single NaN row, so the original solution did not resolve my issue, because it did not meet this poorly advertised requirement:

(Note that in the actual data I have thousands of such rows, including duplicate rows, since NaN != NaN makes them permissible on the index.)

(from original post)

The issue

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[1,1,"a"],[1,2,"b"],[1,3,"c"],[1,np.nan,"x"],[1,np.nan,"x"],[1,np.nan,"x"],[2,1,"d"],[2,2,"e"],[np.nan,1,"x"],[np.nan,2,"x"],[np.nan,1,"x"]], columns=["a","b","c"]).set_index(["a","b"])
>>> df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
    NaN  x
    NaN  x
    NaN  x
2.0 1.0  d
    2.0  e
NaN 1.0  x
    2.0  x
    1.0  x

Note the duplicate index entries: (1.0, NaN) and (NaN, 1.0).

Failed solutions:

I've tried something simple like:

>>> df = df[pd.notnull(df.index)]

But this fails because notnull is not implemented for MultiIndex.

Also, one of the answers suggested:

>>>df = df.reindex(df.index.dropna())

However, this failed with the error:

Exception: cannot handle a non-unique multi-index!

Desired output:

>>> df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
2.0 1.0  d
    2.0  e

(All rows with NaN in the index are dropped, which also eliminates the non-unique rows.)

Option 1
Use reset_index, then dropna, and then set_index once more.

c = df.index.names
df = df.reset_index().dropna().set_index(c)
df

         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
2.0 1.0  d
    2.0  e
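One thing to watch with Option 1: a plain dropna() also removes rows that have NaN in the data columns, not only in the index. A minimal sketch of a variant that restricts the check to the former index columns, assuming the example frame from the question (the names index_cols and cleaned are only for illustration):

```python
import numpy as np
import pandas as pd

# Example frame from the question, indexed by "a" and "b".
df = pd.DataFrame(
    [[1, 1, "a"], [1, 2, "b"], [1, 3, "c"],
     [1, np.nan, "x"], [1, np.nan, "x"], [1, np.nan, "x"],
     [2, 1, "d"], [2, 2, "e"],
     [np.nan, 1, "x"], [np.nan, 2, "x"], [np.nan, 1, "x"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

# Remember the index level names so the index can be rebuilt afterwards.
index_cols = list(df.index.names)

# Move the index into columns, drop rows with NaN in those columns only,
# then restore the MultiIndex.
cleaned = (
    df.reset_index()
      .dropna(subset=index_cols)
      .set_index(index_cols)
)
print(cleaned)
```

On this example the result is the same as with a plain dropna(), because column c has no missing values. The subset argument only makes a difference when the data columns can contain NaN too.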

If the MultiIndex is unique, you can use...
Option 2
Use df.index.dropna and then df.reindex.

df = df.reindex(df.index.dropna())
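Since Option 2 needs a unique index, another possibility for the non-unique case is to build a boolean mask from the index levels and filter with it. This is only a sketch, not part of the original answers; it assumes a pandas version that provides MultiIndex.to_frame, and the names mask and result are illustrative:

```python
import numpy as np
import pandas as pd

# Example frame from the question, indexed by "a" and "b".
df = pd.DataFrame(
    [[1, 1, "a"], [1, 2, "b"], [1, 3, "c"],
     [1, np.nan, "x"], [1, np.nan, "x"], [1, np.nan, "x"],
     [2, 1, "d"], [2, 2, "e"],
     [np.nan, 1, "x"], [np.nan, 2, "x"], [np.nan, 1, "x"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

# Turn the index levels into ordinary columns, then mark the rows
# where every level is non-missing.
mask = df.index.to_frame(index=False).notna().all(axis=1).to_numpy()

# Boolean filtering works positionally, so duplicate index entries
# are not a problem here.
result = df[mask]
print(result)
```

This leaves the data columns untouched and does not rebuild the index, so it should behave the same whether or not the index has duplicates.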
