Python - pandas: changing a DataFrame while iterating over it
I am a beginner with pandas. My use case is this: I have 2 dataframes, one containing the actual data (say df1):
          teamid  yearid   w    1b par    2b par    3b par    hr par    bb par
    1366     laa    1961  70  0.147748  0.035708  0.003604  0.030958  0.111548
    1367     kc1    1961  61  0.164751  0.035982  0.007829  0.014993  0.096618
    1377     nya    1962  96  0.167148  0.038536  0.004656  0.031952  0.093770
    1379     laa    1962  86  0.159482  0.038027  0.005737  0.022455  0.098672
    1381     cha    1962  85  0.165797  0.040756  0.009129  0.014998  0.101076
I need to mean-center the data per year. To achieve this, I have created a separate frame holding the per-year means (say df2), using the commands below:
    df2 = df1.groupby('yearid').mean(numeric_only=True)  # numeric_only skips the string 'teamid' column
    df2 = df2.reset_index()  # not mandatory in this case!
    df2.head()

       yearid          w    1b par    2b par    3b par    hr par    bb par
    0    1961  65.500000  0.156249  0.035845  0.005717  0.022975  0.104083
    1    1962  78.454545  0.165632  0.035853  0.006777  0.023811  0.088590
    2    1963  78.142857  0.162467  0.034020  0.006896  0.021254  0.080336
    3    1964  81.727273  0.167251  0.036336  0.006748  0.021548  0.079152
    4    1965  82.000000  0.160042  0.035539  0.006534  0.022693  0.085745
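A minimal sketch of the per-year mean step, run on just the five df1 rows shown above (so the 1962 means differ from the full-data output). It assumes a recent pandas, where numeric_only=True is needed to skip the string teamid column, and it calls reset_index() on df2 itself so that yearid becomes an ordinary column.

```python
import pandas as pd

# The five sample rows of df1 from the question (years 1961 and 1962 only).
df1 = pd.DataFrame(
    {
        "teamid": ["laa", "kc1", "nya", "laa", "cha"],
        "yearid": [1961, 1961, 1962, 1962, 1962],
        "w": [70, 61, 96, 86, 85],
        "1b par": [0.147748, 0.164751, 0.167148, 0.159482, 0.165797],
        "2b par": [0.035708, 0.035982, 0.038536, 0.038027, 0.040756],
        "3b par": [0.003604, 0.007829, 0.004656, 0.005737, 0.009129],
        "hr par": [0.030958, 0.014993, 0.031952, 0.022455, 0.014998],
        "bb par": [0.111548, 0.096618, 0.093770, 0.098672, 0.101076],
    },
    index=[1366, 1367, 1377, 1379, 1381],
)

# Mean of every numeric column per year; numeric_only=True skips 'teamid'.
df2 = df1.groupby("yearid").mean(numeric_only=True)
# Turn the 'yearid' index back into an ordinary column (note: df2, not df1).
df2 = df2.reset_index()
print(df2)
```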
Now, to mean-center df1, I am running the loop below:
    for i, row in df1.iterrows():
        year = df2[df2['yearid'] == row[1]]
        row = row - year
        print(row)
    df1.head()
Interestingly, print(row) prints the updated column values, but at the end df1.head() prints the original dataframe as it was. This makes sense, because when I change the "row" variable I am changing a snapshot/copy of the row, not the actual dataframe's content.
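A minimal sketch of both points, assuming a shortened version of the sample data: rebinding the loop variable from iterrows() leaves df1 untouched, while assigning through df1.at[label, column] writes into the frame itself. The names means and par_cols are illustrative, not from the question.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "yearid": [1961, 1961, 1962, 1962, 1962],
        "1b par": [0.147748, 0.164751, 0.167148, 0.159482, 0.165797],
        "2b par": [0.035708, 0.035982, 0.038536, 0.038027, 0.040756],
    },
    index=[1366, 1367, 1377, 1379, 1381],
)
par_cols = ["1b par", "2b par"]  # hypothetical subset, shortened for the example

# Rebinding 'row' only replaces a temporary Series; df1 never sees the change.
for i, row in df1.iterrows():
    row = row - 1.0
print(df1.loc[1366, "1b par"])  # prints 0.147748 (unchanged)

# Per-year means, indexed by yearid, computed once before modifying df1.
means = df1.groupby("yearid")[par_cols].mean()

# Writing through df1.at[label, column] does update the DataFrame in place.
for i, row in df1.iterrows():
    year = int(row["yearid"])  # iterrows upcasts this mixed int/float row to float
    for col in par_cols:
        df1.at[i, col] = row[col] - means.at[year, col]

print(df1)
```

Looping with .at works but is slow on large frames; it is shown here only to answer "how do I write back".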
Expected output: the per-year mean of columns 1b par, 2b par, ..., bb par should equal 0.
Two questions:
1. How do I update the dataframe (df1 in the above case) as well?
2. Is there a way to subtract only a subset of columns, not all of them? The current code also subtracts yearid, whereas we only want to center the columns from 1b par to bb par.
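For the second question, one option (a sketch, not the only way) is to look up each row's yearly means with .loc and subtract them from a list of selected columns only, so teamid, yearid and w are left alone. par_cols and row_means are illustrative names; the data are the five sample rows.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "teamid": ["laa", "kc1", "nya", "laa", "cha"],
        "yearid": [1961, 1961, 1962, 1962, 1962],
        "w": [70, 61, 96, 86, 85],
        "1b par": [0.147748, 0.164751, 0.167148, 0.159482, 0.165797],
        "2b par": [0.035708, 0.035982, 0.038536, 0.038027, 0.040756],
        "3b par": [0.003604, 0.007829, 0.004656, 0.005737, 0.009129],
        "hr par": [0.030958, 0.014993, 0.031952, 0.022455, 0.014998],
        "bb par": [0.111548, 0.096618, 0.093770, 0.098672, 0.101076],
    },
    index=[1366, 1367, 1377, 1379, 1381],
)

# Only these columns are centered; teamid, yearid and w stay as they are.
par_cols = ["1b par", "2b par", "3b par", "hr par", "bb par"]

# One row of means per year, restricted to the selected columns.
means = df1.groupby("yearid")[par_cols].mean()

# Repeat each year's means once per df1 row (same order as df1), as a plain array.
row_means = means.loc[df1["yearid"]].to_numpy()

# Subtract on the selected block only and write it back into df1.
df1[par_cols] = df1[par_cols] - row_means
print(df1)
```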
Thanks!
PS: I modified the loop as follows and am getting the expected results:
    for i, row in df1.iterrows():
        year = df2[df2['yearid'] == row[1]]
        row = row - year
        # DataFrame.set_value was deprecated in pandas 0.21 and removed in 1.0;
        # on newer versions write df1.at[i, '1b par'] = ... instead.
        df1.set_value(i, '1b par', row['1b par'])
        df1.set_value(i, '2b par', row['2b par'])
        df1.set_value(i, '3b par', row['3b par'])
        df1.set_value(i, 'hr par', row['hr par'])
        df1.set_value(i, 'bb par', row['bb par'])
    df1.head()

          teamid  yearid   w     1b par     2b par     3b par     hr par     bb par
    1366     laa    1961  70  -0.164751  -0.000137  -0.002113   0.007983   0.007465
    1367     kc1    1961  61  -0.147748   0.000137   0.002113  -0.007983  -0.007465
    1377     nya    1962  96  -0.164116   0.002683  -0.002121   0.008141   0.005180
Is there a better way of achieving the same result? I believe this is not the most elegant way of getting it done!
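A vectorized alternative that avoids both iterrows() and set_value: groupby(...).transform("mean") returns each year's mean aligned to every row of df1 (same index), so the subtraction lines up row by row. This is a sketch on the five sample rows; par_cols and year_means are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "teamid": ["laa", "kc1", "nya", "laa", "cha"],
        "yearid": [1961, 1961, 1962, 1962, 1962],
        "w": [70, 61, 96, 86, 85],
        "1b par": [0.147748, 0.164751, 0.167148, 0.159482, 0.165797],
        "2b par": [0.035708, 0.035982, 0.038536, 0.038027, 0.040756],
        "3b par": [0.003604, 0.007829, 0.004656, 0.005737, 0.009129],
        "hr par": [0.030958, 0.014993, 0.031952, 0.022455, 0.014998],
        "bb par": [0.111548, 0.096618, 0.093770, 0.098672, 0.101076],
    },
    index=[1366, 1367, 1377, 1379, 1381],
)

par_cols = ["1b par", "2b par", "3b par", "hr par", "bb par"]

# transform("mean") broadcasts each year's mean back onto that year's rows,
# keeping df1's original index (1366, 1367, ...), so no merge is needed.
year_means = df1.groupby("yearid")[par_cols].transform("mean")

# Center only the selected columns; everything else is untouched.
df1[par_cols] = df1[par_cols] - year_means
print(df1)
```

Because transform keeps the original index, df1's row labels survive, which the merge-based approach does not preserve.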
A different approach:
    import pandas as pd

    msuf = '_mean'
    dfm = pd.merge(df1, df2, on="yearid", suffixes=('', msuf))  # df2's clashing columns get '_mean'
    for column in ["1b par", "2b par", "3b par", "hr par", "bb par"]:
        dfm[column] = dfm[column] - dfm[column + msuf]
        dfm = dfm.drop(column + msuf, axis=1)
First merge the 2 dataframes on yearid, then do the subtractions column-wise, and finally drop the mean columns.
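A self-contained run of the merge approach on the five sample rows, assuming a recent pandas (hence numeric_only=True when building df2). Two side effects worth knowing: pd.merge on a column returns a fresh 0..n-1 index, so df1's original row labels (1366, ...) are not kept, and the w_mean column coming from df2 stays in dfm because the loop only drops the par means.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "teamid": ["laa", "kc1", "nya", "laa", "cha"],
        "yearid": [1961, 1961, 1962, 1962, 1962],
        "w": [70, 61, 96, 86, 85],
        "1b par": [0.147748, 0.164751, 0.167148, 0.159482, 0.165797],
        "2b par": [0.035708, 0.035982, 0.038536, 0.038027, 0.040756],
        "3b par": [0.003604, 0.007829, 0.004656, 0.005737, 0.009129],
        "hr par": [0.030958, 0.014993, 0.031952, 0.022455, 0.014998],
        "bb par": [0.111548, 0.096618, 0.093770, 0.098672, 0.101076],
    },
    index=[1366, 1367, 1377, 1379, 1381],
)
par_cols = ["1b par", "2b par", "3b par", "hr par", "bb par"]

# Per-year means with 'yearid' as an ordinary column, like the question's df2.
df2 = df1.groupby("yearid").mean(numeric_only=True).reset_index()

msuf = "_mean"
# Attach each row's yearly means; clashing names from df2 get the '_mean' suffix.
dfm = pd.merge(df1, df2, on="yearid", suffixes=("", msuf))
for column in par_cols:
    dfm[column] = dfm[column] - dfm[column + msuf]  # center the column
    dfm = dfm.drop(column + msuf, axis=1)  # discard the helper mean column

print(dfm)
```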