python - pandas: change a DataFrame while iterating over it


I am a beginner with pandas. My use case: I have two DataFrames, one containing the actual data (say df1):

         teamid  yearid   w    1b par    2b par    3b par    hr par    bb par
    1366    laa    1961  70  0.147748  0.035708  0.003604  0.030958  0.111548
    1367    kc1    1961  61  0.164751  0.035982  0.007829  0.014993  0.096618
    1377    nya    1962  96  0.167148  0.038536  0.004656  0.031952  0.093770
    1379    laa    1962  86  0.159482  0.038027  0.005737  0.022455  0.098672
    1381    cha    1962  85  0.165797  0.040756  0.009129  0.014998  0.101076

I need to mean-center the data per year. To achieve this, I created a separate frame that holds the per-year means (say df2), using the commands below:

    # numeric_only=True skips the string column teamid when averaging
    df2 = df1.groupby('yearid').mean(numeric_only=True)
    df2 = df2.reset_index()  # not mandatory in this case!
    df2.head()

       yearid          w    1b par    2b par    3b par    hr par    bb par
    0    1961  65.500000  0.156249  0.035845  0.005717  0.022975  0.104083
    1    1962  78.454545  0.165632  0.035853  0.006777  0.023811  0.088590
    2    1963  78.142857  0.162467  0.034020  0.006896  0.021254  0.080336
    3    1964  81.727273  0.167251  0.036336  0.006748  0.021548  0.079152
    4    1965  82.000000  0.160042  0.035539  0.006534  0.022693  0.085745

Now, to mean-center df1, I am running the loop below:

    for i, row in df1.iterrows():
        year = df2[df2['yearid']==row[1]]
        row = row-year
        print(row)
    df1.head()

Interestingly, print(row) prints the updated column values, but at the end df1.head() prints the original DataFrame as it is. This makes sense: when I change the "row" variable, I am changing a snapshot/copy of the row and not the actual DataFrame's content.
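The snapshot behaviour described above can be reproduced on a tiny frame. This is only a minimal sketch: the frame `demo` and its values are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only.
demo = pd.DataFrame({"yearid": [1961, 1962], "w": [70.0, 96.0]})

for i, row in demo.iterrows():
    # `row` is a new Series built for each row; rebinding the name
    # to another object does not touch `demo`.
    row = row - 1

unchanged = demo["w"].tolist()  # still [70.0, 96.0]

for i, row in demo.iterrows():
    # Writing through the frame's own indexer does change the frame.
    demo.loc[i, "w"] = row["w"] - 1

changed = demo["w"].tolist()  # now [69.0, 95.0]
```

The first loop leaves `demo` as it was, while the second one updates it, because only the second assigns into the DataFrame itself.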

Expected output: the per-year mean of the columns 1b par, 2b par....bb par should equal 0.
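One way to check that condition is a small helper like the one below. It is a sketch under assumptions: `is_mean_centered` and its `tol` parameter are hypothetical names introduced here, and the two sample frames reuse the 1961 values of 2b par shown in this post.

```python
import numpy as np
import pandas as pd

def is_mean_centered(df, group_col, cols, tol=1e-9):
    """Return True if every per-group mean of `cols` is within `tol` of 0."""
    group_means = df.groupby(group_col)[cols].mean()
    return bool(np.allclose(group_means.to_numpy(), 0.0, atol=tol))

# Raw 1961 values of "2b par" from df1: not centered yet.
raw = pd.DataFrame({"yearid": [1961, 1961], "2b par": [0.035708, 0.035982]})

# The same two rows after subtracting the 1961 mean.
centered = pd.DataFrame({"yearid": [1961, 1961], "2b par": [-0.000137, 0.000137]})

raw_ok = is_mean_centered(raw, "yearid", ["2b par"])
centered_ok = is_mean_centered(centered, "yearid", ["2b par"])
```

A tolerance is used rather than an exact comparison with 0, since floating-point subtraction rarely leaves an exact zero.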

Two questions:

> How do I update the DataFrame (df1 in the above case) as well?
> Is there a way to subtract only a subset of the columns and not all of them? The current code also subtracts yearid, while we only want to center the (1b par:bb par) columns.

Thanks!


PS: I modified the loop and am now getting the expected results:

    for i, row in df1.iterrows():
        # per-year means for this row's year, as a Series
        year = df2[df2['yearid'] == row['yearid']].iloc[0]
        # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
        df1.at[i, '1b par'] = row['1b par'] - year['1b par']
        df1.at[i, '2b par'] = row['2b par'] - year['2b par']
        df1.at[i, '3b par'] = row['3b par'] - year['3b par']
        df1.at[i, 'hr par'] = row['hr par'] - year['hr par']
        df1.at[i, 'bb par'] = row['bb par'] - year['bb par']
    df1.head()

         teamid  yearid   w    1b par    2b par    3b par    hr par    bb par
    1366    laa    1961  70 -0.164751 -0.000137 -0.002113  0.007983  0.007465
    1367    kc1    1961  61 -0.147748  0.000137  0.002113 -0.007983 -0.007465
    1377    nya    1962  96 -0.164116  0.002683 -0.002121  0.008141  0.005180

Is there a better way of achieving the same result? I believe this is not a beautiful way of getting it done!

A different approach:

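    import pandas as pd

    msuf = '_mean'
    # attach each row's per-year means as extra columns with a '_mean' suffix
    dfm = pd.merge(df1, df2, on="yearid", suffixes=('', msuf))
    for column in ["1b par", "2b par", "3b par", "hr par", "bb par"]:
        # subtract the year mean, then drop the helper column
        dfm[column] = dfm[column] - dfm[column + msuf]
        dfm = dfm.drop(column + msuf, axis=1)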

This first merges the two DataFrames on yearid, then does the subtractions column-wise and drops the mean columns.
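For comparison, the same centering can probably also be written without a merge or an explicit loop, using `groupby(...).transform('mean')`, which returns the group means aligned row by row with the original frame. This is a swapped-in alternative, not the approach from the answer above. The sample frame below reuses the df1 rows shown in the question, with only two of the rate columns to keep it short.

```python
import pandas as pd

# First rows of df1 from the question (only some of the columns).
df1 = pd.DataFrame(
    {
        "teamid": ["laa", "kc1", "nya", "laa", "cha"],
        "yearid": [1961, 1961, 1962, 1962, 1962],
        "w": [70, 61, 96, 86, 85],
        "1b par": [0.147748, 0.164751, 0.167148, 0.159482, 0.165797],
        "bb par": [0.111548, 0.096618, 0.093770, 0.098672, 0.101076],
    },
    index=[1366, 1367, 1377, 1379, 1381],
)

cols = ["1b par", "bb par"]  # only these columns are centered

# One row per original row, holding the mean of that row's year.
year_means = df1.groupby("yearid")[cols].transform("mean")

centered = df1.copy()
centered[cols] = df1[cols] - year_means

# Per-year means of the centered columns (zero up to rounding error).
check = centered.groupby("yearid")[cols].mean()
```

Because only `cols` is selected, teamid, yearid and w are left untouched. Note that the year means here come from the five sample rows alone, so they differ from the df2 values in the question, which were computed over all teams.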

