python - Pandas find year ago date from non-uniform list of dates -


i use more project. trying analyze 4.5 million rows of data. have read data dataframe, have organized data , have 3 columns: 1) date datetime 2) unique identifier 3) price

i need calculate year on year change in prices per item dates not uniform , not consistent per item. example:

date      item  price 12/31/15       110 12/31/15   b     120 12/31/14       100 6/24/13    b     100 

what find result is:

date      item  price  previousdate   % change 12/31/15       110   12/31/14       10% 12/31/15   b     120   6/24/13        20% 12/31/14       100 6/24/13    b     100 

edit - better example of data

 date   item    price 6/1/2016      276.3457646 6/1/2016    b   5.044165645 4/27/2016   b   4.91300186 4/27/2016     276.4329163 4/20/2016     276.9991265 4/20/2016   b   4.801263717 4/13/2016     276.1950213 4/13/2016   b   5.582923328 4/6/2016    b   5.017863509 4/6/2016      276.218649 3/30/2016   b   4.64274783 3/30/2016     276.554653 3/23/2016   b   5.576438253 3/23/2016     276.3135836 3/16/2016   b   5.394435443 3/16/2016     276.4222986 3/9/2016      276.8929462 3/9/2016    b   4.999951262 3/2/2016    b   4.731349423 3/2/2016      276.3972068 1/27/2016     276.8458971 1/27/2016   b   4.993033132 1/20/2016   b   5.250379701 1/20/2016     276.2899864 1/13/2016   b   5.146639666 1/13/2016     276.7041978 1/6/2016    b   5.328296958 1/6/2016      276.9465891 12/30/2015  b   5.312301356 12/30/2015    256.259668 12/23/2015  b   5.279105491 12/23/2015    255.8411198 12/16/2015  b   5.150798234 12/16/2015    255.8360529 12/9/2015     255.4915183 12/9/2015   b   4.722876886 12/2/2015     256.267146 12/2/2015   b   5.083626167 10/28/2015  b   4.876177757 10/28/2015    255.6464653 10/21/2015  b   4.551439655 10/21/2015    256.1735769 10/14/2015    255.9752668 10/14/2015  b   4.693967392 10/7/2015   b   4.911797443 10/7/2015     256.2556707 9/30/2015   b   4.262994526 9/30/2015     255.8068691 7/1/2015      255.7312385 4/22/2015     234.6210132 4/15/2015     235.3902076 4/15/2015   b   4.154926102 4/1/2015      234.4713827 2/25/2015     235.1391496 2/18/2015     235.1223471 

what have done (with other users) hasn't worked below. guys can provide or pointing me in right direction!

import pandas pd import datetime dt import numpy np  df = pd.read_csv('...python test file5.csv',parse_dates =['as of date'])  df = df[['item','price','as of date']]  def get_prev_year_price(x, df):     try:         return df.loc[x['prev_year_date'], 'price']         #return np.abs(df.time - x)     except exception e:         return x['price']  #function determine closest date given date , list of dates def nearest(items, pivot):     return min(items, key=lambda x: abs(x - pivot))  df['as of date'] = pd.to_datetime(df['as of date'],format='%m/%d/%y') df = df.rename(columns = {df.columns[2]:'date'})  # list of dates dtlst = [item item in df['date']]  data = [] data2 = [] item in df['item'].unique():     item_df = df[df['item'] == item] #select based on items     select_dates = item_df['date'].unique()     item_df.set_index('date', inplace=true) #set date key index      item_df = item_df.resample('d').mean().reset_index() #fill in missing date     item_df['price'] = item_df['price'].interpolate('nearest') #fill in price nearest price available     # use max(item_df['date'] item_df['date'] < item_df['date'] - pd.dateoffset(years=1, days=1))         #possible_date = item_df['date'] - pd.dateoffset(years=1)         #item_df['prev_year_date'] = max(df[df['date'] <= possible_date])      item_df['prev_year_date'] = item_df['date'] - pd.dateoffset(years=1) #calculate 1 year ago date     date_df = item_df[item_df.date.isin(select_dates)] #select dates useful data     item_df.set_index('date', inplace=true)      date_df['prev_year_price'] = date_df.apply(lambda x: get_prev_year_price(x, item_df),axis=1)     #date_df['prev_year_price'] = date_df.apply(lambda x: nearest(dtlst, x),axis=1)      date_df['change'] = date_df['price'] / date_df['prev_year_price']-1     date_df['item'] = item     data.append(date_df)     data2.append(item_df) summary = pd.concat(data).sort_values('date', ascending=false) #print (summary)  #saving output of csv file see how data looks after being handled  filename = '...python_test_file_save4.csv' summary.to_csv(filename, index=true, encoding='utf-8') 

this situation merge_asof, merges 2 dataframes finding last row of right dataframe less key left dataframe. need add year right dataframe first, since requirement 1 year or more difference between dates.

here sample data brought in comment.

date      item  price 12/31/15       110 12/31/15   b     120 12/31/14       100 6/24/13    b     100 12/31/15   c     100 1/31/15    c      80 11/14/14   c     130 11/19/13   c     110 11/14/13   c     200 

the dates need sorted merge_asof work. merge_asof drops joining column need put copy of in our right dataframe.

setup dataframes

df = df.sort_values('date') df_copy = df.copy() df_copy['previousdate'] = df_copy['date'] df_copy['date'] += pd.dateoffset(years=1) 

use merge_asof

df_final = pd.merge_asof(df, df_copy,                          on='date',                          by='item',                          suffixes=['current', 'previous']) df_final['% change'] = (df_final['pricecurrent'] - df_final['priceprevious']) / df_final['priceprevious'] df_final          date item  pricecurrent  priceprevious previousdate  % change 0 2013-06-24    b           100            nan          nat       nan 1 2013-11-14    c           200            nan          nat       nan 2 2013-11-19    c           110            nan          nat       nan 3 2014-11-14    c           130          200.0   2013-11-14 -0.350000 4 2014-12-31              100            nan          nat       nan 5 2015-01-31    c            80          110.0   2013-11-19 -0.272727 6 2015-12-31              110          100.0   2014-12-31  0.100000 7 2015-12-31    b           120          100.0   2013-06-24  0.200000 8 2015-12-31    c           100          130.0   2014-11-14 -0.230769 

Comments

Popular posts from this blog

neo4j - finding mutual friends in a cypher statement starting with three or more persons -

php - How to remove letter in front of the word laravel -

minify - Minimizing css files -