python - Pandas find year ago date from non-uniform list of dates -
i use more project. trying analyze 4.5 million rows of data. have read data dataframe, have organized data , have 3 columns: 1) date datetime 2) unique identifier 3) price
i need calculate year on year change in prices per item dates not uniform , not consistent per item. example:
date item price 12/31/15 110 12/31/15 b 120 12/31/14 100 6/24/13 b 100 what find result is:
date item price previousdate % change 12/31/15 110 12/31/14 10% 12/31/15 b 120 6/24/13 20% 12/31/14 100 6/24/13 b 100 edit - better example of data
date item price 6/1/2016 276.3457646 6/1/2016 b 5.044165645 4/27/2016 b 4.91300186 4/27/2016 276.4329163 4/20/2016 276.9991265 4/20/2016 b 4.801263717 4/13/2016 276.1950213 4/13/2016 b 5.582923328 4/6/2016 b 5.017863509 4/6/2016 276.218649 3/30/2016 b 4.64274783 3/30/2016 276.554653 3/23/2016 b 5.576438253 3/23/2016 276.3135836 3/16/2016 b 5.394435443 3/16/2016 276.4222986 3/9/2016 276.8929462 3/9/2016 b 4.999951262 3/2/2016 b 4.731349423 3/2/2016 276.3972068 1/27/2016 276.8458971 1/27/2016 b 4.993033132 1/20/2016 b 5.250379701 1/20/2016 276.2899864 1/13/2016 b 5.146639666 1/13/2016 276.7041978 1/6/2016 b 5.328296958 1/6/2016 276.9465891 12/30/2015 b 5.312301356 12/30/2015 256.259668 12/23/2015 b 5.279105491 12/23/2015 255.8411198 12/16/2015 b 5.150798234 12/16/2015 255.8360529 12/9/2015 255.4915183 12/9/2015 b 4.722876886 12/2/2015 256.267146 12/2/2015 b 5.083626167 10/28/2015 b 4.876177757 10/28/2015 255.6464653 10/21/2015 b 4.551439655 10/21/2015 256.1735769 10/14/2015 255.9752668 10/14/2015 b 4.693967392 10/7/2015 b 4.911797443 10/7/2015 256.2556707 9/30/2015 b 4.262994526 9/30/2015 255.8068691 7/1/2015 255.7312385 4/22/2015 234.6210132 4/15/2015 235.3902076 4/15/2015 b 4.154926102 4/1/2015 234.4713827 2/25/2015 235.1391496 2/18/2015 235.1223471 what have done (with other users) hasn't worked below. guys can provide or pointing me in right direction!
import pandas pd import datetime dt import numpy np df = pd.read_csv('...python test file5.csv',parse_dates =['as of date']) df = df[['item','price','as of date']] def get_prev_year_price(x, df): try: return df.loc[x['prev_year_date'], 'price'] #return np.abs(df.time - x) except exception e: return x['price'] #function determine closest date given date , list of dates def nearest(items, pivot): return min(items, key=lambda x: abs(x - pivot)) df['as of date'] = pd.to_datetime(df['as of date'],format='%m/%d/%y') df = df.rename(columns = {df.columns[2]:'date'}) # list of dates dtlst = [item item in df['date']] data = [] data2 = [] item in df['item'].unique(): item_df = df[df['item'] == item] #select based on items select_dates = item_df['date'].unique() item_df.set_index('date', inplace=true) #set date key index item_df = item_df.resample('d').mean().reset_index() #fill in missing date item_df['price'] = item_df['price'].interpolate('nearest') #fill in price nearest price available # use max(item_df['date'] item_df['date'] < item_df['date'] - pd.dateoffset(years=1, days=1)) #possible_date = item_df['date'] - pd.dateoffset(years=1) #item_df['prev_year_date'] = max(df[df['date'] <= possible_date]) item_df['prev_year_date'] = item_df['date'] - pd.dateoffset(years=1) #calculate 1 year ago date date_df = item_df[item_df.date.isin(select_dates)] #select dates useful data item_df.set_index('date', inplace=true) date_df['prev_year_price'] = date_df.apply(lambda x: get_prev_year_price(x, item_df),axis=1) #date_df['prev_year_price'] = date_df.apply(lambda x: nearest(dtlst, x),axis=1) date_df['change'] = date_df['price'] / date_df['prev_year_price']-1 date_df['item'] = item data.append(date_df) data2.append(item_df) summary = pd.concat(data).sort_values('date', ascending=false) #print (summary) #saving output of csv file see how data looks after being handled filename = '...python_test_file_save4.csv' summary.to_csv(filename, index=true, encoding='utf-8')
this situation merge_asof, merges 2 dataframes finding last row of right dataframe less key left dataframe. need add year right dataframe first, since requirement 1 year or more difference between dates.
here sample data brought in comment.
date item price 12/31/15 110 12/31/15 b 120 12/31/14 100 6/24/13 b 100 12/31/15 c 100 1/31/15 c 80 11/14/14 c 130 11/19/13 c 110 11/14/13 c 200 the dates need sorted merge_asof work. merge_asof drops joining column need put copy of in our right dataframe.
setup dataframes
df = df.sort_values('date') df_copy = df.copy() df_copy['previousdate'] = df_copy['date'] df_copy['date'] += pd.dateoffset(years=1) use merge_asof
df_final = pd.merge_asof(df, df_copy, on='date', by='item', suffixes=['current', 'previous']) df_final['% change'] = (df_final['pricecurrent'] - df_final['priceprevious']) / df_final['priceprevious'] df_final date item pricecurrent priceprevious previousdate % change 0 2013-06-24 b 100 nan nat nan 1 2013-11-14 c 200 nan nat nan 2 2013-11-19 c 110 nan nat nan 3 2014-11-14 c 130 200.0 2013-11-14 -0.350000 4 2014-12-31 100 nan nat nan 5 2015-01-31 c 80 110.0 2013-11-19 -0.272727 6 2015-12-31 110 100.0 2014-12-31 0.100000 7 2015-12-31 b 120 100.0 2013-06-24 0.200000 8 2015-12-31 c 100 130.0 2014-11-14 -0.230769
Comments
Post a Comment