python - How to perform linear correlation on a data set and return the column name which has the most correlation? -

June 15, 2010

I am working on a data set that contains the closing prices of stocks.

    'goog' : [
        742.66, 738.40, 738.22, 741.16,
        739.98, 747.28, 746.22, 741.80,
        745.33, 741.29, 742.83, 750.50
    ],
    'fb' : [
        108.40, 107.92, 109.64, 112.22,
        109.57, 113.82, 114.03, 112.24,
        114.68, 112.92, 113.28, 115.40
    ],
    'msft' : [
        55.40, 54.63, 54.98, 55.88,
        54.12, 59.16, 58.14, 55.97,
        61.20, 57.14, 56.62, 59.25
    ],
    'aapl' : [
        106.00, 104.66, 104.87, 105.69,
        104.22, 110.16, 109.84, 108.86,
        110.14, 107.66, 108.08, 109.90
    ]

These are the closing prices over the last 12 days. I need to determine which pair of stocks from the given companies had the most highly correlated percentage change in daily closing prices, and return them as an array.

    import pandas as pd
    import numpy as np

    class StockPrices:
        # param prices: dict of string to list. A dictionary containing the tickers of the stocks, and each ticker's daily prices.
        # returns: list of strings. A list containing the tickers of the two most correlated stocks.
        @staticmethod
        def most_corr(prices):
            return

    # For example, with the parameters below the function should return ['fb', 'msft'].
    prices = {
        'goog' : [
            742.66, 738.40, 738.22, 741.16,
            739.98, 747.28, 746.22, 741.80,
            745.33, 741.29, 742.83, 750.50
        ],
        'fb' : [
            108.40, 107.92, 109.64, 112.22,
            109.57, 113.82, 114.03, 112.24,
            114.68, 112.92, 113.28, 115.40
        ],
        'msft' : [
            55.40, 54.63, 54.98, 55.88,
            54.12, 59.16, 58.14, 55.97,
            61.20, 57.14, 56.62, 59.25
        ],
        'aapl' : [
            106.00, 104.66, 104.87, 105.69,
            104.22, 110.16, 109.84, 108.86,
            110.14, 107.66, 108.08, 109.90
        ]
    }

    print(StockPrices.most_corr(prices))

I have gone through NumPy's correlation function, but how can I use it to determine which two of these vectors have the maximum correlation?

You can use the pandas corr function after converting the dictionary to a DataFrame. The function returns the correlation matrix of the numeric columns in the DataFrame.

    import pandas as pd

    prices = {
        'goog' : [
            742.66, 738.40, 738.22, 741.16,
            739.98, 747.28, 746.22, 741.80,
            745.33, 741.29, 742.83, 750.50
        ],
        'fb' : [
            108.40, 107.92, 109.64, 112.22,
            109.57, 113.82, 114.03, 112.24,
            114.68, 112.92, 113.28, 115.40
        ],
        'msft' : [
            55.40, 54.63, 54.98, 55.88,
            54.12, 59.16, 58.14, 55.97,
            61.20, 57.14, 56.62, 59.25
        ],
        'aapl' : [
            106.00, 104.66, 104.87, 105.69,
            104.22, 110.16, 109.84, 108.86,
            110.14, 107.66, 108.08, 109.90
        ]
    }

    # Build a DataFrame with one column per ticker, then correlate the columns.
    df = pd.DataFrame.from_dict(prices)
    print(df.corr())

Out:

              aapl        fb      goog      msft
    aapl  1.000000  0.886750  0.853015  0.894846
    fb    0.886750  1.000000  0.799421  0.858784
    goog  0.853015  0.799421  1.000000  0.820544
    msft  0.894846  0.858784  0.820544  1.000000
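Note that the matrix above is computed on the raw prices, where the largest off-diagonal value is 0.894846 (aapl and msft), while the question asks about the percentage change of the daily closing prices and expects ['fb', 'msft']. A minimal sketch of one way to get from the matrix to a pair of tickers is shown below. It assumes that `pct_change()` is the intended definition of "percentage change", and it picks the pair with the largest correlation among all distinct column pairs. The function name `most_corr` comes from the question; everything else is my own illustration and not part of the original answer.

```python
from itertools import combinations

import pandas as pd


def most_corr(prices):
    """Return the two tickers whose daily percentage changes correlate most."""
    df = pd.DataFrame(prices)
    # Day-over-day percentage change; the first row is NaN and is ignored by corr().
    corr = df.pct_change().corr()
    # Compare every distinct pair of columns and keep the one with the largest value.
    best = max(combinations(corr.columns, 2), key=lambda pair: corr.loc[pair[0], pair[1]])
    return list(best)


prices = {
    'goog': [742.66, 738.40, 738.22, 741.16, 739.98, 747.28,
             746.22, 741.80, 745.33, 741.29, 742.83, 750.50],
    'fb': [108.40, 107.92, 109.64, 112.22, 109.57, 113.82,
           114.03, 112.24, 114.68, 112.92, 113.28, 115.40],
    'msft': [55.40, 54.63, 54.98, 55.88, 54.12, 59.16,
             58.14, 55.97, 61.20, 57.14, 56.62, 59.25],
    'aapl': [106.00, 104.66, 104.87, 105.69, 104.22, 110.16,
             109.84, 108.86, 110.14, 107.66, 108.08, 109.90],
}

print(most_corr(prices))
```

The tickers come back in the order they appear in the dictionary. If you want the strongest relationship regardless of sign, compare `abs(corr.loc[...])` instead.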

The Pearson correlation is calculated by default (which is the standard choice); if you need another method, Kendall and Spearman are also available.
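As a small illustration of switching methods, the sketch below passes the `method` argument to `corr`. The variable names are mine, and I only call Spearman here; `method='kendall'` is passed the same way, but as far as I know pandas needs SciPy installed for it.

```python
import pandas as pd

prices = {
    'goog': [742.66, 738.40, 738.22, 741.16, 739.98, 747.28,
             746.22, 741.80, 745.33, 741.29, 742.83, 750.50],
    'fb': [108.40, 107.92, 109.64, 112.22, 109.57, 113.82,
           114.03, 112.24, 114.68, 112.92, 113.28, 115.40],
    'msft': [55.40, 54.63, 54.98, 55.88, 54.12, 59.16,
             58.14, 55.97, 61.20, 57.14, 56.62, 59.25],
    'aapl': [106.00, 104.66, 104.87, 105.69, 104.22, 110.16,
             109.84, 108.86, 110.14, 107.66, 108.08, 109.90],
}

df = pd.DataFrame(prices)

# Default call: Pearson (linear) correlation.
pearson = df.corr()

# Rank-based alternative: Spearman correlation.
spearman = df.corr(method='spearman')

print(spearman)
```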
