python - How to perform linear correlation on a data set and return the column names which have the most correlation?
I am working on a data set that has the closing prices of several stocks.
    'goog': [742.66, 738.40, 738.22, 741.16, 739.98, 747.28, 746.22, 741.80, 745.33, 741.29, 742.83, 750.50],
    'fb':   [108.40, 107.92, 109.64, 112.22, 109.57, 113.82, 114.03, 112.24, 114.68, 112.92, 113.28, 115.40],
    'msft': [55.40, 54.63, 54.98, 55.88, 54.12, 59.16, 58.14, 55.97, 61.20, 57.14, 56.62, 59.25],
    'aapl': [106.00, 104.66, 104.87, 105.69, 104.22, 110.16, 109.84, 108.86, 110.14, 107.66, 108.08, 109.90]
These are the closing prices for the last 12 days. I need to determine which pair of stocks from the given companies had the most highly correlated percentage change of daily closing prices, and return them as an array.
    import pandas as pd
    import numpy as np

    class StockPrices:
        # param prices: dict of string to list. A dictionary containing the tickers of the stocks, and each ticker's daily prices.
        # returns: list of strings. A list containing the tickers of the two most correlated stocks.
        @staticmethod
        def most_corr(prices):
            return

    # For example, with the parameters below the function should return ['fb', 'msft'].
    prices = {
        'goog': [742.66, 738.40, 738.22, 741.16, 739.98, 747.28, 746.22, 741.80, 745.33, 741.29, 742.83, 750.50],
        'fb':   [108.40, 107.92, 109.64, 112.22, 109.57, 113.82, 114.03, 112.24, 114.68, 112.92, 113.28, 115.40],
        'msft': [55.40, 54.63, 54.98, 55.88, 54.12, 59.16, 58.14, 55.97, 61.20, 57.14, 56.62, 59.25],
        'aapl': [106.00, 104.66, 104.87, 105.69, 104.22, 110.16, 109.84, 108.86, 110.14, 107.66, 108.08, 109.90]
    }

    print(StockPrices.most_corr(prices))
I have gone through the NumPy correlation function, but how do I use it to determine which two of these vectors have the maximum correlation?
You can use the pandas corr function after converting the dictionary to a DataFrame. This function returns the correlation matrix for the numeric columns in the DataFrame.
    import pandas as pd

    prices = {
        'goog': [742.66, 738.40, 738.22, 741.16, 739.98, 747.28, 746.22, 741.80, 745.33, 741.29, 742.83, 750.50],
        'fb':   [108.40, 107.92, 109.64, 112.22, 109.57, 113.82, 114.03, 112.24, 114.68, 112.92, 113.28, 115.40],
        'msft': [55.40, 54.63, 54.98, 55.88, 54.12, 59.16, 58.14, 55.97, 61.20, 57.14, 56.62, 59.25],
        'aapl': [106.00, 104.66, 104.87, 105.69, 104.22, 110.16, 109.84, 108.86, 110.14, 107.66, 108.08, 109.90]
    }

    # Build a DataFrame with one column per ticker, then compute pairwise correlations.
    df = pd.DataFrame.from_dict(prices)
    print(df.corr())
Out:

              aapl        fb      goog      msft
    aapl  1.000000  0.886750  0.853015  0.894846
    fb    0.886750  1.000000  0.799421  0.858784
    goog  0.853015  0.799421  1.000000  0.820544
    msft  0.894846  0.858784  0.820544  1.000000
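Note that the matrix above is computed on the raw closing prices, where the strongest off-diagonal entry is aapl/msft (0.894846), while the question asks about daily percentage changes and expects ['fb', 'msft']. Here is a minimal sketch of one way to handle that. It assumes `pct_change()` matches what the question means by percentage change; the standalone `most_corr` function and the scan over the upper triangle are my own illustration, not part of the original answer.

```python
import pandas as pd


def most_corr(prices):
    """Return the two tickers whose daily percentage changes correlate most."""
    # Day-over-day percentage change; the first row is NaN and is dropped.
    returns = pd.DataFrame(prices).pct_change().dropna()
    corr = returns.corr()  # Pearson by default

    tickers = list(corr.columns)
    best_pair, best_value = None, float("-inf")
    # Scan only the upper triangle so the diagonal (always 1.0) is skipped.
    for i in range(len(tickers)):
        for j in range(i + 1, len(tickers)):
            value = corr.iloc[i, j]
            if value > best_value:
                best_pair, best_value = [tickers[i], tickers[j]], value
    return best_pair


prices = {
    'goog': [742.66, 738.40, 738.22, 741.16, 739.98, 747.28, 746.22, 741.80, 745.33, 741.29, 742.83, 750.50],
    'fb':   [108.40, 107.92, 109.64, 112.22, 109.57, 113.82, 114.03, 112.24, 114.68, 112.92, 113.28, 115.40],
    'msft': [55.40, 54.63, 54.98, 55.88, 54.12, 59.16, 58.14, 55.97, 61.20, 57.14, 56.62, 59.25],
    'aapl': [106.00, 104.66, 104.87, 105.69, 104.22, 110.16, 109.84, 108.86, 110.14, 107.66, 108.08, 109.90],
}

print(most_corr(prices))
```

"Most correlated" here means the largest positive coefficient. If a strongly negative correlation should count as well, compare `abs(value)` instead.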
The Pearson correlation is calculated by default (which is the standard); if you need another method, Kendall and Spearman are also available.
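To illustrate switching methods, here is a small sketch. `corr_by_method` is a hypothetical helper name of mine, and it works on percentage changes as the question asks; `method` is the actual keyword argument of `DataFrame.corr`.

```python
import pandas as pd


def corr_by_method(prices, method="pearson"):
    """Correlation matrix of daily percentage changes for the given method."""
    # Convert prices to day-over-day percentage changes, dropping the first NaN row.
    returns = pd.DataFrame(prices).pct_change().dropna()
    # method may be "pearson" (the default), "spearman" or "kendall".
    return returns.corr(method=method)


# Hypothetical toy data: y's changes rise with x's, but not linearly.
toy = {
    "x": [100, 110, 132, 171.6, 240.24],
    "y": [100, 101, 105.04, 114.4936, 132.812576],
}

print(corr_by_method(toy))                     # linear (Pearson) correlation
print(corr_by_method(toy, method="spearman"))  # rank-based correlation
```

Spearman works on ranks, so it rewards any monotonic relationship, whereas Pearson only measures how linear the relationship is. Passing `method="kendall"` works the same way.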