pandas - Python Matrix - Limiting Matrix to top 20
I have a matrix that counts the number of links between 2 sets of disciplines. I built it with the following code, starting from a DataFrame df:
    import pandas as pd
    from collections import Counter

    new_df = df[['grantrefnumber', 'subject']]
    a = ['psychology', 'education', 'social policy', 'sociology',
         'pol. sci. & internat. studies', 'development studies',
         'social anthropology', 'area studies', 'science , technology studies',
         'law & legal studies', 'economics', 'management & business studies',
         'human geography', 'environmental planning', 'demography', 'social work',
         'tools, technologies & methods', 'linguistics', 'history']
    final_df = new_df[new_df['subject'].isin(a)]

    # one Counter of grant reference numbers per subject
    ctrs = {location: Counter(gp.grantrefnumber)
            for location, gp in final_df.groupby('subject')}
    ctrs = list(ctrs.items())

    # overlap count for every pair of distinct subjects
    overlaps = [(loc1, loc2, sum(min(ctr1[k], ctr2[k]) for k in ctr1))
                for i, (loc1, ctr1) in enumerate(ctrs, start=1)
                for (loc2, ctr2) in ctrs[i:] if loc1 != loc2]
    # mirror the pairs so the matrix is symmetric
    overlaps += [(l2, l1, c) for l1, l2, c in overlaps]

    df2 = pd.DataFrame(overlaps, columns=['loc1', 'loc2', 'count'])
    df2 = df2.set_index(['loc1', 'loc2'])
    df2 = df2.unstack().fillna(0).astype(int)
The matrix looks like this (it is quite big, so I only took a partial picture):
I turn the matrix into a chord diagram later on in the code, but I wanted a way to filter the data (or move it to a new df) so that it shows only the top 20 highest numbers in the matrix (or whatever number I set in a variable later on) and puts 0 everywhere else.
Is there an easy way of doing this?
You can use the following approach, shown here on a small sample DataFrame:
    import pandas as pd

    df = pd.DataFrame({'b': [4, 5, 4, 5, 5, 4],
                       'c': [7, 8, 9, 4, 2, 3],
                       'd': [1, 3, 5, 7, 1, 0],
                       'e': [5, 3, 6, 9, 2, 4]})
    print (df)
       b  c  d  e
    0  4  7  1  5
    1  5  8  3  3
    2  4  9  5  6
    3  5  4  7  9
    4  5  2  1  2
    5  4  3  0  4
You can find the top unique values first, and then use DataFrame.where (or DataFrame.mask with the inverted condition) with isin as the condition:
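    import numpy as np

    # the 3 largest distinct values in the whole frame
    a = np.sort(np.unique(df.values.ravel()))[-3:]
    print (a)
    [7 8 9]

    # keep cells whose value is in a, replace everything else with 0
    df = df.where(df.isin(a), 0)
    print (df)
       b  c  d  e
    0  0  7  0  0
    1  0  8  0  0
    2  0  9  0  0
    3  0  0  7  9
    4  0  0  0  0
    5  0  0  0  0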
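Since the question asks for a cutoff that can be changed through a variable, the same idea could be wrapped in a small helper. This is only a sketch: the function name keep_top_n and its parameter n are hypothetical, not part of pandas. It keeps the n largest distinct values, so ties (and the mirrored cells of a symmetric matrix such as df2) can leave more than n non-zero cells.

```python
import numpy as np
import pandas as pd


def keep_top_n(df, n=20):
    """Keep cells holding one of the n largest distinct values; set the rest to 0.

    Hypothetical helper for illustration, not a pandas API.
    """
    # np.unique returns the distinct values sorted in ascending order
    uniq = np.unique(df.to_numpy().ravel())
    # slice from an explicit start index so that n=0 yields no values at all
    top_values = uniq[max(len(uniq) - n, 0):]
    return df.where(df.isin(top_values), 0)


# same sample data as in the answer above
sample = pd.DataFrame({'b': [4, 5, 4, 5, 5, 4],
                       'c': [7, 8, 9, 4, 2, 3],
                       'd': [1, 3, 5, 7, 1, 0],
                       'e': [5, 3, 6, 9, 2, 4]})

top3 = keep_top_n(sample, n=3)
print(top3)
```

For the matrix in the question, the call would presumably look like keep_top_n(df2, n=20), with n swapped for whatever variable holds the cutoff.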