Perform conditional filtering based on grouped column value in Pandas Python -


i want perform filtering on sales column such make-auction group there should atleast 1 sales >= 100. acura, copart has 101 sales both rows acura expected in output. bmw, sales <100 both copart , iaa, filtered out.

dataframe:

make    auction sales    acura   copart  101 acura   iaa     88   bmw     copart  50   bmw     iaa     60 buick   copart  130  buick   iaa     140 

expected output:

make    auction sales    acura   copart  101 acura   iaa         88   buick   copart  130  buick   iaa     140 

i able apply filter of >100 on entire sales column, not wanted. suggestion on how perform this? thanks!

use filtration:

df = df.groupby('make').filter(lambda x: x['sales'].ge(100).any()) print (df)     make auction  sales 0  acura  copart    101 1  acura     iaa     88 4  buick  copart    130 5  buick     iaa    140 

another solution loc , boolean indexing make values filtered isin:

print (df.loc[df['sales'] >= 100, 'make']) 0    acura 4    buick 5    buick name: make, dtype: object  print (df['make'].isin(df.loc[df['sales'] >= 100, 'make'])) 0     true 1     true 2    false 3    false 4     true 5     true name: make, dtype: bool  df = df[df['make'].isin(df.loc[df['sales'] >= 100, 'make'])] print (df)     make auction  sales 0  acura  copart    101 1  acura     iaa     88 4  buick  copart    130 5  buick     iaa    140 

second solution faster:

np.random.seed(123) n = 1000000 l = list('abcdefghijklmno') df = pd.dataframe({'make': np.random.choice(l, n),                    'sales':np.random.randint(110, size=n)}) print (df)  in [59]: %timeit df[df['make'].isin(df.loc[df['sales'] >= 100, 'make'])] 10 loops, best of 3: 55.6 ms per loop  #alexander answer in [60]: %timeit df[df['make'].isin(df[df['sales'] >= 100]['make'].unique())] 10 loops, best of 3: 65 ms per loop  in [61]: %timeit df.groupby('make').filter(lambda x: x['sales'].ge(100).any()) 1 loop, best of 3: 217 ms per loop  #pirsquared solution 1 in [62]: %timeit df[df.sales.ge(100).groupby([df.make]).transform('any')] 1 loop, best of 3: 135 ms per loop  #pirsquared solution 2 in [63]: %%timeit     ...: f, u = pd.factorize(df.make.values)     ...: w = df.sales.values >= 100     ...: df[(np.bincount(f, w) > 0)[f]]     ...:  10 loops, best of 3: 67.2 ms per loop 

Comments

Popular posts from this blog

angular - Ionic slides - dynamically add slides before and after -

Add a dynamic header in angular 2 http provider -

minify - Minimizing css files -