Perform conditional filtering based on grouped column value in Pandas Python -
i want perform filtering on sales column such make-auction group there should atleast 1 sales >= 100. acura, copart has 101 sales both rows acura expected in output. bmw, sales <100 both copart , iaa, filtered out.
dataframe:
make auction sales acura copart 101 acura iaa 88 bmw copart 50 bmw iaa 60 buick copart 130 buick iaa 140
expected output:
make auction sales acura copart 101 acura iaa 88 buick copart 130 buick iaa 140
i able apply filter of >100 on entire sales column, not wanted. suggestion on how perform this? thanks!
use filtration:
df = df.groupby('make').filter(lambda x: x['sales'].ge(100).any()) print (df) make auction sales 0 acura copart 101 1 acura iaa 88 4 buick copart 130 5 buick iaa 140
another solution loc
, boolean indexing
make
values filtered isin
:
print (df.loc[df['sales'] >= 100, 'make']) 0 acura 4 buick 5 buick name: make, dtype: object print (df['make'].isin(df.loc[df['sales'] >= 100, 'make'])) 0 true 1 true 2 false 3 false 4 true 5 true name: make, dtype: bool df = df[df['make'].isin(df.loc[df['sales'] >= 100, 'make'])] print (df) make auction sales 0 acura copart 101 1 acura iaa 88 4 buick copart 130 5 buick iaa 140
second solution faster:
np.random.seed(123) n = 1000000 l = list('abcdefghijklmno') df = pd.dataframe({'make': np.random.choice(l, n), 'sales':np.random.randint(110, size=n)}) print (df) in [59]: %timeit df[df['make'].isin(df.loc[df['sales'] >= 100, 'make'])] 10 loops, best of 3: 55.6 ms per loop #alexander answer in [60]: %timeit df[df['make'].isin(df[df['sales'] >= 100]['make'].unique())] 10 loops, best of 3: 65 ms per loop in [61]: %timeit df.groupby('make').filter(lambda x: x['sales'].ge(100).any()) 1 loop, best of 3: 217 ms per loop #pirsquared solution 1 in [62]: %timeit df[df.sales.ge(100).groupby([df.make]).transform('any')] 1 loop, best of 3: 135 ms per loop #pirsquared solution 2 in [63]: %%timeit ...: f, u = pd.factorize(df.make.values) ...: w = df.sales.values >= 100 ...: df[(np.bincount(f, w) > 0)[f]] ...: 10 loops, best of 3: 67.2 ms per loop
Comments
Post a Comment