python - How to count the number of the same instances in a column of a data frame in a rolling window -
i trying count number of same id inside of each sliding window data:
id date 2017-05-17 15:49:51 s_2 2017-05-17 15:49:52 s_5 2017-05-17 15:49:55 s_2 2017-05-17 15:49:56 s_3 2017-05-17 15:49:58 s_5 2017-05-17 15:49:59 s_5
i trying count number of same id inside rolling window of size 3 overlap each other. answer should this:
date id s_2_count s_3_count s_5_count 2017-05-17 15:49:51 s_2 2 0 1 2017-05-17 15:49:52 s_5 1 1 1 2017-05-17 15:49:55 s_2 1 1 1 2017-05-17 15:49:56 s_3 0 1 2 2017-05-17 15:49:58 s_5 nan nan nan 2017-05-17 15:49:59 s_5 nan nan nan
use str.get_dummies
, rolling
, sum
, shift
, , add_prefix
:
df.id.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count')
output:
s_2_count s_3_count s_5_count date 2017-05-17 15:49:51 2.0 0.0 1.0 2017-05-17 15:49:52 1.0 1.0 1.0 2017-05-17 15:49:55 1.0 1.0 1.0 2017-05-17 15:49:56 0.0 1.0 2.0 2017-05-17 15:49:58 nan nan nan 2017-05-17 15:49:59 nan nan nan
let's assign dataframe:
df.assign(**df.id.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count'))
or using join
df.join(df.id.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count'))
output:
id s_2_count s_3_count s_5_count date 2017-05-17 15:49:51 s_2 2.0 0.0 1.0 2017-05-17 15:49:52 s_5 1.0 1.0 1.0 2017-05-17 15:49:55 s_2 1.0 1.0 1.0 2017-05-17 15:49:56 s_3 0.0 1.0 2.0 2017-05-17 15:49:58 s_5 nan nan nan 2017-05-17 15:49:59 s_5 nan nan nan
option 2 using pd.crosstab
df.assign(**pd.crosstab(df.index,df.id).rolling(3).sum().shift(-2))
or use join
df.join(pd.crosstab(df.index,df.id).rolling(3).sum().shift(-2))
Comments
Post a Comment