python - How to count the number of the same instances in a column of a data frame in a rolling window -


i trying count number of same id inside of each sliding window data:

                           id   date             2017-05-17 15:49:51         s_2    2017-05-17 15:49:52         s_5    2017-05-17 15:49:55         s_2    2017-05-17 15:49:56         s_3    2017-05-17 15:49:58         s_5 2017-05-17 15:49:59         s_5 

i trying count number of same id inside rolling window of size 3 overlap each other. answer should this:

date                    id      s_2_count    s_3_count   s_5_count        2017-05-17 15:49:51     s_2         2            0         1  2017-05-17 15:49:52     s_5         1            1         1    2017-05-17 15:49:55     s_2         1            1         1    2017-05-17 15:49:56     s_3         0            1         2    2017-05-17 15:49:58     s_5         nan          nan       nan 2017-05-17 15:49:59     s_5         nan          nan       nan 

use str.get_dummies, rolling, sum, shift, , add_prefix:

df.id.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count') 

output:

                     s_2_count  s_3_count  s_5_count date                                                 2017-05-17 15:49:51        2.0        0.0        1.0 2017-05-17 15:49:52        1.0        1.0        1.0 2017-05-17 15:49:55        1.0        1.0        1.0 2017-05-17 15:49:56        0.0        1.0        2.0 2017-05-17 15:49:58        nan        nan        nan 2017-05-17 15:49:59        nan        nan        nan 

let's assign dataframe:

df.assign(**df.id.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count')) 

or using join

df.join(df.id.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count')) 

output:

                      id  s_2_count  s_3_count  s_5_count date                                                      2017-05-17 15:49:51  s_2        2.0        0.0        1.0 2017-05-17 15:49:52  s_5        1.0        1.0        1.0 2017-05-17 15:49:55  s_2        1.0        1.0        1.0 2017-05-17 15:49:56  s_3        0.0        1.0        2.0 2017-05-17 15:49:58  s_5        nan        nan        nan 2017-05-17 15:49:59  s_5        nan        nan        nan 

option 2 using pd.crosstab

df.assign(**pd.crosstab(df.index,df.id).rolling(3).sum().shift(-2)) 

or use join

df.join(pd.crosstab(df.index,df.id).rolling(3).sum().shift(-2)) 

Comments

Popular posts from this blog

angular - Ionic slides - dynamically add slides before and after -

minify - Minimizing css files -

Add a dynamic header in angular 2 http provider -