scala - Spark DataFrame group by with a new indicator column


I need to group by the "key" column and check whether its "type_code" values include both "pl" and "jl". If they do, I need to add an indicator column with "y"; otherwise "n".

Example:

    // input values
    val values = List(List("66", "pl"), List("67", "jl"), List("67", "pl"),
      List("67", "po"), List("68", "jl"), List("68", "po")).map(x => (x(0), x(1)))

    import spark.implicits._

    // create the DataFrame
    val cmc = values.toDF("key", "type_code")

    cmc.show(false)

    ----------------
    key | type_code
    ----------------
    66  | pl
    67  | jl
    67  | pl
    67  | po
    68  | jl
    68  | po
    ----------------

Expected output:

For each "key": if its "type_code" values include both pl and jl, the indicator is "y"; otherwise it is "n".

    ------------------------------
    key | type_code | indicator
    ------------------------------
    66  | pl        | n
    67  | jl        | y
    67  | pl        | y
    67  | po        | y
    68  | jl        | n
    68  | po        | n
    ------------------------------

For example, 67 has both pl and jl, so it gets "y"; 66 has only pl, so it gets "n"; 68 has jl but no pl, so it gets "n".
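To pin down the rule before involving Spark, here is a minimal sketch on plain Scala collections. It is only an illustration of the expected semantics, not a Spark solution: it builds the set of type_code values for each key, then tags every input row with "y" or "n", keeping the rows in input order. The names `IndicatorRule`, `hasBoth` and `tagRows` are hypothetical.

```scala
object IndicatorRule {
  // True when a key's codes include both "pl" and "jl".
  def hasBoth(codes: Set[String]): Boolean =
    codes.contains("pl") && codes.contains("jl")

  // Tags each (key, type_code) row with "y" or "n", preserving input order.
  def tagRows(rows: Seq[(String, String)]): Seq[(String, String, String)] = {
    // Distinct type_code values seen for each key.
    val codesByKey: Map[String, Set[String]] =
      rows.groupBy(_._1).map { case (key, group) => key -> group.map(_._2).toSet }
    rows.map { case (key, code) =>
      (key, code, if (hasBoth(codesByKey(key))) "y" else "n")
    }
  }

  def main(args: Array[String]): Unit = {
    val values = Seq(
      ("66", "pl"), ("67", "jl"), ("67", "pl"),
      ("67", "po"), ("68", "jl"), ("68", "po"))
    // Prints one (key, type_code, indicator) tuple per input row.
    tagRows(values).foreach(println)
  }
}
```

Because the flag is computed per key and then attached back to each row, no row is lost or reordered, which matches the expected output table.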

One option:

1) collect the type_code values for each key into a list;

2) check whether the list contains both strings;

3) flatten the list back into rows with explode:

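    import org.apache.spark.sql.functions._
    import spark.implicits._

    (cmc.groupBy("key")
      // 1) collect the type_code values for each key
      .agg(collect_list("type_code").as("type_code"))
      // 2) "y" if the list contains both "pl" and "jl", else "n"
      .withColumn("indicator",
        when(array_contains($"type_code", "pl") && array_contains($"type_code", "jl"), "y").otherwise("n"))
      // 3) explode the list back into one row per type_code
      .withColumn("type_code", explode($"type_code"))).show()

    +---+---------+---------+
    |key|type_code|indicator|
    +---+---------+---------+
    | 68|       jl|        n|
    | 68|       po|        n|
    | 67|       jl|        y|
    | 67|       pl|        y|
    | 67|       po|        y|
    | 66|       pl|        n|
    +---+---------+---------+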
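The groupBy/explode round trip above returns the keys in a different order (68, 67, 66) and would drop any column that is not part of the aggregation. A possible alternative, sketched here as an assumption rather than the answer's own method, computes the per-key set of codes with a window function, `collect_set(...).over(Window.partitionBy("key"))`, so every original row is kept as-is. It assumes Spark 2.x or later on the classpath; the object `IndicatorWindow` and the helper `addIndicator` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{array_contains, col, collect_set, when}

object IndicatorWindow {
  // Hypothetical helper: indicator = "y" when the row's key has both "pl" and "jl".
  def addIndicator(df: DataFrame): DataFrame = {
    // No ordering, so each row's frame is the whole partition for its key.
    val byKey = Window.partitionBy("key")
    df.withColumn("codes", collect_set(col("type_code")).over(byKey))
      .withColumn("indicator",
        when(array_contains(col("codes"), "pl") && array_contains(col("codes"), "jl"), "y")
          .otherwise("n"))
      .drop("codes") // the helper array column is no longer needed
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("indicator").getOrCreate()
    import spark.implicits._

    val cmc = Seq(("66", "pl"), ("67", "jl"), ("67", "pl"),
      ("67", "po"), ("68", "jl"), ("68", "po")).toDF("key", "type_code")

    addIndicator(cmc).orderBy("key", "type_code").show(false)
    spark.stop()
  }
}
```

The trade-off is a shuffle by key in both approaches; the window version simply avoids rebuilding rows with explode, so extra columns on `cmc` would pass through untouched.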
