sql - Scala Spark aggregating values from multiple columns into new column -


i working on sql migration cassandra , having issues combining multiple columns single column. example, have person can have 1 or many races. have 0 or 1 each individual race. how able combine single column set?

             sql schema +----------------------------------+ - r_white r_af_american r_asian    - -   1          1           1       - -   0          1           1       - -   1          0           1       - -   0          1           0       - +----------------------------------+                     cassandra temp desired schema +-------------------------------------------------------------------+ - r_white r_af_american r_asian          race                       - -   1          1           1        ["white","af american", "asian"]- -   0          1           1        ["af american", "asian"]        - -   1          0           1        ["white", "asian"]              - -   0          1           0        ["af american"]                 - +-------------------------------------------------------------------+ 

since reading in data csv, plan read in tables spark sql are, perform combine(into race column), delete of columns other new race column.

i have been chaining .withcolumn() other data manipulation on data doesn't seem fit when(col("r_white") === 1, somehow_add_to_set).otherwise(...) pattern. appreciated! thanks!

one way achieve create array using array function , when expression each value, , dropping nulls using udf:

import org.apache.spark.sql.functions._ import spark.implicits._  val dropnulls = udf { (a: seq[string]) => a.filter(_ != null) }  val result = df.withcolumn("race", dropnulls(array(   when($"r_white"       === 1, "r_white"),   when($"r_af_american" === 1, "r_af_american"),   when($"r_asian"       === 1, "r_asian") ))) 

you can rid of hard-coded values mapping dataframe's column names these when expressions:

val result = df.withcolumn("race", dropnulls(array(   df.columns.map(name => when(col(name) === 1, name)): _* )))  result.show(false) // +-------+-------------+-------+---------------------------------+ // |r_white|r_af_american|r_asian|race                             | // +-------+-------------+-------+---------------------------------+ // |1      |1            |1      |[r_white, r_af_american, r_asian]| // |0      |1            |1      |[r_af_american, r_asian]         | // |1      |0            |1      |[r_white, r_asian]               | // |0      |1            |0      |[r_af_american]                  | // +-------+-------------+-------+---------------------------------+ 

Comments

Popular posts from this blog

neo4j - finding mutual friends in a cypher statement starting with three or more persons -

php - How to remove letter in front of the word laravel -

minify - Minimizing css files -