sql - Scala Spark aggregating values from multiple columns into new column -
i working on sql migration cassandra , having issues combining multiple columns single column. example, have person can have 1 or many races. have 0 or 1 each individual race. how able combine single column set?
sql schema +----------------------------------+ - r_white r_af_american r_asian - - 1 1 1 - - 0 1 1 - - 1 0 1 - - 0 1 0 - +----------------------------------+ cassandra temp desired schema +-------------------------------------------------------------------+ - r_white r_af_american r_asian race - - 1 1 1 ["white","af american", "asian"]- - 0 1 1 ["af american", "asian"] - - 1 0 1 ["white", "asian"] - - 0 1 0 ["af american"] - +-------------------------------------------------------------------+ since reading in data csv, plan read in tables spark sql are, perform combine(into race column), delete of columns other new race column.
i have been chaining .withcolumn() other data manipulation on data doesn't seem fit when(col("r_white") === 1, somehow_add_to_set).otherwise(...) pattern. appreciated! thanks!
one way achieve create array using array function , when expression each value, , dropping nulls using udf:
import org.apache.spark.sql.functions._ import spark.implicits._ val dropnulls = udf { (a: seq[string]) => a.filter(_ != null) } val result = df.withcolumn("race", dropnulls(array( when($"r_white" === 1, "r_white"), when($"r_af_american" === 1, "r_af_american"), when($"r_asian" === 1, "r_asian") ))) you can rid of hard-coded values mapping dataframe's column names these when expressions:
val result = df.withcolumn("race", dropnulls(array( df.columns.map(name => when(col(name) === 1, name)): _* ))) result.show(false) // +-------+-------------+-------+---------------------------------+ // |r_white|r_af_american|r_asian|race | // +-------+-------------+-------+---------------------------------+ // |1 |1 |1 |[r_white, r_af_american, r_asian]| // |0 |1 |1 |[r_af_american, r_asian] | // |1 |0 |1 |[r_white, r_asian] | // |0 |1 |0 |[r_af_american] | // +-------+-------------+-------+---------------------------------+
Comments
Post a Comment