sql - Scala/Apache Spark Converting DataFrame column values and type, multiple when otherwise -


i have primary sql table reading spark , modifying write cassandradb. have working implementation converting gender 0, 1, 2, 3 (integers) "male", "female", "trans", etc (strings). though below method work, seems inefficient make seperate array mappings dataframe, join main table/dataframe, remove, rename, etc.

i have seen:

.withcolumn("gender", when(col("gender) === 1, "male").otherwise("female")  

that allow me continue method chaining on primary table have not been able working more 2 options. there way this? have around 10 different columns on table each need own custom conversion created. since code processing tbs of data, there less repetitive , more efficient way accomplish this. in advance!

case class gender(tmpid: int, tmpgender: string)  private def creategenderdf(spark:sparksession): dataframe = {   import spark.implicits._   seq(     gender(1, "male"),     gender(2, "female"),     gender(777, "prefer not answer")   ).todf }   private def createpersonsdf(spark: sparksession): dataframe = {   val genderdf = creategenderdf(spark)   genderdf.show()    val personsdf: dataframe = spark.read     .format("csv")     .option("header", "true")     .option("inferschema", "true")     .option("delimiter", "\t")     .load(datapath + "people.csv")     .withcolumnrenamed("id", "id")     .withcolumnrenamed("name_first", "firstname")    val personsdf1: dataframe = personsdf     .join(genderdf, personsdf("gender") === genderdf("tmpid"), "leftouter")    val personsdf2: dataframe = personsdf1     .drop("gender")     .drop("tmpid")     .withcolumnrenamed("tmpgender", "gender") } 

you can use nested when function eliminate need of creating genderdf, join, drop, rename etc. example can following

import org.apache.spark.sql.functions._ import org.apache.spark.sql.types.stringtype personsdf.withcolumn("gender", when(col("gender") === 1, "male").otherwise(when(col("gender") ===2, "female").otherwise("prefer not answer")).cast(stringtype)) 

you can add more when function in above nested structure , can repeate same other 10 columns well.


Comments

Popular posts from this blog

angular - Ionic slides - dynamically add slides before and after -

minify - Minimizing css files -

Add a dynamic header in angular 2 http provider -