scala - How to create a SHA-1 hash for the entire row in an RDD/DataFrame


I have a DataFrame with one schema. The existing DataFrame has 50 columns. Now I want to add a new column to the existing DataFrame. The new column is named "hashing_id", and the logic for hashing_id is sha1(row). How can I achieve this?

I tried the code below. The two methods below are inside a trait that is used by the main class. The trait extends Serializable.

    def addHashingKey(): DataFrame = {
      val sha1 = java.security.MessageDigest.getInstance("SHA-1")
      val encoder = new sun.misc.BASE64Encoder()
      // encoder.encode(sha1.digest(row.mkString.getBytes))
      createDataFrame(df.map(row => {
        Row.fromSeq(row.toSeq ++ encoder.encode(sha1.digest(row.mkString.getBytes)))
      }), df.schema.add("hashing_id", StringType))
    }

    def createDataFrame(rdd: RDD[Row], schema: StructType): DataFrame = {
      sqlContext.createDataFrame(rdd, schema)
    }

How can I achieve SHA-1 hashing using an RDD?

Could someone help me with this?

When I run the code, it throws the exception below.

    17/09/12 13:45:20 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Task not serializable
    org.apache.spark.SparkException: Task not serializable
    Caused by: java.io.NotSerializableException: sun.misc.BASE64Encoder
    Serialization stack:
        - object not serializable (class: sun.misc.BASE64Encoder, value: sun.misc.BASE64Encoder@46c0813)

You can try this; it seems to work for me in the few tests I've run:

    // hashCode returns an Int, so convert it to match the StringType column
    val newDF = sqlContext.createDataFrame(
      rdd.map(x => Row(x.toSeq ++ Seq(x.toSeq.hashCode().toString): _*)),
      StructType(schema.iterator.toSeq ++ Seq(StructField("hashing_id", StringType, true))))

Obviously, you need to replace hashCode with the hash function you need.

Edit: using a SHA-1 function

Define the function in an object:
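If the hash function needs helper objects such as a MessageDigest or a Base64 encoder, where they are created matters. The exception in the question names sun.misc.BASE64Encoder: an instance built outside the function passed to map has to be serialized along with the task, and that class is not serializable. One way around it is to create the helpers inside the function that processes each partition. The following is only a sketch on plain Scala iterators, assuming Java 8+ for java.util.Base64; PartitionHasher and hashPartition are hypothetical names, not part of the question or of Spark.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.Base64

// Hypothetical helper object; the names are illustrative only.
object PartitionHasher {
  // Appends a Base64-encoded SHA-1 of the concatenated fields to every row.
  // The digest and encoder are created inside the function, so they are built
  // wherever the function runs and never have to be serialized.
  def hashPartition(rows: Iterator[Seq[Any]]): Iterator[Seq[Any]] = {
    val sha1 = MessageDigest.getInstance("SHA-1")
    val encoder = Base64.getEncoder // java.util.Base64 requires Java 8+
    rows.map { fields =>
      // digest(...) finishes the hash and resets the instance for the next row
      val digest = sha1.digest(fields.mkString.getBytes(StandardCharsets.UTF_8))
      fields :+ encoder.encodeToString(digest)
    }
  }
}
```

With Spark on the classpath, this could presumably be wired in as `df.rdd.mapPartitions(rows => PartitionHasher.hashPartition(rows.map(_.toSeq)).map(s => Row.fromSeq(s)))`, with the result passed to createDataFrame together with the extended schema.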

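    import java.security.MessageDigest

    object Encoder {
      // Hex-encode the digest; toString on a byte array is not the hash
      def sha1(s: Row): String =
        MessageDigest.getInstance("SHA-1").digest(s.mkString.getBytes()).map("%02x".format(_)).mkString
    }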
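One thing to watch with s.mkString: it joins the fields with no separator, so rows such as ("ab", "c") and ("a", "bc") produce the same input string and therefore the same hash. A minimal sketch in plain Scala (no Spark needed) of joining with a separator before hashing; RowHashing, sha1Hex, hashFields and the choice of the \u0001 separator are all assumptions made for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper object; the names are illustrative only.
object RowHashing {
  // SHA-1 of a string, rendered as 40 lowercase hex characters.
  def sha1Hex(s: String): String =
    MessageDigest.getInstance("SHA-1")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Joins the fields with a separator that is unlikely to occur in the data,
  // so that field boundaries influence the hash.
  def hashFields(fields: Seq[Any], sep: String = "\u0001"): String =
    sha1Hex(fields.mkString(sep))
}
```

Inside the Spark job, the fields of a row would be passed as x.toSeq. Whether a control character is a safe separator depends on the data, so treat that default as a placeholder.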

Then, in the original class, you can call Encoder.sha1 as follows:

    val newDF = sqlContext.createDataFrame(
      wordsRDD.map(x => Row(x.toSeq ++ Seq(Encoder.sha1(x)): _*)),
      StructType(schema.iterator.toSeq ++ Seq(StructField("hashing_id", StringType, true)))
    ).rdd.collect()
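A different technique from the RDD approach above: Spark SQL's org.apache.spark.sql.functions also provides sha1 and concat_ws column functions, which may avoid the RDD round trip altogether. The following is a rough sketch, assuming every column can be cast to a string and that column names need no backtick escaping; withHashingId and the "|" separator are hypothetical choices. Note that concat_ws skips null values, so rows that differ only in where a null sits can end up with the same hash.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat_ws, sha1}

// Hypothetical helper: adds a "hashing_id" column holding the hex SHA-1
// of all column values joined with "|".
def withHashingId(df: DataFrame): DataFrame = {
  // Cast every column to string so that concat_ws accepts it
  val asStrings = df.columns.map(c => col(c).cast("string"))
  df.withColumn("hashing_id", sha1(concat_ws("|", asStrings: _*)))
}
```

Usage would be along the lines of `val newDF = withHashingId(df)`; the new column comes back as a 40-character hex string.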
