scala - Build a 2D lookup table from a Spark DataFrame


I would like to convert a smaller DataFrame into a broadcast lookup table to be used inside a UDF on a larger DataFrame. The smaller DataFrame (mylookupdf) may look like this:

+---+---+---+---+
| x | 90|100|101|
+---+---+---+---+
| 90|  1|  0|  0|
|100|  0|  1|  1|
|101|  0|  1|  1|
+---+---+---+---+

I want to use the first column as the first key, x1, and the first row as the second key, x2. x1 and x2 have the same elements. Ideally, the lookup table (mylookupmap) would be a Scala Map (or similar) and work like this:

mylookupmap(90)(90) returns 1
mylookupmap(90)(101) returns 0
mylookupmap(100)(90) returns 0
mylookupmap(101)(100) returns 1
etc.

So far, I have managed to get this:

val mylookupmap = mylookupdf.collect().map(r => Map(mylookupdf.columns.zip(r.toSeq):_*))
mylookupmap: Array[scala.collection.Map[String,Any]] = Array(Map(x -> 90, 90 -> 1, 100 -> 0, 101 -> 0), Map(x -> 100, 90 -> 0, 100 -> 1, 101 -> 1), Map(x -> 101, 90 -> 0, 100 -> 1, 101 -> 1))

This is an Array of Maps, which is not what is required. Any suggestions are much appreciated.

collect() always returns the rows of the DataFrame as an Array, so you have to find a way to collect them as a Map instead.
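One way to see this is to re-key the Array of Maps from the question on the driver. The following is only a minimal sketch: the `rows` value is a hard-coded stand-in for the collected result, and the name `rekeyed` is made up for illustration.

```scala
// Stand-in for the Array[Map[String, Any]] produced in the question;
// the values are hard-coded here so the sketch runs without Spark.
val rows: Array[Map[String, Any]] = Array(
  Map[String, Any]("x" -> 90,  "90" -> 1, "100" -> 0, "101" -> 0),
  Map[String, Any]("x" -> 100, "90" -> 0, "100" -> 1, "101" -> 1),
  Map[String, Any]("x" -> 101, "90" -> 0, "100" -> 1, "101" -> 1)
)

// Use the value stored under "x" as the outer key and drop "x" from the inner map.
val rekeyed: Map[Any, Map[String, Any]] =
  rows.map(m => m("x") -> (m - "x")).toMap
```

With this, `rekeyed(90)("101")` looks up row 90, column 101. The inner keys are still column names, so they remain Strings.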

Given the DataFrame

scala> mylookupdf.show(false)
+---+---+---+---+
|x  |90 |100|101|
+---+---+---+---+
|90 |1  |0  |0  |
|100|0  |1  |1  |
|101|0  |1  |1  |
+---+---+---+---+

All you need is the header names other than x, which you can get as below:

scala> val header = mylookupdf.schema.fieldNames.tail
header: Array[String] = Array(90, 100, 101)

Then I modify the map function so that each row becomes a (key, Map) pair, and collect the result as a Map:

scala> val mylookupmap = mylookupdf.rdd.map(r => {
     |   val row = r.toSeq
     |   (row.head, Map(header.zip(row.tail):_*))
     | }).collectAsMap()
mylookupmap: scala.collection.Map[Any,scala.collection.immutable.Map[String,Any]] = Map(101 -> Map(90 -> 0, 100 -> 1, 101 -> 1), 100 -> Map(90 -> 0, 100 -> 1, 101 -> 1), 90 -> Map(90 -> 1, 100 -> 0, 101 -> 0))

You should see the desired results. Note that the inner keys are column names, so they are Strings:

scala> mylookupmap(90)(90.toString)
res1: Any = 1

scala> mylookupmap(90)(101.toString)
res2: Any = 0

scala> mylookupmap(100)(90.toString)
res3: Any = 0

scala> mylookupmap(101)(100.toString)
res4: Any = 1
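The collected map is typed as Map[Any, Map[String, Any]], which is why the second key has to be a String. If you prefer the mylookupmap(90)(101) form from the question, one option is to convert the keys and values on the driver. This is a rough sketch, not part of the original answer: `collected` is a hard-coded stand-in for the collected map, `toInt` and `typed` are made-up names, and it assumes every key and value is an integer.

```scala
// Stand-in for the map returned by collectAsMap(); hard-coded so the sketch runs without Spark.
val collected: scala.collection.Map[Any, Map[String, Any]] = Map[Any, Map[String, Any]](
  90  -> Map[String, Any]("90" -> 1, "100" -> 0, "101" -> 0),
  100 -> Map[String, Any]("90" -> 0, "100" -> 1, "101" -> 1),
  101 -> Map[String, Any]("90" -> 0, "100" -> 1, "101" -> 1)
)

// Hypothetical helper. Assumption: keys and values are Ints, Longs, or integer-valued Strings.
def toInt(v: Any): Int = v match {
  case i: Int    => i
  case l: Long   => l.toInt
  case s: String => s.trim.toInt
  case other     => throw new IllegalArgumentException(s"Not an integer: $other")
}

// Convert both key levels and the values to Int.
val typed: Map[Int, Map[Int, Int]] =
  collected.map { case (x1, inner) =>
    toInt(x1) -> inner.map { case (x2, v) => toInt(x2) -> toInt(v) }
  }.toMap
```

After the conversion, `typed(90)(101)` and `typed(101)(100)` can be called with plain Int keys.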

Now you can pass mylookupmap to your UDF.
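As a rough illustration of that last step, the sketch below broadcasts a lookup table and reads it inside a UDF. Several things here are assumptions rather than part of the answer: the table is hard-coded with Int keys, the larger DataFrame (`largeDf`) and its column names `a`, `b`, and `flag` are invented, and a local SparkSession is created so the snippet is self-contained (in spark-shell, `spark` already exists).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Local session for the sketch only; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("lookup-udf-sketch").getOrCreate()
import spark.implicits._

// Assumption: a lookup table with Int keys, hard-coded from the example table.
val lookup: Map[Int, Map[Int, Int]] = Map(
  90  -> Map(90 -> 1, 100 -> 0, 101 -> 0),
  100 -> Map(90 -> 0, 100 -> 1, 101 -> 1),
  101 -> Map(90 -> 0, 100 -> 1, 101 -> 1)
)

// Ship the table to the executors once instead of with every task.
val lookupBc = spark.sparkContext.broadcast(lookup)

// Returns None (null in the DataFrame) when either key is missing.
val lookupUdf = udf { (x1: Int, x2: Int) =>
  lookupBc.value.get(x1).flatMap(_.get(x2))
}

// Hypothetical larger DataFrame with two key columns.
val largeDf = Seq((90, 90), (90, 101), (101, 100), (5, 90)).toDF("a", "b")
val result = largeDf.withColumn("flag", lookupUdf(col("a"), col("b")))
result.show()
```

Returning an Option keeps a missing key pair from throwing inside the UDF; whether null or a default value is the better choice depends on the larger DataFrame.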

