Scala - build a 2D lookup table from a Spark DataFrame

I want to convert a smaller DataFrame into a broadcast lookup table used inside a UDF applied to a larger DataFrame. The smaller DataFrame (myLookupDF) may look like:
+---+---+---+---+
| x | 90|100|101|
+---+---+---+---+
| 90|  1|  0|  0|
|100|  0|  1|  1|
|101|  0|  1|  1|
+---+---+---+---+
I want to use the first column as the first key (x1) and the first row as the second key (x2). x1 and x2 have the same elements. Ideally, the lookup table (myLookupMap) would be a Scala Map (or similar) and work like:
myLookupMap(90)(90)   returns 1
myLookupMap(90)(101)  returns 0
myLookupMap(100)(90)  returns 0
myLookupMap(101)(100) returns 1
etc.
So far, I have managed to get:
val myLookupMap = myLookupDF.collect().map(r => Map(myLookupDF.columns.zip(r.toSeq): _*))

myLookupMap: Array[scala.collection.Map[String,Any]] = Array(Map(x -> 90, 90 -> 1, 100 -> 0, 101 -> 0), Map(x -> 100, 90 -> 0, 100 -> 1, 101 -> 1), Map(x -> 101, 90 -> 0, 100 -> 1, 101 -> 1))
which is an Array of Maps and not what I need. Any suggestions appreciated.
collect() gives you the RDD's contents back as an array. You have to find a way to collect into maps instead of arrays.
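The difference can be seen with a plain-Scala sketch (no Spark needed): an array of rows is indexed only by position, whereas turning each row into a (key, value) pair and building a Map, as collectAsMap() does, lets you look rows up by key.

```scala
// Rows as collect() would hand them back: an array, indexed only by position.
val rows: Array[Seq[Int]] = Array(Seq(90, 1, 0, 0), Seq(100, 0, 1, 1))

// The collectAsMap() idea in plain Scala: (key, value) pairs become a Map,
// keyed here by each row's first element.
val byKey: Map[Int, Seq[Int]] = rows.map(r => r.head -> r.tail).toMap

println(byKey(90))   // List(1, 0, 0)
```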
Given the DataFrame:

scala> myLookupDF.show(false)
+---+---+---+---+
|x  |90 |100|101|
+---+---+---+---+
|90 |1  |0  |0  |
|100|0  |1  |1  |
|101|0  |1  |1  |
+---+---+---+---+
All you need are the header names other than x, which you can get as below:

scala> val header = myLookupDF.schema.fieldNames.tail
header: Array[String] = Array(90, 100, 101)
Then, modifying your approach with map functions, map each row to a (key, Map) pair and collect the result:

scala> val myLookupMap = myLookupDF.rdd.map(r => {
     |   val row = r.toSeq
     |   (row.head, Map(header.zip(row.tail): _*))
     | }).collectAsMap()
myLookupMap: scala.collection.Map[Any,scala.collection.immutable.Map[String,Any]] = Map(101 -> Map(90 -> 0, 100 -> 1, 101 -> 1), 100 -> Map(90 -> 0, 100 -> 1, 101 -> 1), 90 -> Map(90 -> 1, 100 -> 0, 101 -> 0))
You should see the desired results:

scala> myLookupMap(90)(90.toString)
res1: Any = 1
scala> myLookupMap(90)(101.toString)
res2: Any = 0
scala> myLookupMap(100)(90.toString)
res3: Any = 0
scala> myLookupMap(101)(100.toString)
res4: Any = 1
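For reference, the same nested-map construction can be reproduced in plain Scala without a Spark session; the literal rows below are copied from the example DataFrame:

```scala
// Header columns (everything after the key column x), copied from the example.
val header = Array("90", "100", "101")

// The DataFrame rows as plain sequences: first element is the key column x.
val rows: Seq[Seq[Int]] = Seq(
  Seq(90, 1, 0, 0),
  Seq(100, 0, 1, 1),
  Seq(101, 0, 1, 1)
)

// First element of each row becomes the outer key; the rest pair up with
// the header names to form the inner map — exactly what the rdd.map above does.
val myLookupMap: Map[Int, Map[String, Int]] =
  rows.map(row => row.head -> header.zip(row.tail).toMap).toMap

println(myLookupMap(90)("90"))    // 1
println(myLookupMap(90)("101"))   // 0
```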
Now you can pass myLookupMap to a UDF function.
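As a sketch, the per-row lookup can be factored into a plain function and then wrapped with udf; the Spark-specific lines are shown as comments since they need a live session, and names like lookupUdf and largeDF are illustrative, not from the original post:

```scala
// Assumed to be the map collected above; values copied from the example.
val myLookupMap: Map[Int, Map[String, Int]] = Map(
  90  -> Map("90" -> 1, "100" -> 0, "101" -> 0),
  100 -> Map("90" -> 0, "100" -> 1, "101" -> 1),
  101 -> Map("90" -> 0, "100" -> 1, "101" -> 1)
)

// Pure lookup function: note the second key goes through toString, matching
// the string column headers. Defaulting to 0 for unseen keys is an assumption.
def lookup(x1: Int, x2: Int): Int =
  myLookupMap.getOrElse(x1, Map.empty).getOrElse(x2.toString, 0)

// Inside Spark (illustrative; requires an active SparkSession):
//   import org.apache.spark.sql.functions.udf
//   val lookupUdf = udf((a: Int, b: Int) => lookup(a, b))
//   largeDF.withColumn("matched", lookupUdf($"x1", $"x2"))

println(lookup(101, 100))   // 1
println(lookup(90, 101))    // 0
```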