scala - How can I compute the average from a Spark RDD?


I have a problem in Spark with Scala: I want to compute the average of RDD data. I created a new RDD like this,

[(2,110),(2,130),(2,120),(3,200),(3,206),(3,206),(4,150),(4,160),(4,170)] 

and I want to average the values per key, like this,

[(2,(110+130+120)/3),(3,(200+206+206)/3),(4,(150+160+170)/3)] 

so that I get this result,

   [(2,120),(3,204),(4,160)] 

How can I do this with a Scala RDD? I am using Spark version 1.6.

You can use aggregateByKey.

val rdd = sc.parallelize(Seq((2,110),(2,130),(2,120),(3,200),(3,206),(3,206),(4,150),(4,160),(4,170)))

// Accumulate a (sum, count) pair per key: the first function adds each value
// to the running sum and increments the count; the second merges partial
// (sum, count) pairs coming from different partitions.
val agg_rdd = rdd.aggregateByKey((0, 0))(
  (acc, value) => (acc._1 + value, acc._2 + 1),
  (acc1, acc2) => (acc1._1 + acc2._1, acc1._2 + acc2._2))

// Divide the sum by the count per key to get the average
val sum = agg_rdd.mapValues(x => x._1 / x._2)

sum.collect
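As an alternative, here is a minimal sketch of the same per-key average using mapValues and reduceByKey instead of aggregateByKey, assuming the same SparkContext sc and the same input data. It produces the same integer result as above.

// Sketch: per-key average via reduceByKey (assumes an existing SparkContext `sc`)
val rdd = sc.parallelize(Seq((2,110),(2,130),(2,120),(3,200),(3,206),(3,206),(4,150),(4,160),(4,170)))

val averages = rdd
  .mapValues(v => (v, 1))                             // pair each value with a count of 1
  .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))  // sum values and counts per key
  .mapValues { case (total, count) => total / count } // integer average per key

averages.collect()  // Array((2,120), (3,204), (4,160))

Note that both versions use integer division; if you need fractional averages, convert the sum to Double before dividing in the final mapValues step.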
