What is the difference between map and mapValues functions in Spark?

Published by Big Data In Real World at April 13, 2023

map()

Both map() and mapValues() are transformations: they lazily return a new RDD, and nothing is computed until an action such as collect runs.

With map(), we have access to both the key and the value (x._1 and x._2), so we can transform either or both if we choose to. For example, we could change the key (the day) to all uppercase.

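In the example below, map() returns an RDD[(String, Double)], and collect brings it back to the driver as an Array[(String, Double)].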

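// Daily temperatures in Fahrenheit, keyed by day (sc is the SparkContext provided by spark-shell)
val rdd = sc.parallelize(Seq(("Sunday", 50), ("Monday", 60), ("Tuesday", 65), ("Wednesday", 70), ("Thursday", 85), ("Friday", 25), ("Saturday", 15)))

// map() receives the whole (key, value) tuple: x._1 is the day, x._2 the temperature.
// (F - 32) * .55 is an approximate Fahrenheit-to-Celsius conversion (the exact factor is 5/9).
rdd.map { x =>
  val ctemp = (x._2 - 32) * .55
  (x._1, ctemp)
}.collect()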

res1: Array[(String, Double)] = Array((Sunday,9.9), (Monday,15.400000000000002), (Tuesday,18.150000000000002), (Wednesday,20.900000000000002), (Thursday,29.150000000000002), (Friday,-3.8500000000000005), (Saturday,-9.350000000000001))
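To make the point about keys concrete, here is a minimal sketch that uses map() to uppercase the day while converting the temperature. It assumes spark-core is on the classpath and runs Spark in local mode; the object name MapKeyAndValue and the method run are hypothetical, and only the first three days of the data above are used.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapKeyAndValue {
  // Hypothetical helper: starts a local SparkContext, applies map() to both
  // key and value, collects the result to the driver, and stops the context.
  def run(): Array[(String, Double)] = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("map-key-and-value")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(("Sunday", 50), ("Monday", 60), ("Tuesday", 65)))

      // map() sees the whole (key, value) tuple, so it can rewrite both parts:
      // the key is uppercased and the value gets the same approximate
      // (F - 32) * 0.55 conversion used in the example above.
      rdd.map { case (day, fahrenheit) =>
        (day.toUpperCase, (fahrenheit - 32) * 0.55)
      }.collect()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

Running main should print the three pairs with uppercase keys (SUNDAY, MONDAY, TUESDAY) and the same Celsius values as the first three entries of res1.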

mapValues()

With mapValues(), unlike map(), the function we pass never sees the key; it receives only the value. This means we can transform the value but not the key.

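Just like map(), mapValues() returns an RDD[(String, Double)] here, which collect turns into an Array[(String, Double)]. Note that mapValues() is only available on RDDs of key-value pairs.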

mapValues() also differs from map() when the RDD has a partitioner. If we have partitioned the RDD (for example, with partitionBy), map() discards that partitioner: the resulting RDD's partitioner is None, because Spark cannot tell whether our function changed the keys. mapValues(), however, preserves any partitioner set on the parent RDD, since the function never sees the keys and therefore cannot change them. Keeping the partitioner lets later key-based operations such as reduceByKey or join reuse the existing partitioning instead of shuffling the data again.

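// mapValues() receives only the value (the temperature); the day key is passed through untouched
rdd.mapValues { x =>
  (x - 32) * .55
}.collect()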

res4: Array[(String, Double)] = Array((Sunday,9.9), (Monday,15.400000000000002), (Tuesday,18.150000000000002), (Wednesday,20.900000000000002), (Thursday,29.150000000000002), (Friday,-3.8500000000000005), (Saturday,-9.350000000000001))
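The partitioner difference described above can be checked directly through each RDD's partitioner field. The sketch below assumes spark-core is on the classpath and runs in local mode; the object name PartitionerDemo, the method run, and the choice of a HashPartitioner with 4 partitions are illustrative assumptions, not something prescribed by Spark.

```scala
import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}

object PartitionerDemo {
  // Hypothetical helper: returns the partitioner of the RDD after partitionBy,
  // after mapValues, and after an equivalent map, in that order.
  def run(): (Option[Partitioner], Option[Partitioner], Option[Partitioner]) = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("partitioner-demo")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(("Sunday", 50), ("Monday", 60), ("Tuesday", 65)))

      // Explicitly partition by key; 4 partitions is an arbitrary choice here.
      val partitioned = rdd.partitionBy(new HashPartitioner(4))

      // mapValues never sees the key, so Spark keeps the parent's partitioner.
      val viaMapValues = partitioned.mapValues(f => (f - 32) * 0.55)

      // map produces the same pairs, but Spark cannot prove the keys are
      // unchanged, so the resulting RDD has no partitioner.
      val viaMap = partitioned.map { case (day, f) => (day, (f - 32) * 0.55) }

      (partitioned.partitioner, viaMapValues.partitioner, viaMap.partitioner)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, afterMapValues, afterMap) = run()
    println(s"after partitionBy: $before")
    println(s"after mapValues:   $afterMapValues")
    println(s"after map:         $afterMap")
  }
}
```

With this setup, the first two values should both be a HashPartitioner with 4 partitions, and the third should be None, even though map() left the keys unchanged. Because Spark cannot inspect the function passed to map(), it conservatively drops the partitioner, so a later reduceByKey or join on the map() result would typically shuffle again, while the mapValues() result can reuse the existing partitioning.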

Big Data In Real World

We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming big data platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
