In this post we will look at the differences between Spark's map and mapValues functions and when it is appropriate to use each one.
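We have a small made-up dataset of days and temperatures in Fahrenheit. Let’s use both map() and mapValues() to convert the temperatures to Celsius. Note that the examples multiply by .55, a rough approximation of the exact conversion factor 5/9, so the results are slightly off (50°F comes out as 9.9°C instead of 10°C).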
map()
Both map and mapValues are transformation functions, so they are evaluated lazily and only run when an action such as collect is called.
With map(), we have access to both the key and the value (x._1 and x._2), so we can transform either one or both if we choose to (e.g. we can change the key, the day, to all uppercase if we need to).
map() returns an RDD[(String, Double)] here; calling collect on it gives us an Array[(String, Double)].
val rdd = sc.parallelize(Seq(("Sunday", 50), ("Monday", 60), ("Tuesday", 65), ("Wednesday", 70), ("Thursday", 85), ("Friday", 25), ("Saturday", 15)))

rdd.map { x =>
  val ctemp = (x._2 - 32) * .55
  (x._1, ctemp)
}.collect

res1: Array[(String, Double)] = Array((Sunday,9.9), (Monday,15.400000000000002), (Tuesday,18.150000000000002), (Wednesday,20.900000000000002), (Thursday,29.150000000000002), (Friday,-3.8500000000000005), (Saturday,-9.350000000000001))
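Because map() sees the whole (key, value) pair, it can rewrite the key as well. Here is a minimal sketch of the uppercase example mentioned above. The session setup, the app name, and the names temps and upper are assumptions added for illustration; in spark-shell, sc already exists and the setup lines can be skipped.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; spark-shell already provides `sc`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("map-changes-keys") // hypothetical app name
  .getOrCreate()
val sc = spark.sparkContext

// A two-row subset of the dataset used in this post.
val temps = sc.parallelize(Seq(("Sunday", 50), ("Monday", 60)))

// map receives the full tuple, so the key can be transformed too.
val upper = temps.map { case (day, f) => (day.toUpperCase, f) }.collect()
```

The same change cannot be expressed with mapValues(), since its function is only ever handed the value.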
mapValues()
Both map and mapValues are transformation functions
With mapValues(), unlike map(), we do not have access to the key. We only have access to the value, which means we can transform the value but not the key.
Just like map(), mapValues() returns an RDD[(String, Double)] here; calling collect on it gives us an Array[(String, Double)].
mapValues() differs from map() when we use partitioners. If we have applied a partitioner to our RDD (e.g. using partitionBy), map “forgets” that partitioner (the resulting RDD has no partitioner set) because the keys might have changed. mapValues, however, preserves any partitioner set on the RDD, because it never has access to the keys and so cannot change them.
rdd.mapValues { x => (x - 32) * .55 }.collect

res4: Array[(String, Double)] = Array((Sunday,9.9), (Monday,15.400000000000002), (Tuesday,18.150000000000002), (Wednesday,20.900000000000002), (Thursday,29.150000000000002), (Friday,-3.8500000000000005), (Saturday,-9.350000000000001))
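The partitioner behaviour can be checked by inspecting the partitioner field of each result. The following is a minimal sketch, not part of the original example: the HashPartitioner with 4 partitions, the session setup, and the names partitioned, viaMap and viaMapValues are all assumptions chosen for illustration.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

// Local session for illustration only; spark-shell already provides `sc`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("partitioner-check") // hypothetical app name
  .getOrCreate()
val sc = spark.sparkContext

// Attach an explicit partitioner to a subset of the dataset from this post.
val partitioned = sc
  .parallelize(Seq(("Sunday", 50), ("Monday", 60), ("Tuesday", 65)))
  .partitionBy(new HashPartitioner(4))

// Same conversion, written once with map and once with mapValues.
val viaMap = partitioned.map { case (day, f) => (day, (f - 32) * 0.55) }
val viaMapValues = partitioned.mapValues(f => (f - 32) * 0.55)

println(viaMap.partitioner)       // None: map drops the partitioner
println(viaMapValues.partitioner) // Some(...): the HashPartitioner is kept
```

This matters when a later step such as reduceByKey or join can reuse the existing partitioning: with the partitioner preserved, Spark can avoid shuffling the data again.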