What is the difference between map and mapValues functions in Spark?

In this post we will look at the differences between the map and mapValues functions and when it is appropriate to use each one.
We have a small made-up dataset of days of the week and their temperatures in Fahrenheit. Let's use both map() and mapValues() to convert the temperatures to Celsius.
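As a quick check on the arithmetic, the exact Fahrenheit-to-Celsius conversion is C = (F - 32) * 5/9. The sketch below is plain Scala with no Spark involved, and it compares the exact factor with the rounded factor .55 used in this post's code. The helper name fahrenheitToCelsius is an assumption made for illustration only.

```scala
// Exact conversion: C = (F - 32) * 5 / 9.
// `fahrenheitToCelsius` is a hypothetical helper name, not part of Spark.
def fahrenheitToCelsius(f: Double): Double = (f - 32) * 5.0 / 9.0

val exact  = fahrenheitToCelsius(50) // 10.0
val approx = (50 - 32) * .55         // 9.9, using the rounded factor from this post

println(s"exact = $exact, approx = $approx")
```

The small gap between the two factors is why the collected results in this post differ slightly from the exact Celsius values, for example 9.9 rather than 10.0 for 50°F.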
map()
Both map and mapValues are transformations: they are evaluated lazily and return a new RDD rather than a result.
With map(), we have access to both the key and the value (x._1 and x._2), so we can transform either or both. For example, we could change the key (the day) to all uppercase.
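In this example map() returns an RDD[(String, Double)]; calling collect on it brings the results back to the driver as an Array[(String, Double)].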
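To make the point about keys concrete, here is a minimal sketch that uses map() to uppercase the day while leaving the temperature alone. It assumes a local SparkContext obtained with SparkContext.getOrCreate (in spark-shell the existing sc is simply reused); the app name and the two-row dataset are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local context for the sketch; spark-shell already provides `sc`.
val conf = new SparkConf().setAppName("map-key-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

val days = sc.parallelize(Seq(("Sunday", 50), ("Monday", 60)))

// map() receives the whole (key, value) tuple, so the key can be rewritten too.
val upper = days.map { x => (x._1.toUpperCase, x._2) }

upper.collect.foreach(println)
```

The same change is not possible with mapValues(), because the function passed to it only ever receives the value.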
```scala
val rdd = sc.parallelize(Seq(("Sunday", 50), ("Monday", 60), ("Tuesday", 65), ("Wednesday", 70), ("Thursday", 85), ("Friday", 25), ("Saturday", 15)))

// x._1 is the day (key), x._2 is the temperature in Fahrenheit (value)
rdd.map { x =>
  // .55 is a rounded stand-in for the exact factor 5/9
  val ctemp = (x._2 - 32) * .55
  (x._1, ctemp)
}.collect

res1: Array[(String, Double)] = Array((Sunday,9.9), (Monday,15.400000000000002), (Tuesday,18.150000000000002), (Wednesday,20.900000000000002), (Thursday,29.150000000000002), (Friday,-3.8500000000000005), (Saturday,-9.350000000000001))
```

mapValues()
Like map(), mapValues() is a transformation.
mapValues() is available only on RDDs of key-value pairs. Unlike map(), it gives us access only to the value, not the key, so we can transform the value but the key stays as it is.
Just like map(), mapValues() returns an RDD[(String, Double)] in this example, which collect turns into an Array[(String, Double)].
mapValues() differs from map() when we use custom partitioners. If we applied a partitioner to our RDD (e.g. using partitionBy), map() "forgets" it: because map() may have changed the keys, Spark can no longer assume the records sit in the right partitions, so the resulting RDD has no partitioner. mapValues(), however, preserves any partitioner set on the RDD, because it has no access to the keys and so cannot change them. This matters because later key-based operations such as reduceByKey or join can avoid a shuffle when the partitioner is known.
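The partitioner behaviour described above can be checked directly through the partitioner field of an RDD. This is a minimal sketch, assuming a local SparkContext obtained with SparkContext.getOrCreate and a HashPartitioner with two partitions; the app name, the three-row dataset and the variable names are illustrative only.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Assumption: a local context for the sketch; spark-shell already provides `sc`.
val conf = new SparkConf().setAppName("partitioner-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

val temps = sc.parallelize(Seq(("Sunday", 50), ("Monday", 60), ("Tuesday", 65)))

// Give the pair RDD an explicit partitioner.
val partitioned = temps.partitionBy(new HashPartitioner(2))

// map() could change the keys, so the result carries no partitioner.
val viaMap = partitioned.map { x => (x._1, (x._2 - 32) * .55) }

// mapValues() cannot touch the keys, so the partitioner is kept.
val viaMapValues = partitioned.mapValues { v => (v - 32) * .55 }

println(partitioned.partitioner.isDefined)                   // true
println(viaMap.partitioner.isDefined)                        // false
println(viaMapValues.partitioner == partitioned.partitioner) // true
```

Note that map() drops the partitioner even though this particular function leaves the keys untouched; Spark does not inspect the function, it only knows that map() is allowed to change them.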
```scala
// x is the temperature in Fahrenheit (the value); the key is left untouched
rdd.mapValues { x =>
  (x - 32) * .55
}.collect

res4: Array[(String, Double)] = Array((Sunday,9.9), (Monday,15.400000000000002), (Tuesday,18.150000000000002), (Wednesday,20.900000000000002), (Thursday,29.150000000000002), (Friday,-3.8500000000000005), (Saturday,-9.350000000000001))
```