What is the difference between map and flatMap functions in Spark?

Both map and flatMap are transformation functions. When applied to an RDD, they transform each element inside the RDD into something else.

Consider this simple RDD.

 scala> val rdd = sc.parallelize(Seq("Hadoop In Real World", "Big Data"))
 rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:27 

The above rdd has 2 elements of type String.

map()

Let’s look at map() first. map() transforms an RDD with N elements into an RDD with N elements. The important thing to note is that each element is transformed into exactly one other element, thereby the resultant RDD has the same number of elements as before.
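
As a quick illustration of the one-element-in, one-element-out behaviour, here is a simple sketch using a different function; the res number and output are what you would expect to see in spark-shell and are shown for illustration only.

 scala> rdd.map(_.length).collect
 res2: Array[Int] = Array(20, 8)

Two Strings go in, two Ints come out; the count never changes with map().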

The split() function, applied to each element of this RDD, breaks the line into words wherever it sees a space between the words.

 rdd.map(_.split(" ")).collect
 res0: Array[Array[String]] = Array(Array(Hadoop, In, Real, World), Array(Big, Data)) 

The result RDD also has 2 elements, but now the type of each element is an Array and not a String as in the initial RDD. Each Array holds the words from the initial String.

When we do a count, we get 2, which is the same as the count of the initial RDD before the split() transformation.

 scala> rdd.map(_.split(" ")).count
 res5: Long = 2 

flatMap()

flatMap() transforms an RDD with N elements into an RDD with potentially more than N elements. Let’s try the same split() operation with flatMap().

 rdd.flatMap(_.split(" ")).collect
 res1: Array[String] = Array(Hadoop, In, Real, World, Big, Data) 

Here, with flatMap(), we can see that the result RDD is not Array[Array[String]] as it was with map(); it is Array[String]. flatMap() took care of “flattening” each Array[String] into plain Strings.
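
Conceptually, flatMap(f) behaves like map(f) followed by a flattening step. A rough way to see this (a sketch; the output is what you would expect in spark-shell) is to flatten the Array[Array[String]] from the map() example ourselves:

 scala> rdd.map(_.split(" ")).flatMap(_.toSeq).collect
 res7: Array[String] = Array(Hadoop, In, Real, World, Big, Data)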

When we do a count() on the result RDD, we get 6, which is more than the number of elements in the initial RDD.

 scala> rdd.flatMap(_.split(" ")).count
 res6: Long = 6 
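
This is why flatMap() is the natural fit for tokenizing text, for example in a word count. Below is a minimal word count sketch built on the same rdd; reduceByKey is standard Spark, but the res number and the ordering of the output pairs are illustrative.

 scala> rdd.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).collect
 res8: Array[(String, Int)] = Array((Hadoop,1), (In,1), (Real,1), (World,1), (Big,1), (Data,1))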