How to convert RDD to DataFrame in Spark? - Big Data In Real World


You have an RDD in your code and now you want to work with the data using DataFrames in Spark. Spark provides functions to convert an RDD to a DataFrame, and the conversion is quite simple.


Solution

You can use the createDataFrame function, which takes an RDD and returns a DataFrame.

Assume this is the data in your RDD:

+-----+----+----+
| blue|20.0|60.0|
|green|30.5|20.0|
|  red|70.0|50.9|
+-----+----+----+
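The post does not show how the RDD itself is built, so here is a minimal sketch of one way to produce it. It assumes the rows are held as Scala tuples of (String, Double, Double) and that a local SparkSession is acceptable for experimenting; the application name and the local master setting are illustrative choices, not something taken from the original.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation.
// getOrCreate() reuses an existing session if one is already running.
val spark = SparkSession.builder()
  .appName("rdd-to-dataframe")
  .master("local[*]")
  .getOrCreate()

// An RDD of tuples holding the three sample rows shown above
val rdd = spark.sparkContext.parallelize(Seq(
  ("blue", 20.0, 60.0),
  ("green", 30.5, 20.0),
  ("red", 70.0, 50.9)
))

// Print the raw tuples to confirm the contents
rdd.collect().foreach(println)
```

In spark-shell the `spark` session already exists, so only the `parallelize` call is needed there.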

Without column names

If you pass only the RDD, Spark assigns default column names (_1, _2, _3), so the output doesn't have meaningful column names.

 

val df = spark.createDataFrame(rdd)

df.show()

+-----+----+----+
|   _1|  _2|  _3|
+-----+----+----+
| blue|20.0|60.0|
|green|30.5|20.0|
|  red|70.0|50.9|
+-----+----+----+
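To see what Spark inferred, you can inspect the column names and schema of the resulting DataFrame. The sketch below assumes the RDD holds tuples of (String, Double, Double) and builds its own local session so it can run on its own; the application name is only illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell `spark` is already available
val spark = SparkSession.builder()
  .appName("inspect-inferred-schema")
  .master("local[*]")
  .getOrCreate()

// Same sample rows as in the post, held as tuples
val rdd = spark.sparkContext.parallelize(Seq(
  ("blue", 20.0, 60.0),
  ("green", 30.5, 20.0),
  ("red", 70.0, 50.9)
))

val df = spark.createDataFrame(rdd)

// Column names default to the tuple field names
println(df.columns.mkString(", "))

// Data types are inferred from the Scala types inside the tuple
df.printSchema()
```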

With column names

Here you specify the column names, but Spark still infers the schema, that is, the data types of your columns.

val df1 = spark.createDataFrame(rdd).toDF("id", "val1", "val2")

df1.show()

+-----+----+----+
|   id|val1|val2|
+-----+----+----+
| blue|20.0|60.0|
|green|30.5|20.0|
|  red|70.0|50.9|
+-----+----+----+
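A shorter alternative, not used in the original post, is to call toDF directly on the RDD after importing the session's implicits. This is a sketch under the assumption that the RDD holds tuples; the session setup and application name are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell `spark` is already available
val spark = SparkSession.builder()
  .appName("rdd-todf")
  .master("local[*]")
  .getOrCreate()

// Brings the toDF conversion for RDDs of tuples and case classes into scope
import spark.implicits._

val rdd = spark.sparkContext.parallelize(Seq(
  ("blue", 20.0, 60.0),
  ("green", 30.5, 20.0),
  ("red", 70.0, 50.9)
))

// Name the columns in the same call; data types are still inferred
val df1 = rdd.toDF("id", "val1", "val2")

df1.show()
```

The result should match the createDataFrame(rdd).toDF(...) form above; which one to use is mostly a matter of style.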

 

With proper schema

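To control the data types yourself, define a StructType and pass it to createDataFrame. This overload expects an RDD of Row objects, so an RDD of tuples has to be mapped to Row first.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// createDataFrame(rdd, schema) needs an RDD[Row], so convert the tuples first
val rowRdd = rdd.map { case (id, val1, val2) => Row(id, val1, val2) }

val schema = new StructType()
  .add(StructField("id", StringType, true))
  .add(StructField("val1", DoubleType, true))
  .add(StructField("val2", DoubleType, true))

val df2 = spark.createDataFrame(rowRdd, schema)

df2.show()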

+-----+----+----+
|   id|val1|val2|
+-----+----+----+
| blue|20.0|60.0|
|green|30.5|20.0|
|  red|70.0|50.9|
+-----+----+----+

 

Big Data In Real World
