How to show full column content in a Spark DataFrame?

Published by Big Data In Real World at December 7, 2020

When working with data in Spark, we often want to preview the data or a result right on screen in the Spark shell. By default, Spark shows only part of a value when the data in a column is long: strings longer than 20 characters are cut off and end in "...".

Here is an example of truncated output. It is quite easy to fix this.

scala> results.show(); 
+--------------------+ 
| col                |
+--------------------+ 
|2019-11-20 08:30:...| 
|2019-11-20 08:15:...| 
|2019-11-21 07:15:...| 
|2019-11-22 09:35:...|

Solution

The show() function can take 2 parameters: the number of rows to display, and a true/false flag that says whether to truncate the output. By default, show() prints 20 rows and truncate is set to true, which cuts each value to 20 characters. Set it to false to print the full column content.

Quick fix

scala> results.show(200, false);

The call above prints up to 200 rows and does not truncate any column.
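
For comparison, here is a minimal sketch of the show() variants that control truncation. It assumes Spark 2.3 or later (the vertical layout was added in that release). The sample rows and the app name are hypothetical stand-ins so the snippet can run on its own. In the Spark shell a session named spark already exists, and getOrCreate() simply reuses it.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the existing session inside spark-shell; creates a local one otherwise.
val spark = SparkSession.builder().appName("show-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for `results` from the example above.
val results = Seq(
  "2019-11-20 08:30:00.000 first event",
  "2019-11-20 08:15:00.000 second event",
  "2019-11-21 07:15:00.000 third event"
).toDF("col")

results.show()            // default: first 20 rows, values cut to 20 characters
results.show(false)       // first 20 rows, no truncation
results.show(200, false)  // up to 200 rows, no truncation
results.show(200, 40)     // up to 200 rows, values cut to 40 characters
results.show(5, 0, true)  // vertical layout: one line per column; 0 disables truncation
```

The vertical layout in the last call prints each row as a list of column name and value pairs, which can be easier to read in the shell when a DataFrame has many columns.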

Better Solution

If you are trying to print a lot of columns, let’s say 100, the output is hard to read on screen even without truncation. In our opinion there is an even better solution: simply write the data to a file in JSON format. JSON gives structure to your data, and you can quickly format it using an online JSON formatter like jsonlint. We find this simple technique quite handy. The line below writes the data to your local file system, where you can then review it. Note that Spark creates a directory at the given path and writes one or more part files into it, with one JSON object per line.

results.write.json("file:///yourPath")
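
A slightly fuller sketch of the same idea follows. It is not the only way to do this. The output path /tmp/results_json, the sample rows and the column names are all hypothetical. coalesce(1) is used so that a small result lands in a single part file, and mode("overwrite") replaces the output of an earlier run.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the existing session inside spark-shell; creates a local one otherwise.
val spark = SparkSession.builder().appName("json-preview").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for `results`.
val results = Seq(
  ("2019-11-20 08:30:00.000", "first event"),
  ("2019-11-20 08:15:00.000", "second event")
).toDF("ts", "label")

// Hypothetical output location; replace it with your own path.
val outPath = "file:///tmp/results_json"

results
  .coalesce(1)        // one part file is easier to open by hand (small data only)
  .write
  .mode("overwrite")  // replace output left over from an earlier run
  .json(outPath)

// Read the files back to confirm what was written.
val reloaded = spark.read.json(outPath)
reloaded.show(false)

// Alternative that skips the file: print a few rows as JSON strings in the shell.
results.toJSON.take(5).foreach(println)
```

Because each line of a part file is a separate JSON object, paste one line at a time into the formatter rather than the whole file.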

Big Data In Real World

We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
