How to read multiple files into a single RDD or DataFrame in Spark?

Published by Big Data In Real World at May 10, 2021

Solution

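Pass a comma-separated list of paths to sc.textFile. The list can mix directories and individual files. This example reads two directories and one specific file into a single RDD.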

sc.textFile("/home/hirw/dir1,/home/hirw/dir2,/home/hirw/specific/file")
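When the paths are held in a list rather than typed out by hand, the comma-separated string can be built with a join. Here is a minimal PySpark sketch, assuming an existing SparkContext named `sc`. The helper names `to_path_string` and `read_as_rdd` are hypothetical and not part of Spark.

```python
def to_path_string(paths):
    """Join paths into the single comma-separated string that sc.textFile accepts."""
    return ",".join(paths)


def read_as_rdd(sc, paths):
    """Read every file under the given paths into one RDD of lines.

    `sc` is assumed to be an existing SparkContext (hypothetical helper).
    """
    return sc.textFile(to_path_string(paths))


# Same paths as in the example above.
paths = ["/home/hirw/dir1", "/home/hirw/dir2", "/home/hirw/specific/file"]
print(to_path_string(paths))
```

In a running Spark shell, `read_as_rdd(sc, paths)` would return one RDD covering all three locations.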

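You can also use wildcards (glob patterns) to match files and directories. In this example, dir-10[0-5]* matches any path under /home/hirw whose name starts with dir-100 through dir-105.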

sc.textFile("/home/hirw/dir1,/home/hirw/dir-10[0-5]*")
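The title also mentions DataFrames. As a hedged PySpark sketch, assuming an existing SparkSession named `spark`: the DataFrame reader takes multiple paths as a list rather than as one comma-separated string, and it accepts glob patterns as well. The helper names `split_paths` and `read_as_dataframe` are hypothetical and not part of Spark.

```python
def split_paths(path_string):
    """Split a comma-separated path string into a list of individual paths.

    Naive split: it does not handle commas inside {a,b} glob alternations.
    """
    return [p.strip() for p in path_string.split(",") if p.strip()]


def read_as_dataframe(spark, path_string):
    """Read all matching text files into one DataFrame with a single `value` column.

    `spark` is assumed to be an existing SparkSession (hypothetical helper).
    """
    return spark.read.text(split_paths(path_string))


# Same paths as in the wildcard example above.
print(split_paths("/home/hirw/dir1,/home/hirw/dir-10[0-5]*"))
```

In a running Spark shell, `read_as_dataframe(spark, "/home/hirw/dir1,/home/hirw/dir-10[0-5]*")` would return a single DataFrame with one row per line of input.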
