We often get asked how to read files from multiple directories in Spark, so we thought we would write a short post to answer it.
Spark uses Hadoop's FileInputFormat to read files, so the path options that are available when reading files with Hadoop also apply in Spark.
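To make that concrete: Spark's textFile hands the whole path string to Hadoop's FileInputFormat.setInputPaths, which splits it on commas before resolving each path. The minimal Python sketch below mimics only that splitting step, assuming Hadoop's rule that a comma inside a {...} glob group does not start a new path. split_input_paths is a hypothetical name for illustration, not a Spark or Hadoop API.

```python
from typing import List


def split_input_paths(comma_separated: str) -> List[str]:
    """Split a comma-separated input string into individual paths.

    Hypothetical helper mirroring Hadoop's behavior: a comma only
    separates paths when it is not inside a {...} glob group.
    """
    paths = []
    depth = 0  # number of currently open '{' groups
    start = 0  # index where the current path begins
    for i, ch in enumerate(comma_separated):
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        elif ch == "," and depth == 0:
            paths.append(comma_separated[start:i])
            start = i + 1
    paths.append(comma_separated[start:])  # the last path has no trailing comma
    return paths


print(split_input_paths("/home/hirw/dir1,/home/hirw/dir2,/home/hirw/specific/file"))
# ['/home/hirw/dir1', '/home/hirw/dir2', '/home/hirw/specific/file']
print(split_input_paths("/home/hirw/{dir1,dir2},/home/hirw/specific/file"))
# ['/home/hirw/{dir1,dir2}', '/home/hirw/specific/file']
```

Tracking brace depth is what lets a single entry such as {dir1,dir2} stay intact as one glob pattern instead of being cut in half at its inner comma.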
Solution
Here is how to read files from multiple directories and a specific file in one call: pass all the paths to sc.textFile as a single comma-separated string.
sc.textFile("/home/hirw/dir1,/home/hirw/dir2,/home/hirw/specific/file")
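Each path in that list is then resolved on its own: a directory contributes the files directly inside it, while a file path contributes just that file. The sketch below simulates this on the local filesystem with plain Python so it runs without Spark. list_input_files is a hypothetical helper and the model is simplified (no recursion into subdirectories, no HDFS); it does mirror Hadoop's default of skipping names that start with "_" or "." (such as _SUCCESS markers) when listing a directory.

```python
import os
import tempfile
from typing import List


def list_input_files(paths: List[str]) -> List[str]:
    """Resolve directories and files to the files that would be read (simplified)."""
    files = []
    for path in paths:
        if os.path.isdir(path):
            # A directory contributes the files directly inside it (no recursion),
            # skipping hidden names such as _SUCCESS or .crc files.
            for name in sorted(os.listdir(path)):
                full = os.path.join(path, name)
                if os.path.isfile(full) and not name.startswith(("_", ".")):
                    files.append(full)
        elif os.path.isfile(path):
            files.append(path)  # an explicitly named file is read as-is
        else:
            # Hadoop also treats a missing input path as an error.
            raise FileNotFoundError(f"Input path does not exist: {path}")
    return files


# Demo: build dir1, dir2 and specific/ under a temporary root.
with tempfile.TemporaryDirectory() as root:
    for parts in [("dir1", "a.txt"), ("dir1", "_SUCCESS"), ("dir2", "b.txt"),
                  ("specific", "file"), ("specific", "other")]:
        full = os.path.join(root, *parts)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    inputs = [os.path.join(root, "dir1"), os.path.join(root, "dir2"),
              os.path.join(root, "specific", "file")]
    for f in list_input_files(inputs):
        print(os.path.relpath(f, root))
```

In the demo only a.txt, b.txt and specific/file are listed: _SUCCESS is filtered out as a hidden name, and specific/other is never considered because only the one file in that directory was named.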
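You can also use wildcards (glob patterns) to match files and directories. This example reads dir1 along with every path under /home/hirw whose name begins with dir-10 followed by a digit from 0 to 5 (dir-100 through dir-105, plus any longer names that start with those):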
sc.textFile("/home/hirw/dir1,/home/hirw/dir-10[0-5]*")
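To check which names a pattern such as dir-10[0-5]* selects before running a job, the small sketch below uses Python's fnmatch.fnmatchcase, whose *, ? and [0-5] syntax behaves like Hadoop's for a single path component. The directory names are made up for illustration, and matching_names is a hypothetical helper. One difference to keep in mind: Hadoop globs also support {a,b} alternation, which fnmatch does not.

```python
from fnmatch import fnmatchcase
from typing import List


def matching_names(names: List[str], pattern: str) -> List[str]:
    """Return the names that a single-component glob pattern selects."""
    # fnmatchcase is case-sensitive, like HDFS and Linux paths.
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical contents of /home/hirw
names = ["dir1", "dir-10", "dir-100", "dir-103", "dir-105", "dir-1059", "dir-106", "dir-110"]
print(matching_names(names, "dir-10[0-5]*"))
# ['dir-100', 'dir-103', 'dir-105', 'dir-1059']
```

dir-10 is excluded because [0-5] must match exactly one character, dir-106 because 6 is outside the range, and dir-1059 is included because the trailing * matches the extra 9.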