Reading A File From HDFS – Java Program
In the last post we saw how to write a file to HDFS with our own Java program. In this post we will see how to read a file from HDFS with a Java program.
Here is the program – FileReadFromHDFS.java
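import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileReadFromHDFS {

    public static void main(String[] args) throws Exception {

        // File to read in HDFS
        String uri = args[0];

        // Load the cluster configuration from the classpath
        Configuration conf = new Configuration();

        // Get the filesystem - HDFS
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;

        try {
            // Open the path mentioned in HDFS
            in = fs.open(new Path(uri));
            // Copy the file to standard output, 4096 bytes at a time,
            // without closing either stream
            IOUtils.copyBytes(in, System.out, 4096, false);
            System.out.println("End Of file: HDFS file read complete");
        } finally {
            // Close the input stream even if the read fails
            IOUtils.closeStream(in);
        }
    }
}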
This program takes one argument: the fully qualified HDFS path of the file to read. It prints the contents of that file on the screen, much like the hadoop fs -cat command.
// File to read in HDFS
String uri = args[0];
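The program reads args[0] directly, so running it without an argument fails with an ArrayIndexOutOfBoundsException. A minimal sketch of a friendlier check is shown here; the class name ArgCheckSketch, the helper requireUri, the usage message, and the sample URI are all hypothetical and not part of the original program.

```java
// Hypothetical helper, not part of the original FileReadFromHDFS program.
class ArgCheckSketch {

    // Returns the first argument, or throws with a usage message when it is missing or blank.
    static String requireUri(String[] args) {
        if (args.length < 1 || args[0].trim().isEmpty()) {
            throw new IllegalArgumentException("Usage: FileReadFromHDFS <hdfs-uri>");
        }
        return args[0];
    }

    public static void main(String[] args) {
        // The host, port, and path below are made-up sample values.
        String[] sample = {"hdfs://namenode.example.com:8020/user/hadoop/sample.txt"};
        System.out.println("Will read: " + requireUri(sample));
    }
}
```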
We need a few key details about the cluster, such as the name node address. These details are already specified in the configuration files during cluster setup.
Configuration conf = new Configuration();
The easiest way to get the cluster configuration is to instantiate a Configuration object. It reads the configuration files found on the classpath (core-default.xml and core-site.xml by default) and loads the settings the program needs.
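The layering idea behind the configuration, built-in defaults that site-specific files can override, can be illustrated with plain java.util.Properties. This is only an analogy under stated assumptions, not how Hadoop's Configuration class is implemented; the class name LayeredConfigSketch and the name node address are hypothetical.

```java
import java.util.Properties;

// Analogy only: mimics "defaults overridden by site settings" with java.util.Properties.
class LayeredConfigSketch {

    static Properties load() {
        // Stand-in for shipped defaults (the kind of values found in core-default.xml)
        Properties defaults = new Properties();
        defaults.setProperty("fs.defaultFS", "file:///");
        defaults.setProperty("io.file.buffer.size", "4096");

        // Stand-in for cluster-specific settings (the kind of values found in core-site.xml);
        // the name node address is a made-up sample value
        Properties site = new Properties(defaults);
        site.setProperty("fs.defaultFS", "hdfs://namenode.example.com:8020");
        return site;
    }

    public static void main(String[] args) {
        Properties conf = load();
        // The site value wins where it is set; otherwise the default is returned
        System.out.println("fs.defaultFS = " + conf.getProperty("fs.defaultFS"));
        System.out.println("io.file.buffer.size = " + conf.getProperty("io.file.buffer.size"));
    }
}
```

A lookup first checks the site-level values and falls back to the defaults, which is why a program on a configured cluster can find the name node without hard-coding it.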
// Get the filesystem - HDFS
FileSystem fs = FileSystem.get(URI.create(uri), conf);
FSDataInputStream in = null;
In the next line we get the FileSystem object using the URI that we passed as the program input and the configuration that we just created. For an hdfs:// URI this returns a DistributedFileSystem object. Once we have the file system object, the next thing we need is an input stream to the file we would like to read.
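To see what FileSystem.get has to work with, it can help to look at how java.net.URI splits a fully qualified HDFS path. The sketch below uses only the JDK; the class name HdfsUriParts and the sample host, port, and path are assumptions for illustration.

```java
import java.net.URI;

// Shows the parts of a fully qualified HDFS URI using only the JDK.
class HdfsUriParts {

    // Returns a one-line summary of the scheme, authority, and path of the given URI.
    static String describe(String uri) {
        URI parsed = URI.create(uri);
        return "scheme=" + parsed.getScheme()
                + " authority=" + parsed.getAuthority()
                + " path=" + parsed.getPath();
    }

    public static void main(String[] args) {
        // Made-up sample value; substitute the name node address of a real cluster.
        System.out.println(describe("hdfs://namenode.example.com:8020/user/hadoop/sample.txt"));
        // prints: scheme=hdfs authority=namenode.example.com:8020 path=/user/hadoop/sample.txt
    }
}
```

The scheme (hdfs) and the authority (the name node host and port) are what identify which file system to connect to, while the path identifies the file within it.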
in = fs.open(new Path(uri));
IOUtils.copyBytes(in, System.out, 4096, false);
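We get the input stream by calling the open method on the file system object with the HDFS path of the file we would like to read. Then we use the copyBytes method from Hadoop's IOUtils class to read the file's entire contents from the input stream and print them on the screen. The arguments 4096 and false are the buffer size in bytes and a flag telling copyBytes not to close the streams when it finishes, which is why the finally block closes the input stream with IOUtils.closeStream.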
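For readers curious about what a buffered copy of this kind involves, here is a rough plain-Java sketch of the idea. It is not Hadoop's source code: the class StreamCopySketch is hypothetical, its copyBytes returns a byte count purely for illustration, and a ByteArrayInputStream stands in for the HDFS input stream so the example runs without a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative stand-in for a buffered stream copy; not Hadoop's actual implementation.
class StreamCopySketch {

    // Copies everything from in to out using a buffer of buffSize bytes and
    // returns the number of bytes copied. When close is true, both streams
    // are closed after the copy.
    static long copyBytes(InputStream in, OutputStream out, int buffSize, boolean close)
            throws IOException {
        byte[] buffer = new byte[buffSize];
        long total = 0;
        try {
            int read;
            // read() returns -1 once the end of the stream is reached
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            out.flush();
        } finally {
            if (close) {
                in.close();
                out.close();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello from a pretend HDFS file\n".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data);
        // close=false keeps System.out usable for the message that follows
        copyBytes(in, System.out, 4096, false);
        System.out.println("End Of file: read complete");
    }
}
```

Passing false for the last argument matters when the destination is System.out: closing it would prevent the program from printing anything afterwards.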