Writing A File To HDFS – Java Program
August 26, 2015
Writing a file to HDFS is very easy: we can simply execute the hadoop fs -copyFromLocal command to copy a file from the local file system to HDFS. In this post we will write our own Java program that copies a file from the local file system to HDFS.
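For reference, the command-line approach looks like this (the source and destination paths below are placeholders to replace with your own):

hadoop fs -copyFromLocal /tmp/sample.txt /user/hadoop/sample.txt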
Here is the program – FileWriteToHDFS.java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileWriteToHDFS {

    public static void main(String[] args) throws Exception {

        //Source file in the local file system
        String localSrc = args[0];

        //Destination file in HDFS
        String dst = args[1];

        //Input stream for the file in local file system to be written to HDFS
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

        //Get configuration of Hadoop system
        Configuration conf = new Configuration();
        System.out.println("Connecting to -- " + conf.get("fs.defaultFS"));

        //Destination file system handle in HDFS
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst));

        //Copy file from local to HDFS
        IOUtils.copyBytes(in, out, 4096, true);
        System.out.println(dst + " copied to HDFS");
    }
}
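To compile and run the program, one option (assuming the hadoop command is on your PATH, and using placeholder file paths) is to put the compiled class on Hadoop's classpath and launch it through the hadoop script:

javac -classpath $(hadoop classpath) FileWriteToHDFS.java
export HADOOP_CLASSPATH=.
hadoop FileWriteToHDFS /tmp/sample.txt /user/hadoop/sample.txt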
The program takes two parameters. The first parameter is the location of the file in the local file system that will be copied to the HDFS location given by the second parameter.
//Source file in the local file system
String localSrc = args[0];

//Destination file in HDFS
String dst = args[1];
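The program assumes exactly two arguments are supplied. A small guard like the following (not part of the original listing) would fail with a clearer message when they are missing:

if (args.length != 2) {
    System.err.println("Usage: FileWriteToHDFS <local source> <HDFS destination>");
    System.exit(1);
}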
We will create an InputStream by wrapping a FileInputStream in a BufferedInputStream, using the first parameter, the location of the file in the local file system. The stream objects here are regular java.io classes rather than Hadoop classes, because at this point we are still reading a file from the local file system and not from HDFS.
//Input stream for the file in local file system to be written to HDFS
InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
Now we need to create an output stream to the file location in HDFS where we can write the contents of the local file. The first thing we need is some key information about the cluster, such as the NameNode address. These details are already specified in the configuration files during cluster setup.
The easiest way to get the configuration of the cluster is to instantiate a Configuration object; this reads the configuration files (such as core-site.xml) from the classpath and loads all the information that the program needs.
//Get configuration of Hadoop system
Configuration conf = new Configuration();
System.out.println("Connecting to -- " + conf.get("fs.defaultFS"));

//Destination file in HDFS
FileSystem fs = FileSystem.get(URI.create(dst), conf);
OutputStream out = fs.create(new Path(dst));
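As an aside, if the configuration files are not on the classpath, for example when running from an IDE, the same properties can also be set programmatically. Here is a minimal sketch; the NameNode address is a placeholder for your cluster's actual URI:

Configuration conf = new Configuration();
//Placeholder NameNode URI - replace with your cluster's address
conf.set("fs.defaultFS", "hdfs://namenode-host:8020");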
Next we get the FileSystem object by calling FileSystem.get with the URI we passed as the program's input and the configuration we just created. Since the destination points to HDFS, the file system returned is a DistributedFileSystem object. Once we have the file system object, the next thing we need is an output stream to the file in HDFS to which we will write the contents of the local file.
We then call the create method on the file system object, passing the location of the file in HDFS that we supplied as the second program parameter.
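The create method also has overloads beyond the one used here. As a sketch of two common variants (the second needs an extra import of org.apache.hadoop.util.Progressable):

//Fail instead of silently overwriting if the destination already exists
OutputStream out = fs.create(new Path(dst), false);

//Print a dot each time Hadoop reports write progress
OutputStream out2 = fs.create(new Path(dst), new Progressable() {
    public void progress() {
        System.out.print(".");
    }
});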
//Copy file from local to HDFS
IOUtils.copyBytes(in, out, 4096, true);
Finally, we use the copyBytes method from Hadoop's IOUtils class, supplying the input and output stream objects. It reads 4096 bytes at a time from the input stream and writes them to the output stream, which copies the entire file from the local file system to HDFS. The final boolean argument tells copyBytes to close both streams once the copy completes.
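The FileSystem API also provides a convenience method that performs the same local-to-HDFS copy in a single call, using the same variables as above:

//Equivalent one-line copy from the local file system into HDFS
fs.copyFromLocalFile(new Path(localSrc), new Path(dst));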