June 26, 2022

Java Program to Write a File in HDFS

This post shows a Java program to write a file in HDFS using the Hadoop FileSystem API.

The steps for writing a file in HDFS using Java are as follows:

  1. FileSystem is an abstraction of a file system, of which HDFS is one implementation. Get an instance of FileSystem (HDFS in this case) using the static get() method.
  2. The get() method takes a Configuration object as an argument. The Configuration object holds the settings read from the configuration files (for example core-site.xml, which identifies the file system to use).
  3. In HDFS, a Path object represents the full file path.
  4. The create() method of FileSystem creates a file and returns an FSDataOutputStream for writing to it.

Java Program to write to a file in HDFS

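// Package matches the class name used in the run command (org.knpcode.HDFSFileWrite)
package org.knpcode;

import java.io.IOException;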

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileWrite {

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    try {
      FileSystem fs = FileSystem.get(conf);
      // Hadoop DFS Path - Input & Output file
      Path inFile = new Path(args[0]);
      Path outFile = new Path(args[1]);
      // Verification
      if (!fs.exists(inFile)) {
        System.out.println("Input file not found");
        throw new IOException("Input file not found");
      }
      if (fs.exists(outFile)) {
        System.out.println("Output file already exists");
        throw new IOException("Output file already exists");
      }
    
      // Open the input file and create the output file. try-with-resources
      // closes both streams even if the copy fails part way through.
      try (FSDataInputStream in = fs.open(inFile);
           FSDataOutputStream out = fs.create(outFile)) {
        byte[] buffer = new byte[256];
        int bytesRead;
        // read() returns -1 once the end of the stream is reached
        while ((bytesRead = in.read(buffer)) != -1) {
          out.write(buffer, 0, bytesRead);
        }
      }
    } catch (IOException e) {
      // Report the failure (missing input, existing output, or a copy error)
      e.printStackTrace();
    }
  }
}

In the above program both the input and output files are in HDFS. If your input file is in the local file system, you can use BufferedInputStream to create the input stream, as shown here:

InputStream in = new BufferedInputStream(new FileInputStream("/local_file_path/file_name"));
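The copy loop itself does not change when the input is local, because FSDataOutputStream is an ordinary java.io.OutputStream. The following is a minimal sketch using only the JDK so that it runs without a cluster. The class name LocalCopySketch, the copy helper, and the temporary file are hypothetical, and a ByteArrayOutputStream stands in for the stream that fs.create(outFile) would return.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalCopySketch {

  // Copies everything from in to out and returns the number of bytes copied.
  public static long copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[256];
    long total = 0;
    int bytesRead;
    // read() returns -1 at end of stream
    while ((bytesRead = in.read(buffer)) != -1) {
      out.write(buffer, 0, bytesRead);
      total += bytesRead;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical local file, created here only so the sketch runs on its own
    Path local = Files.createTempFile("local_input", ".txt");
    Files.write(local, "hello hdfs".getBytes(StandardCharsets.UTF_8));

    // Stand-in for the FSDataOutputStream returned by fs.create(outFile)
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InputStream in = new BufferedInputStream(new FileInputStream(local.toFile()))) {
      long copied = copy(in, out);
      System.out.println("Copied " + copied + " bytes");
    } finally {
      Files.delete(local);
    }
  }
}
```

In the HDFS program, the only change would be to pass the stream returned by fs.create(outFile) as the second argument instead of the in-memory stream.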

Executing program in Hadoop environment

To execute the above Java program in a Hadoop environment, you need to add the directory containing the program's .class file to Hadoop's classpath.

export HADOOP_CLASSPATH='/huser/eclipse-workspace/knpcode/bin'

I have my HDFSFileWrite.class file in location /huser/eclipse-workspace/knpcode/bin so I have exported that path.

Then you can run the program by providing the path of the input file from which data is read and the path of the output file to which content is written.

hadoop org.knpcode.HDFSFileWrite /user/input/test/aa.txt /user/input/test/write.txt

You can use the HDFS ls command to verify whether the file was created.

hdfs dfs -ls /user/input/test/

-rw-r--r-- 1 knpcode supergroup 10 2018-01-18 14:55 /user/input/test/write.txt

Writing HDFS file using IOUtils class

The Hadoop framework provides the IOUtils class, which has many convenience methods related to I/O. You can use it to copy bytes from the input stream to the output stream.

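Java program to write HDFS file

// Package matches the class name used in the run command (org.knpcode.HDFSFileWrite)
package org.knpcode;

import java.io.IOException;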

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HDFSFileWrite {

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    FSDataInputStream in = null;
    FSDataOutputStream out = null;
    try {
      FileSystem fs = FileSystem.get(conf);
      // Hadoop DFS Path - Input & Output file
      Path inFile = new Path(args[0]);
      Path outFile = new Path(args[1]);
      // Verification
      if (!fs.exists(inFile)) {
        System.out.println("Input file not found");
        throw new IOException("Input file not found");
      }
      if (fs.exists(outFile)) {
        System.out.println("Output file already exists");
        throw new IOException("Output file already exists");
      }
      try {
        // open and read from file
        in = fs.open(inFile);
        // Create file to write
        out = fs.create(outFile);
        IOUtils.copyBytes(in, out, 512, false);
        
      } finally {
        IOUtils.closeStream(in);
        IOUtils.closeStream(out);
      }      
    } catch (IOException e) {
      // Report the failure (missing input, existing output, or a copy error)
      e.printStackTrace();
    }
  }
}
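In the program above, copyBytes is called with false as its last argument, which tells it not to close the streams; the finally block then closes them with closeStream, which accepts null and ignores errors raised while closing. The sketch below imitates that contract using only the JDK so the division of labour is easier to see. It is a simplified stand-in, not Hadoop's implementation: CopyBytesSketch, closeQuietly and TrackingOutput are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyBytesSketch {

  // Hypothetical JDK-only stand-in for IOUtils.copyBytes(in, out, buffSize, close)
  public static void copyBytes(InputStream in, OutputStream out, int buffSize, boolean close)
      throws IOException {
    try {
      byte[] buffer = new byte[buffSize];
      int bytesRead;
      while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
      }
    } finally {
      // Only close the streams when the caller asked for it
      if (close) {
        closeQuietly(out);
        closeQuietly(in);
      }
    }
  }

  // Hypothetical stand-in for IOUtils.closeStream: null-safe, ignores close errors
  public static void closeQuietly(Closeable stream) {
    if (stream != null) {
      try {
        stream.close();
      } catch (IOException e) {
        // Ignored on purpose: this is cleanup code
      }
    }
  }

  // In-memory output stream that records whether close() was called
  static class TrackingOutput extends ByteArrayOutputStream {
    boolean closed = false;

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream(new byte[1000]);
    TrackingOutput out = new TrackingOutput();
    try {
      copyBytes(in, out, 512, false);
      System.out.println("closed after copy: " + out.closed);
    } finally {
      closeQuietly(in);
      closeQuietly(out);
    }
    System.out.println("closed after cleanup: " + out.closed + ", bytes: " + out.size());
  }
}
```

Passing false and closing in finally keeps the cleanup in one place, so the streams are closed whether or not the copy succeeds.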

That's all for the topic Java Program to Write a File in HDFS. If something is missing or you have something to share about the topic please write a comment.

