June 26, 2022

Java Program to Read a File From HDFS

This post shows a Java program to read a file from HDFS using the Hadoop FileSystem API.

Steps for reading the file in HDFS using Java are as follows-

  • FileSystem is an abstraction of file system of which HDFS is one implementation. So you will have to get an instance of FileSystem (HDFS in this case) using the get method.
  • In the program you can see get() method takes Configuration as an argument. Configuration object has all the configuration related information read from the configuration files (i.e. core-site.xml from where it gets the file system).
  • In HDFS Path object represents the Full file path.
  • Once you get the file, to read it input stream is used which in HDFS is FSDataInputStream.
  • For output stream, System.out is used which prints the data on console.

Java Program to read a file from HDFS

import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileRead {

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    try {
      FileSystem fs = FileSystem.get(conf);
      // Hadoop DFS Path - Input file
      Path inFile = new Path(args[0]);
        
      // Check if input is valid
      if (!fs.exists(inFile)) {
        System.out.println("Input file not found");
        throw new IOException("Input file not found");
      }
			
      // open and read from file
      FSDataInputStream in = fs.open(inFile);
      // system.out as output stream to display 
      //file content on terminal 
      OutputStream out = System.out;
      byte buffer[] = new byte[256];
      try {
        int bytesRead = 0;
        while ((bytesRead = in.read(buffer)) > 0) {
          out.write(buffer, 0, bytesRead);
        }
      } catch (IOException e) {
        System.out.println("Error while copying file");
      } finally {
         // Closing streams
        in.close();
        out.close();
      }      
    } catch (IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }		 
  }
}

Executing program in Hadoop environment

To execute above program in Hadoop environment, you will need to add the directory containing the .class file for the Java program in Hadoop’s classpath.

export HADOOP_CLASSPATH='/huser/eclipse-workspace/knpcode/bin'

I have my HDFSFileRead.class file in location /huser/eclipse-workspace/knpcode/bin so I have exported that path.

Then you can run the program by providing the HDFS file that has to be read as an argument to your Java program-

hadoop org.knpcode.HDFSFileRead /user/input/test/aa.txt

Using IOUtils class to read a file in HDFS

Hadoop framework provides an utility class IOUtils that has many convenient methods related to I/O. You can use that to read a file in HDFS and display it’s content on console. Using IOUtils will reduce the program size.

Java program to read HDFS file

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HDFSFileRead {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    try {
      FileSystem fs = FileSystem.get(conf);
      FSDataInputStream in = null;
      // Hadoop DFS Path - Input file
      Path inFile = new Path(args[0]);
      // Check if input is valid
      if (!fs.exists(inFile)) {
        System.out.println("Input file not found");
        throw new IOException("Input file not found");
      }
      try {
        // open and read from file
        in = fs.open(inFile);			
        IOUtils.copyBytes(in, System.out, 512, false);		
      }finally {
        IOUtils.closeStream(in);
      }			
    } catch (IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }	 
  }
}

That's all for the topic Java Program to Read a File From HDFS. If something is missing or you have something to share about the topic please write a comment.


You may also like

No comments:

Post a Comment