This post shows a Java program to write a file in HDFS using the Hadoop FileSystem API.
Steps for writing a file in HDFS using Java are as follows:
- FileSystem is an abstraction of a file system, of which HDFS is one implementation. So you first have to get an instance of FileSystem (HDFS in this case) using the get() method.
- As the program shows, the get() method takes a Configuration object as an argument. The Configuration object holds the configuration read from the configuration files (for example core-site.xml, whose fs.defaultFS property determines which file system is returned).
- In HDFS, a Path object represents the full file path.
- Using the create() method of FileSystem you can create a file; the method returns an FSDataOutputStream.
Java Program to write to a file in HDFS
    package org.knpcode;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HDFSFileWrite {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
          FileSystem fs = FileSystem.get(conf);
          // Hadoop DFS Path - Input & Output file
          Path inFile = new Path(args[0]);
          Path outFile = new Path(args[1]);
          // Verification
          if (!fs.exists(inFile)) {
            System.out.println("Input file not found");
            throw new IOException("Input file not found");
          }
          if (fs.exists(outFile)) {
            System.out.println("Output file already exists");
            throw new IOException("Output file already exists");
          }
          // Open the input file and create the output file.
          // try-with-resources closes both streams, even if the copy fails.
          try (FSDataInputStream in = fs.open(inFile);
               FSDataOutputStream out = fs.create(outFile)) {
            byte[] buffer = new byte[256];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) > 0) {
              out.write(buffer, 0, bytesRead);
            }
          }
        } catch (IOException e) {
          // Reports failed checks as well as errors while copying
          e.printStackTrace();
        }
      }
    }
In the above program both the input and output files are in HDFS. If your input file is in the local file system, you can use BufferedInputStream to create the input stream as shown here:
InputStream in = new BufferedInputStream(new FileInputStream("/local_file_path/file_name"));
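The copy step for that local-file variant can be sketched with java.io alone, so it runs without a cluster. This is only an illustration under stated assumptions: LocalCopySketch and copy are hypothetical names, a temporary file stands in for /local_file_path/file_name, and a ByteArrayOutputStream stands in for the FSDataOutputStream returned by fs.create(outFile), which is itself a java.io.OutputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of the Hadoop API.
class LocalCopySketch {

  // Copies all bytes from in to out through a fixed-size buffer
  // and returns the number of bytes copied.
  static long copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[256];
    long total = 0;
    int bytesRead;
    // read() returns -1 once the end of the stream is reached
    while ((bytesRead = in.read(buffer)) != -1) {
      out.write(buffer, 0, bytesRead);
      total += bytesRead;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Temporary local file standing in for /local_file_path/file_name
    File local = File.createTempFile("local_input", ".txt");
    try (OutputStream seed = new FileOutputStream(local)) {
      seed.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
    }

    // In the HDFS program, 'out' would be fs.create(outFile) instead.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InputStream in = new BufferedInputStream(new FileInputStream(local))) {
      long copied = copy(in, out);
      System.out.println("Copied " + copied + " bytes");
    } finally {
      local.delete();
    }
  }
}
```

Because the helper only depends on InputStream and OutputStream, the same loop should work unchanged when the output stream comes from FileSystem.create().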
Executing program in Hadoop environment
To execute the above Java program in a Hadoop environment, you need to add the directory containing the .class file for the Java program to Hadoop's classpath.
export HADOOP_CLASSPATH='/huser/eclipse-workspace/knpcode/bin'
My compiled classes, including HDFSFileWrite.class, are under /huser/eclipse-workspace/knpcode/bin, so that is the path I have exported.
Then you can run the program by providing the path of the input file from which data is read and the path of the output file to which content is written.
hadoop org.knpcode.HDFSFileWrite /user/input/test/aa.txt /user/input/test/write.txt
Using the HDFS ls command you can verify whether the file was created.
hdfs dfs -ls /user/input/test/
-rw-r--r-- 1 knpcode supergroup 10 2018-01-18 14:55 /user/input/test/write.txt
Writing HDFS file using IOUtils class
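The Hadoop framework provides the IOUtils class, which has many convenience methods related to I/O. You can use its copyBytes() method to copy bytes from the input stream to the output stream. In the program below, the third argument to copyBytes() is the buffer size and the fourth argument says whether the streams should be closed when the copy finishes; it is passed as false because the streams are closed in the finally block using IOUtils.closeStream().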
    package org.knpcode;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HDFSFileWrite {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        FSDataInputStream in = null;
        FSDataOutputStream out = null;
        try {
          FileSystem fs = FileSystem.get(conf);
          // Hadoop DFS Path - Input & Output file
          Path inFile = new Path(args[0]);
          Path outFile = new Path(args[1]);
          // Verification
          if (!fs.exists(inFile)) {
            System.out.println("Input file not found");
            throw new IOException("Input file not found");
          }
          if (fs.exists(outFile)) {
            System.out.println("Output file already exists");
            throw new IOException("Output file already exists");
          }
          try {
            // open and read from file
            in = fs.open(inFile);
            // Create file to write
            out = fs.create(outFile);
            // Copy with a 512-byte buffer; false = do not close the streams here
            IOUtils.copyBytes(in, out, 512, false);
          } finally {
            // closeStream() is null-safe, so this works even if open/create failed
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
          }
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    }
That's all for the topic Java Program to Write a File in HDFS. If something is missing or you have something to share about the topic please write a comment.