June 24, 2022

GZIP Multiple Files in Java Creating Tar Archive

GZIP is normally used to compress a single file. If you want to compress multiple files in GZIP format in Java, it is a two-step process:

  • first multiple files are archived into one with tar,
  • then compressed with gzip to create a .tar.gz compressed archive.
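
The reason for the tar step is that a gzip stream holds one compressed sequence of bytes and has no notion of multiple entries or directories. The following is a minimal sketch of that single-stream behavior using only JDK classes. The class GzipSingleStream and its methods are hypothetical names used for illustration; they are not part of the program in this post.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only
class GzipSingleStream {
  // Compress one sequence of bytes into GZIP format
  static byte[] gzip(byte[] data) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    // closing the GZIPOutputStream writes the gzip trailer
    try (GZIPOutputStream gzipOS = new GZIPOutputStream(bos)) {
      gzipOS.write(data);
    }
    return bos.toByteArray();
  }

  // Decompress GZIP data back to the original bytes
  static byte[] gunzip(byte[] compressed) throws IOException {
    try (GZIPInputStream gzipIS = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      int len;
      while ((len = gzipIS.read(buffer)) != -1) {
        bos.write(buffer, 0, len);
      }
      return bos.toByteArray();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] original = "content of a single file".getBytes(StandardCharsets.UTF_8);
    byte[] compressed = gzip(original);
    // the round trip gives back the same bytes, with no file names involved
    System.out.println(new String(gunzip(compressed), StandardCharsets.UTF_8));
  }
}
```

Since only bytes go in and come out, the file names and the directory layout have to be recorded by the tar archive before the data reaches gzip.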

In this post we'll see the whole process of compressing multiple files using gzip in Java: first creating a tar file from the files and then compressing it with gzip, which produces a .tar.gz archive.

Gzip multiple files in Java

The Java program given here, which archives multiple files into a tar file and then compresses it in GZIP format, uses the Apache Commons Compress library. It can be downloaded from this path- https://commons.apache.org/proper/commons-compress/download_compress.cgi

The version used here is commons-compress-1.18, so commons-compress-1.18.jar is added to the classpath.

The following two classes from the Apache Commons Compress library are used for creating the tar archive.

  • TarArchiveEntry- Represents an entry in a tar archive. Every directory and file that goes into the archive is added to it as a TarArchiveEntry.
  • TarArchiveOutputStream- This class has methods to put archive entries and to write the content of the files to the stream. In the program, TarArchiveOutputStream wraps GZIPOutputStream.

Java program – Create tar archive and Gzip multiple files

The directory structure used in the Java program is given below. There is a parent directory test with two sub-directories, docs and prints, and four files in total-

$ ls -R test

test:
aa.txt  bb.txt  docs  prints

test/docs:
display.txt

test/prints:
output

In the program you need to traverse the directory structure to archive all files and directories. For a directory, only the archive entry is added. For a file, the entry is added and the content of the file is also written to the stream.
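
The traversal itself can be sketched without the archive library. The following example only collects the entry names that such a recursive walk would produce, using JDK classes alone. The class EntryNameWalk and its collect method are hypothetical names for illustration, and marking directories with a trailing "/" is a choice made here for readability.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only
class EntryNameWalk {
  // Adds the entry name of file, and of everything below it, to names
  static void collect(File file, String parent, List<String> names) {
    // entry name is relative to the parent of the root directory
    String entryName = parent + file.getName();
    names.add(file.isDirectory() ? entryName + "/" : entryName);
    // listFiles() returns null when file is not a directory or cannot be read
    File[] children = file.listFiles();
    if (children != null) {
      // listFiles() gives no ordering guarantee, so sort for a stable result
      Arrays.sort(children);
      for (File child : children) {
        // entry names use "/" as separator regardless of the platform
        collect(child, entryName + "/", names);
      }
    }
  }

  public static void main(String[] args) {
    // directory to walk; defaults to the current directory
    File root = new File(args.length > 0 ? args[0] : ".").getAbsoluteFile();
    List<String> names = new ArrayList<>();
    collect(root, "", names);
    for (String name : names) {
      System.out.println(name);
    }
  }
}
```

An archiving program follows the same recursion, with each name turned into an archive entry and, for files, followed by the file content.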

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.utils.IOUtils;

public class GZipMultipleFiles {
  public static void main(String[] args) {
    String PARENT_DIRECTORY = "/home/knpcode/Documents/test";
    GZipMultipleFiles gzipMultipleFiles = new GZipMultipleFiles();
    gzipMultipleFiles.createTarArchive(PARENT_DIRECTORY);
  }
	
  public void createTarArchive(String parentDir){
    File root = new File(parentDir);
    // create output name for tar archive, e.g. test.tar.gz next to the test directory
    String archiveName = root.getAbsolutePath().concat(".tar.gz");
    // try-with-resources closes all the streams, even when archiving fails
    // or when one of the streams could not be opened
    try (FileOutputStream fos = new FileOutputStream(archiveName);
         GZIPOutputStream gzipOS = new GZIPOutputStream(new BufferedOutputStream(fos));
         TarArchiveOutputStream tarArchive = new TarArchiveOutputStream(gzipOS)) {
      addToArchive(parentDir, "", tarArchive);
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
	
  public void addToArchive(String filePath, String parent, TarArchiveOutputStream tarArchive) throws IOException {
    File file = new File(filePath);
    // Create entry name relative to parent file path
    // for the archived file
    String entryName = parent + file.getName();
    System.out.println("entryName " + entryName);
    // add tar ArchiveEntry
    tarArchive.putArchiveEntry(new TarArchiveEntry(file, entryName));
    if(file.isFile()){
      // try-with-resources closes the input stream even if the copy fails
      try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
        // Write file content to archive
        IOUtils.copy(bis, tarArchive);
      }
      tarArchive.closeArchiveEntry();
    }else if(file.isDirectory()){
      // no content to copy so close archive entry
      tarArchive.closeArchiveEntry();
      // listFiles() returns null if the directory can't be read
      File[] children = file.listFiles();
      if(children != null){
        // if this directory contains more directories and files
        // traverse and archive them
        for(File f : children){
          // recursive call, tar entry names use "/" as separator on every platform
          addToArchive(f.getAbsolutePath(), entryName + "/", tarArchive);
        }
      }
    }
  }
}
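
Output for the entries in the tar archive is as given below. The order of the entries depends on File.listFiles(), which does not guarantee any particular order.
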
entryName test
entryName test/docs
entryName test/docs/display.txt
entryName test/bb.txt
entryName test/prints
entryName test/prints/output
entryName test/aa.txt

The created archive as shown in the Archive Manager.

[Image: gzip multiple files Java]
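
The archive can also be given a quick sanity check from Java using only JDK classes. The following is a minimal sketch that decompresses the first 512-byte tar header block and reads the entry name from it. It assumes the archive has a ustar-style header with the "ustar" magic at offset 257. The class TarGzPeek and its method are hypothetical names for illustration, not part of Apache Commons Compress.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, for illustration only
class TarGzPeek {
  // Returns the name stored in the first tar header of a gzip-compressed tar stream
  static String firstEntryName(InputStream tarGz) throws IOException {
    // a tar header occupies one 512-byte block
    byte[] header = new byte[512];
    DataInputStream in = new DataInputStream(new GZIPInputStream(tarGz));
    in.readFully(header);
    // assumption: ustar-style header, which stores "ustar" at offset 257
    String magic = new String(header, 257, 5, StandardCharsets.US_ASCII);
    if (!"ustar".equals(magic)) {
      throw new IOException("Not a ustar tar header");
    }
    // the name field is the first 100 bytes, NUL terminated when shorter
    int len = 0;
    while (len < 100 && header[len] != 0) {
      len++;
    }
    return new String(header, 0, len, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // path of the .tar.gz file is passed as the first argument
    try (InputStream in = new FileInputStream(args[0])) {
      System.out.println("first entry " + firstEntryName(in));
    }
  }
}
```

This only looks at the first header. For listing all the entries, a full tar reader such as the one in the archive library is the better choice.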

That's all for the topic GZIP Multiple Files in Java Creating Tar Archive. If something is missing or you have something to share about the topic please write a comment.

