June 29, 2022

How to Compress MapReduce Job Output

If you want to compress the output of a MapReduce job in Hadoop, you can do it on a per-job basis by setting properties in your job configuration, or at the whole cluster level by setting the properties in mapred-site.xml.

Properties for compressing MapReduce job output

  • mapreduce.output.fileoutputformat.compress - Set to true if job outputs should be compressed. Default is false.
  • mapreduce.output.fileoutputformat.compress.type - If the job outputs are to be compressed as SequenceFiles, this sets how they are compressed. Should be one of NONE, RECORD or BLOCK. Default is RECORD.
  • mapreduce.output.fileoutputformat.compress.codec - If the job outputs are compressed, this sets which codec is used. Default is org.apache.hadoop.io.compress.DefaultCodec.

Making changes in mapred-site.xml

If you want to compress the MapReduce job output for all the jobs running on a cluster, add these properties to mapred-site.xml.
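
As a minimal sketch, the three properties listed earlier could be set in mapred-site.xml as shown here. The property names and the codec class come from this post; the values BLOCK and GzipCodec are only illustrative choices (they mirror the job-level example), and the snippet assumes it is placed inside the file's existing <configuration> element.

```xml
<!-- Turn on compression of job output for every job on the cluster -->
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<!-- Only applies to SequenceFile output; BLOCK is an illustrative choice, default is RECORD -->
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>BLOCK</value>
</property>
<!-- Codec used for the output; GzipCodec is an illustrative choice, default is DefaultCodec -->
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```

A job can still override these cluster-wide values in its own configuration, unless an administrator has marked the properties as final.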


Making changes in Job configuration

If you want to compress the output of only a specific MapReduce job, set the properties in your job configuration. In the job driver this can be done with the static helper methods of FileOutputFormat.

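// Classes used (new MapReduce API): org.apache.hadoop.mapreduce.lib.output.FileOutputFormat, org.apache.hadoop.io.compress.GzipCodec
FileOutputFormat.setCompressOutput(job, true);                     // turn on output compression for this job
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);   // compress the output with gzip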

If you are using the SequenceFile output format, you can set the compression type too.

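// Classes used: org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat, org.apache.hadoop.io.SequenceFile.CompressionType
SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);   // compress blocks of records together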

That's all for the topic How to Compress MapReduce Job Output. If something is missing or you have something to share about the topic please write a comment.
