June 24, 2022

Convert HTML to PDF in Java Using Flying Saucer, OpenPDF

In this tutorial you’ll see how to convert HTML to PDF in Java using Flying Saucer, OpenPDF and jsoup.

To convert HTML to PDF using PDFBox, check this post: Convert HTML to PDF in Java Using Openhtmltopdf, PDFBox

Convert HTML to PDF using Flying Saucer – How it works

Flying Saucer renders well-formed XML: it takes XML as input, applies formatting and styling using CSS, and generates a rendered representation of that XML as output. So the steps for HTML to PDF conversion are as follows:

  1. Ensure that the HTML is well formed. This is done with jsoup, which converts HTML to XHTML.
  2. Flying Saucer generates a rendered representation of the XHTML and CSS.
  3. OpenPDF generates the PDF document from that rendered representation.
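Step 1 matters because an XML parser rejects ordinary HTML. The sketch below is not part of the original program and uses only the JDK's built-in XML parser (no jsoup, no Flying Saucer). It checks two small snippets: one with HTML-style unclosed `<link>` and `<img>` tags, as in MyPage.html, and one with those tags self-closed, which is the form jsoup produces with its XML output syntax. The class name `WellFormedCheck` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

  // Returns true if the markup parses as well-formed XML, false otherwise
  public static boolean isWellFormed(String markup) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // Silence the default handler, which prints parse errors to stderr
      builder.setErrorHandler(new ErrorHandler() {
        @Override public void warning(SAXParseException e) { }
        @Override public void error(SAXParseException e) { }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
      });
      builder.parse(new InputSource(new StringReader(markup)));
      return true;
    } catch (SAXException e) {
      // Not well formed (for example an unclosed <link> or <img>)
      return false;
    } catch (Exception e) {
      // Parser configuration or I/O problem, not a markup problem
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // HTML style: void elements are left unclosed
    String html = "<html><head><link href=\"css/mystyles.css\" rel=\"stylesheet\"></head>"
        + "<body><img src=\"a.png\" width=\"250\"></body></html>";
    // XHTML style: the same elements self-closed
    String xhtml = "<html><head><link href=\"css/mystyles.css\" rel=\"stylesheet\" /></head>"
        + "<body><img src=\"a.png\" width=\"250\" /></body></html>";
    System.out.println("HTML well formed:  " + isWellFormed(html));
    System.out.println("XHTML well formed: " + isWellFormed(xhtml));
  }
}
```

Running `main` reports the first snippet as not well formed and the second as well formed, which is why the HTML goes through jsoup before it reaches Flying Saucer.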

OpenPDF is a fork of iText version 4; it is open source software licensed under the LGPL and MPL. Read more about OpenPDF in this post: Generating PDF in Java Using OpenPDF Tutorial

Maven Dependencies

The Apache Maven dependencies for jsoup and Flying Saucer, plus Apache Commons IO (used to read image bytes), are given below:

<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.13.1</version>
</dependency>

<dependency>
  <groupId>org.xhtmlrenderer</groupId>
  <artifactId>flying-saucer-pdf-openpdf</artifactId>
  <version>9.1.20</version>
</dependency>
<!-- Dependency for Apache commons-io -->
<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>2.6</version>
</dependency>

The flying-saucer-pdf-openpdf dependency transitively pulls in the required OpenPDF jars as well as Flying Saucer core (flying-saucer-core-9.1.20.jar).

Convert HTML to PDF using Flying Saucer and OpenPDF Java Program

While converting HTML to PDF, I encountered three problems:

  1. How to display an image in the PDF that is referenced in the HTML with an <img src="" ..> tag.
  2. How to add a specific web font.
  3. How to ensure that external CSS used in HTML is also used to style the generated PDF.

The folder structure used for the example program is given here. The OpenPDF folder contains the HTML file, a TrueType font file and a PNG image file; the OpenPDF/css folder contains the CSS file.

-OpenPDF
  MyPage.html
  Gabriola.ttf
  Image OpenPDF.png
  --css
    mystyles.css
MyPage.html
<html lang="en">
  <head>
    <title>MyPage</title>  
    <style type="text/css">
      body{background-color: powderblue;}
    </style>
    <link href="css/mystyles.css" rel="stylesheet" >
  </head>
  <body>
    <h1>Convert HTML to PDF</h1>
    <p>Here is an embedded image</p>
    <img src="F:\knpcode\Java\Java Programs\PDF using Java\OpenPDF\Image OpenPDF.png" width="250" height="150">
    <p style="color:red">Styled text using Inline CSS</p>
    <i>This is italicised text</i>
    <p class="fontclass">This text uses the styling from font face font</p>
    <p class="myclass">This text uses the styling from external CSS class</p>
  </body>
</html>
mystyles.css

In the CSS, the @font-face rule specifies a font and the URL where it can be found. The @page rule specifies the CSS properties (here page size and margins) to use when the document is printed.

@font-face {
  font-family: myFont;
  src: url("../Gabriola.ttf");
}
.fontclass{
  font-family: myFont;
}
@page {
  size: 8.5in 11in;
  margin: 1in;
}
.myclass{
  font-family: Helvetica, sans-serif;
  font-size: 25px;
  font-weight: normal;
  color: blue;
}
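The @page rule asks for an 8.5in by 11in page (US Letter) with 1in margins on every side. PDF measures pages in points, 72 to the inch, so a small worked calculation shows the page and content-area sizes that this rule implies. `PageMetrics` is an illustrative class, not part of Flying Saucer or OpenPDF.

```java
public class PageMetrics {

  // PDF user space unit: 1 point = 1/72 inch
  public static final double POINTS_PER_INCH = 72.0;

  public static double inchesToPoints(double inches) {
    return inches * POINTS_PER_INCH;
  }

  // Size left for content after subtracting the same margin on both sides
  public static double contentSize(double pageInches, double marginInches) {
    return inchesToPoints(pageInches - 2 * marginInches);
  }

  public static void main(String[] args) {
    // Values taken from the @page rule: size 8.5in 11in; margin 1in
    System.out.printf("Page: %.0f x %.0f pt%n", inchesToPoints(8.5), inchesToPoints(11));
    System.out.printf("Content area: %.0f x %.0f pt%n", contentSize(8.5, 1), contentSize(11, 1));
  }
}
```

So the page is 612 x 792 pt and the text and images have 468 x 648 pt (6.5in by 9in) to work with.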

This is how the HTML is rendered in the Chrome browser:

[Image: Convert HTML to PDF]

Now the job is to write a Java program that converts this HTML to PDF, using the same image source and the same external CSS, and adding the font used in the CSS @font-face rule.

For images to work properly in the PDF, what worked for me is implementing my own ReplacedElementFactory that reads the image file into bytes and uses those bytes to create an ITextImageElement. There is a discussion about it here.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
import org.w3c.dom.Element;
import org.xhtmlrenderer.extend.FSImage;
import org.xhtmlrenderer.extend.ReplacedElement;
import org.xhtmlrenderer.extend.ReplacedElementFactory;
import org.xhtmlrenderer.extend.UserAgentCallback;
import org.xhtmlrenderer.layout.LayoutContext;
import org.xhtmlrenderer.pdf.ITextFSImage;
import org.xhtmlrenderer.pdf.ITextImageElement;
import org.xhtmlrenderer.render.BlockBox;
import org.xhtmlrenderer.simple.extend.FormSubmissionListener;
import com.lowagie.text.BadElementException;
import com.lowagie.text.Image;

public class ImageReplacedElementFactory implements ReplacedElementFactory {

  @Override
  public ReplacedElement createReplacedElement(LayoutContext c, BlockBox box, UserAgentCallback uac, int cssWidth,
      int cssHeight) {
    Element e = box.getElement();
    // Anonymous boxes have no DOM element
    if (e == null) {
      return null;
    }
    String nodeName = e.getNodeName();
    // Only <img> elements are handled by this factory
    if (nodeName.equals("img")) {
      // The src attribute holds the file path of the image
      String attribute = e.getAttribute("src");
      FSImage fsImage;
      try {
        fsImage = imageForPDF(attribute, uac);
      } catch (BadElementException | IOException e1) {
        // Image could not be read or decoded; skip it
        fsImage = null;
      }
      if (fsImage != null) {
        if (cssWidth != -1 || cssHeight != -1) {
          // Scale to the width/height taken from the HTML attributes or CSS
          fsImage.scale(cssWidth, cssHeight);
        } else {
          // No size given; fall back to a fixed size
          fsImage.scale(250, 150);
        }
        return new ITextImageElement(fsImage);
      }
    }
    return null;
  }

  protected FSImage imageForPDF(String attribute, UserAgentCallback uac) throws IOException, BadElementException {
    // try-with-resources closes the file stream once the bytes have been read
    try (InputStream input = new FileInputStream(attribute)) {
      final byte[] bytes = IOUtils.toByteArray(input);
      final Image image = Image.getInstance(bytes);
      return new ITextFSImage(image);
    }
  }

  @Override
  public void reset() {
    // Nothing is cached, so there is nothing to reset
  }

  @Override
  public void remove(Element e) {
    // No per-element state to remove
  }

  @Override
  public void setFormSubmissionListener(FormSubmissionListener listener) {
    // Forms are not handled by this factory
  }
}
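The factory passes the src value straight to FileInputStream, which is why MyPage.html uses an absolute path for the image. If you would rather keep image paths relative to the HTML file, and drop the commons-io dependency, the path handling could be done with the JDK alone. This is a sketch under assumptions: `ImageBytes`, `resolve` and `read` are hypothetical names, they handle plain file paths only (not file: URLs), and getting the HTML folder into the factory (for example through a constructor parameter) is not something the original code does. The returned bytes could then go to `Image.getInstance(bytes)` as in `imageForPDF`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImageBytes {

  // Hypothetical helper: an absolute src is used as-is, a relative src is
  // resolved against the directory that contains the HTML file
  public static Path resolve(Path htmlDir, String src) {
    Path p = Paths.get(src);
    return p.isAbsolute() ? p : htmlDir.resolve(p).normalize();
  }

  // Reads the whole image file into memory; JDK replacement for IOUtils.toByteArray
  public static byte[] read(Path htmlDir, String src) throws IOException {
    return Files.readAllBytes(resolve(htmlDir, src));
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for the OpenPDF folder, created in the temp directory
    Path htmlDir = Files.createTempDirectory("OpenPDF");
    Path image = htmlDir.resolve("Image OpenPDF.png");
    Files.write(image, new byte[] {1, 2, 3});
    // A relative and an absolute src point at the same file
    System.out.println(read(htmlDir, "Image OpenPDF.png").length);
    System.out.println(read(htmlDir, image.toString()).length);
    // Clean up the demo files
    Files.delete(image);
    Files.delete(htmlDir);
  }
}
```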

The following Java program generates the PDF using the HTML file as the source.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileSystems;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.xhtmlrenderer.layout.SharedContext;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class HTMLToPDF {

  public static void main(String[] args) {
    try {
      // Source HTML file
      File inputHTML = new File("F:\\knpcode\\Java\\Java Programs\\PDF using Java\\OpenPDF\\MyPage.html");
      // Generated PDF file name
      File outputPdf = new File("F:\\knpcode\\Java\\Java Programs\\PDF using Java\\OpenPDF\\Output.pdf");
      //Convert HTML to XHTML
      String xhtml = htmlToXhtml(inputHTML);
      System.out.println("Converting to PDF...");
      xhtmlToPdf(xhtml, outputPdf);
      
    } catch (IOException e) {
      // Thrown if the HTML file cannot be read or the PDF cannot be written
      e.printStackTrace();
    }
  }
  
  private static String htmlToXhtml(File inputHTML) throws IOException {
    Document document = Jsoup.parse(inputHTML, "UTF-8");
    System.out.println("parsing ...");
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    System.out.println("parsing done ...");
    return document.html();
  }
  
  private static void xhtmlToPdf(String xhtml, File outputPdf) throws IOException {
    ITextRenderer renderer = new ITextRenderer();
    SharedContext sharedContext = renderer.getSharedContext();
    // Render for print (paged) media, without interactive features
    sharedContext.setPrint(true);
    sharedContext.setInteractive(false);
    // Custom factory that loads <img> sources from the file system
    sharedContext.setReplacedElementFactory(new ImageReplacedElementFactory());
    sharedContext.getTextRenderer().setSmoothingThreshold(0);
    // Register the font used by the @font-face rule; true embeds it in the PDF
    renderer.getFontResolver().addFont("F:\\knpcode\\Java\\Java Programs\\PDF using Java\\OpenPDF\\Gabriola.ttf", true);
    // Base URL against which relative paths such as css/mystyles.css are resolved
    String baseUrl = FileSystems.getDefault()
                                .getPath("F:\\", "knpcode\\Java\\", "Java Programs\\PDF using Java\\OpenPDF")
                                .toUri()
                                .toURL()
                                .toString();
    renderer.setDocumentFromString(xhtml, baseUrl);
    renderer.layout();
    // try-with-resources closes the stream even if PDF creation fails
    try (OutputStream outputStream = new FileOutputStream(outputPdf)) {
      renderer.createPDF(outputStream);
    }
    System.out.println("PDF creation completed");
  }
}
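The program repeats the F:\\... folder in three places. One way to avoid hard-coding the base URL, sketched here with a hypothetical helper `baseUrlFor`, is to derive it from the location of the input HTML file. It relies on `java.io.File.toURI()`, which adds a trailing slash only when the file system confirms the path is a directory, so the helper appends the slash itself if it is missing.

```java
import java.io.File;
import java.net.MalformedURLException;

public class BaseUrls {

  // Hypothetical helper: base URL of the folder that contains the HTML file
  public static String baseUrlFor(File htmlFile) throws MalformedURLException {
    File dir = htmlFile.getAbsoluteFile().getParentFile();
    String url = dir.toURI().toURL().toString();
    // Relative references such as css/mystyles.css need the trailing slash
    return url.endsWith("/") ? url : url + "/";
  }

  public static void main(String[] args) throws MalformedURLException {
    File inputHTML = new File("OpenPDF", "MyPage.html");
    System.out.println(baseUrlFor(inputHTML));
  }
}
```

In `xhtmlToPdf` you could then call `renderer.setDocumentFromString(xhtml, baseUrlFor(inputHTML))`; passing `inputHTML` into that method is an assumed signature change, not part of the original program.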

Some important points to note in the program are:

  1. sharedContext.setReplacedElementFactory(new ImageReplacedElementFactory()); registers the custom implementation of ReplacedElementFactory.
  2. In renderer.setDocumentFromString(xhtml, baseUrl), the base URL is passed as the second argument. It is created using this statement:
    String baseUrl = FileSystems.getDefault().getPath("F:\\", "knpcode\\Java\\", "Java Programs\\PDF using Java\\OpenPDF").toUri().toURL().toString();

  3. Notice that the path to the CSS in the HTML is relative. Setting the base URL as described in the second point lets the renderer resolve this relative path, so the external CSS is applied while generating the PDF.
  4. The additional font is registered using the following statement; the second argument, true, embeds the font in the PDF:
    renderer.getFontResolver().addFont("F:\\knpcode\\Java\\Java Programs\\PDF using Java\\OpenPDF\\Gabriola.ttf", true);
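Relative references are resolved against a base URL using standard URI rules, which is easy to check with `java.net.URI`. The sketch uses a simplified path, file:///F:/knpcode/OpenPDF/ (the real folder names contain spaces, which must be percent-encoded in a URI), and the class and method names are hypothetical. In CSS, a url() inside a stylesheet is resolved relative to the stylesheet's own URL, so ../Gabriola.ttf points back at the OpenPDF folder. The last case shows why the trailing slash on the base URL matters.

```java
import java.net.URI;

public class RelativeRefs {

  // Resolves ref against base and returns only the path part of the result
  public static String resolvePath(String base, String ref) {
    return URI.create(base).resolve(ref).getPath();
  }

  public static void main(String[] args) {
    // <link href="css/mystyles.css"> resolved against the folder URL
    String css = resolvePath("file:///F:/knpcode/OpenPDF/", "css/mystyles.css");
    // url("../Gabriola.ttf") resolved against the stylesheet URL
    String font = resolvePath("file:///F:/knpcode/OpenPDF/css/mystyles.css", "../Gabriola.ttf");
    // Without the trailing slash, "OpenPDF" is treated as a file name and dropped
    String noSlash = resolvePath("file:///F:/knpcode/OpenPDF", "css/mystyles.css");
    System.out.println(css);
    System.out.println(font);
    System.out.println(noSlash);
  }
}
```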
    
Generated PDF:

[Image: HTML to PDF using Flying Saucer]

Reference: https://flyingsaucerproject.github.io/flyingsaucer/r8/guide/users-guide-R8.html

That's all for the topic Convert HTML to PDF in Java Using Flying Saucer, OpenPDF. If something is missing or you have something to share about the topic please write a comment.

