Release notes

v1.1.3 19 Sep 2017

  • Image width and height extraction fixes
  • More verbose messages in editor
  • Bug fixes for saving templates in editor

v1.1.2 15 Aug 2017

  • Improvements for multipage table recognition
  • Table header extraction tool near headers text area
  • Bug fixes for saving templates in editor

v1.1.1 5 Jul 2017

  • Font characteristics retrieval improvements
  • Bug fixes

v1.1.0 22 Jun 2017

  • Multipage tables support
  • Image selector
  • Improved font style selector


pdf2Data is an add-on for iText7 that recognizes data inside PDF documents in an intuitive and predictable manner. It extracts predefined data fields from PDF documents that share the same template (for example, invoices coming from the same supplier).


The pdf2Data tool combines several kinds of evidence to find the required data, imitating the way a human reads the document.
Data recognition relies on a number of rules, which need to be defined in advance for each data field. Typical rules use any details of the PDF document that help to ensure correct data extraction. These may include:
  • page range and the position on the page
  • specific font style and text patterns
  • fixed keywords next to the data
  • automatic recognition of table structures
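
To make the idea of per-field rules concrete, here is a minimal sketch in plain Python. It is not the pdf2Data API: the names TextLine, FieldRule and extract_field, and the sample invoice lines, are all hypothetical and exist only for illustration. It covers three of the rule types listed above (page range, fixed keyword, text pattern) and assumes the text has already been extracted from the PDF line by line. Font style and table recognition are left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextLine:
    """One line of already-extracted text (hypothetical input format)."""
    page: int  # 1-based page number
    text: str


@dataclass
class FieldRule:
    """Hypothetical rule set for a single data field."""
    name: str
    pages: range   # pages on which the field may appear
    keyword: str   # fixed keyword expected directly before the data
    pattern: str   # regular expression with exactly one capture group


def extract_field(rule: FieldRule, lines: List[TextLine]) -> Optional[str]:
    """Return the first value that satisfies all rules, or None."""
    # The keyword is matched literally, the pattern as a regular expression.
    regex = re.compile(re.escape(rule.keyword) + r"\s*" + rule.pattern)
    for line in lines:
        if line.page not in rule.pages:
            continue  # page-range rule not satisfied
        match = regex.search(line.text)
        if match:
            return match.group(1)
    return None


# Invented sample data standing in for text extracted from an invoice.
sample_lines = [
    TextLine(1, "ACME Supplies Ltd."),
    TextLine(1, "Invoice No: INV-2017-0042"),
    TextLine(2, "Total due: 1,250.00 EUR"),
]

invoice_number_rule = FieldRule(
    name="invoice_number",
    pages=range(1, 2),  # page 1 only
    keyword="Invoice No:",
    pattern=r"([A-Z]+-\d{4}-\d{4})",
)
total_rule = FieldRule(
    name="total",
    pages=range(2, 3),  # page 2 only
    keyword="Total due:",
    pattern=r"([\d,]+\.\d{2})",
)

invoice_number = extract_field(invoice_number_rule, sample_lines)
total = extract_field(total_rule, sample_lines)
```

Combining several narrow rules for one field is what keeps the result predictable: a value is returned only when the page, keyword and pattern all agree, so a similar-looking string elsewhere in the document is ignored.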

How it works

Recognition is set up in the following steps:

Step 1. Upload a sample PDF document (a template).
Step 2. Select the data you would like to extract from the document and define the extraction rules (selectors) for it.
Step 3. Upload any other PDF document based on the same template and check whether your data is recognized correctly.
Step 4. Start using the template in the pdf2Data server-side component. Integrate it into your document workflow as a Java library or as a command-line application.
