Release notes
Intelligence
The pdf2Data tool uses every available cue to find the required data, imitating the way a human reads and understands the document.
Data recognition relies on a set of rules that must be defined in advance for each data field. A typical rule draws on any detail of the PDF document that helps ensure correct extraction. This may include:
- page range and the position on the page
- specific font style and text patterns
- fixed keywords next to the data
- automatic recognition of table structures
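To illustrate how such rules can combine, here is a minimal sketch in Python. It is not the pdf2Data API: the names TextChunk, FieldRule and extract are hypothetical, and the sketch assumes the PDF text has already been split into chunks with a page number and a position. It combines three of the cues listed above: a page range, a fixed keyword next to the data, and a text pattern.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextChunk:
    """A piece of text with its location (hypothetical structure)."""
    text: str
    page: int
    x: float
    y: float


@dataclass
class FieldRule:
    """A hypothetical rule for one data field."""
    pages: range   # pages on which the field may appear
    keyword: str   # fixed keyword expected to the left of the data
    pattern: str   # regular expression the data must match in full


def extract(chunks: List[TextChunk], rule: FieldRule,
            line_tolerance: float = 2.0) -> Optional[str]:
    """Return the matching text closest to the right of the keyword, or None."""
    on_pages = [c for c in chunks if c.page in rule.pages]
    for key in on_pages:
        if key.text != rule.keyword:
            continue
        # Candidates: same page, same text line, to the right of the keyword.
        right = [
            c for c in on_pages
            if c.page == key.page
            and abs(c.y - key.y) <= line_tolerance
            and c.x > key.x
            and re.fullmatch(rule.pattern, c.text)
        ]
        if right:
            return min(right, key=lambda c: c.x).text
    return None


chunks = [
    TextChunk("Invoice No:", page=1, x=50, y=700),
    TextChunk("INV-2041", page=1, x=130, y=700),
    TextChunk("Total:", page=2, x=50, y=100),
    TextChunk("99.50", page=2, x=130, y=100),
]
invoice_rule = FieldRule(pages=range(1, 2), keyword="Invoice No:", pattern=r"INV-\d+")
print(extract(chunks, invoice_rule))
```

Restricting the rule to a page range keeps a keyword that reappears elsewhere in the document from producing a false match, which is why the cues are usually combined rather than used alone.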
How it works
The whole recognition is based on the following steps:
Step 1. Upload a sample PDF document (a template).
Step 2. Select the data you would like to extract from the document and define the extraction rules (selectors) that identify it.
Step 3. Upload any other PDF document based on the same template and check if we were able to recognize your data.
Step 4. Start using the template in the pdf2Data server-side component. Integrate it into your document workflow as a Java library or as a command-line application.
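The define-then-verify cycle of steps 2 and 3 can be sketched as follows. This is a simplified illustration in Python, not the pdf2Data API: it assumes the document text is already available as a plain string, and the template format (a field name mapped to a regular expression) and the functions apply_template and unrecognized are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Step 2 (sketch): a hypothetical template, one regular expression per field.
# Each pattern has exactly one capture group holding the data to extract.
template: Dict[str, str] = {
    "invoice_number": r"Invoice No:\s*(INV-\d+)",
    "total": r"Total:\s*(\d+\.\d{2})",
}


def apply_template(template: Dict[str, str], text: str) -> Dict[str, Optional[str]]:
    """Apply every field pattern to the text; unmatched fields map to None."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in template.items():
        match = re.search(pattern, text)
        result[field] = match.group(1) if match else None
    return result


def unrecognized(result: Dict[str, Optional[str]]) -> List[str]:
    """List the fields that were not recognized in the document."""
    return [field for field, value in result.items() if value is None]


# Step 3 (sketch): try the template on another document with the same layout.
other_document = "Invoice No: INV-3077\nDate: 2024-01-15\nTotal: 120.00\n"
result = apply_template(template, other_document)
print(result)
print("Unrecognized fields:", unrecognized(result))
```

An empty list of unrecognized fields suggests the template generalizes to the new document; a non-empty one points to the selectors that need refining before moving on to step 4.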