Overview

Understanding Your Input Document

"Know thyself" is an ancient Greek aphorism, and it is a helpful first step toward extracting all the data you need, in the form you want it, from a document in TextConverter. Start by knowing your input document. What is the hierarchy within your document? Unless you are looking at a set of files, the highest level is the file, which carries information such as the filename, type, size, and path. Within the file, there may be one or more headers. If the file contains a single invoice, it will have one header at this level; if it contains multiple invoices, there will be a header for each invoice. For a simple invoice, the next level down is the detail line items. Some documents are more complex. For example, a bank statement may cover multiple accounts and may also be divided into multiple sections in which the same data is repeated in different ways.

Top-level information applies to one or more detail records.

Detail-level data drives the creation of records. Each detail record applies to only one output record.
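The hierarchy described above can be modeled in a few lines of Python. This is only an illustrative sketch and not how TextConverter stores data internally; the class names (InputFile, Header), the flatten function, and the sample values are all assumptions made for the example. It shows how each detail record produces exactly one output record while file-level and header-level values are copied onto it.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    # Top-level values for one header, e.g. one invoice's number.
    values: dict
    # Detail records (line items) that belong to this header.
    details: list = field(default_factory=list)


@dataclass
class InputFile:
    # File-level information such as the filename or path.
    info: dict
    headers: list = field(default_factory=list)


def flatten(doc):
    """Create one output record per detail record, copying down the
    file-level and header-level values that apply to it."""
    records = []
    for header in doc.headers:
        for detail in header.details:
            records.append({**doc.info, **header.values, **detail})
    return records


# Hypothetical sample: one file holding two invoices.
doc = InputFile(
    info={"filename": "invoices.txt"},
    headers=[
        Header({"invoice_no": "1001"},
               [{"item": "Widget", "qty": 2}, {"item": "Bolt", "qty": 10}]),
        Header({"invoice_no": "1002"},
               [{"item": "Gear", "qty": 1}]),
    ],
)
records = flatten(doc)
```

Three detail records yield three output records, and each one carries the filename and the invoice number of the header it sits under.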

Identifying Your Data

Start by finding the detail level. The repeating pattern of the detail level will be used to create your output records. In the example above, the detail records are positional and the top-level data is tag-based. For the detail records, each field* is identified based on the column where the value appears. This is the value's position, and it determines which field the value will be placed in. For the top-level data in the header, most of the items have tags; for example, "BILL TO:" and "SHIP TO:" are tags. All of the header fields can be extracted relative to the available tags. By default, the value is to the right of or below a tag, but it is easy to define any position for the value relative to the tag: above, below, left, or right.

*Other software platforms call these traps and require you to create a trap for each field.
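To make the idea of a value positioned relative to a tag concrete, here is a rough Python sketch. It is not TextConverter's implementation; the function names (first_chunk, value_near_tag), the rule that a value ends at a gap of two or more spaces, and the sample lines are all assumptions made for illustration.

```python
import re


def first_chunk(text):
    """Return the text up to the first gap of two or more spaces (assumed rule)."""
    chunk = re.split(r"\s{2,}", text.strip())[0]
    return chunk or None


def value_near_tag(lines, tag, direction="right"):
    """Return the value found in the given direction from the first
    occurrence of tag, or None if the tag or value is missing."""
    if direction not in ("right", "left", "above", "below"):
        raise ValueError("unknown direction: " + direction)
    for row, line in enumerate(lines):
        col = line.find(tag)
        if col == -1:
            continue
        if direction == "right":
            return first_chunk(line[col + len(tag):])
        if direction == "left":
            # Take the chunk closest to the tag on its left side.
            return re.split(r"\s{2,}", line[:col].strip())[-1] or None
        # Above or below: read the neighboring line from the tag's column.
        target = row - 1 if direction == "above" else row + 1
        if 0 <= target < len(lines):
            return first_chunk(lines[target][col:])
        return None
    return None


# Hypothetical header text with tags on one line and values beneath them.
lines = [
    "Order No.: 12345",
    "BILL TO:".ljust(20) + "SHIP TO:",
    "Acme Corp".ljust(20) + "Acme Warehouse",
]
order_no = value_near_tag(lines, "Order No.:")          # value to the right
ship_to = value_near_tag(lines, "SHIP TO:", "below")    # value below the tag
```

The same tag lookup serves all four directions; only the place where the value is read from changes.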

Find the detail level, highlight a record (typically one or more lines), and then, after the automatic template is created, adjust the Template Properties and Field Properties to suit your data extraction task.

Here is a partial screenshot of the detail-level data in TextConverter:

The detail records are positional: each field is identified by the column in which its value appears. That column is the value's position.
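Positional extraction amounts to slicing each detail line at fixed column offsets. The following Python sketch illustrates the principle only; the column layout, field names, and sample line are hypothetical and do not come from TextConverter.

```python
# Hypothetical column layout: field name -> (start, end) character offsets.
COLUMNS = {
    "item": (0, 10),
    "description": (10, 30),
    "qty": (30, 35),
    "price": (35, 45),
}


def parse_detail_line(line, columns=COLUMNS):
    """Slice a fixed-width line into fields by column position."""
    return {name: line[start:end].strip()
            for name, (start, end) in columns.items()}


# Build a sample line that matches the assumed layout.
line = ("A-100".ljust(10) + "Steel bracket".ljust(20)
        + "4".rjust(5) + "12.50".rjust(10))
record = parse_detail_line(line)
```

Because the field is decided by the column alone, a value that drifts outside its expected columns would land in the wrong field or be cut off, which is why the column boundaries deserve a careful check.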

Here is a partial screenshot of the top-level (or header) data in TextConverter:

For the top-level data in this example, we have identified reliable and unique tags like "Order No.:" and "Sales No.:". For each tag, a related value is identified, extracted, and added to the associated output records.
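As a rough illustration of tag-based extraction, the Python sketch below finds the value that follows each tag and adds it to every associated detail record. Only the tag strings come from the text above; the function name, the assumption that a value is a single run of non-space characters to the right of its tag, and the sample data are hypothetical.

```python
import re

# Tags taken from the example; treated here as reliable and unique.
TAGS = ["Order No.:", "Sales No.:"]


def extract_header(text, tags=TAGS):
    """Return a dict mapping each tag found (without its trailing colon)
    to the first run of non-space characters to its right."""
    header = {}
    for tag in tags:
        match = re.search(re.escape(tag) + r"[ \t]*(\S+)", text)
        if match:
            header[tag.rstrip(":")] = match.group(1)
    return header


# Hypothetical header text and detail records.
text = "Order No.: 50012    Sales No.: S-778\n"
details = [{"item": "A-100", "qty": "4"}, {"item": "B-200", "qty": "1"}]

header = extract_header(text)
# Add the header values to each associated output record.
output = [{**header, **detail} for detail in details]
```

Every output record ends up carrying both the order number and the sales number alongside its own detail fields.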