One-click forms

TextConverter can use artificial intelligence to generate forms. One-click forms work best on well-structured, computer-generated documents with a consistent data layout and consistent formatting. In these cases, TextConverter automatically processes the selected portion of the document, finds recurring layout and formatting patterns, and uses them to identify the data elements to extract.

When working with forms, TextConverter handles the extraction process automatically and produces a result without any manual configuration or programming. Some final customization may still be needed to match your output database requirements, such as editing the output dictionary (adding or removing fields, changing field names and types) or applying additional data transformations (such as merging or splitting fields). For more advanced transformations of the extracted data, you can work with the project's script.

Single-record documents are not suitable for automatic extraction through artificial intelligence (AI). One-click form extraction requires a pattern of two or more records. TextConverter is well suited to extracting one or more records from each document in a large collection of PDF, DOC, RTF, XLS, HTML, CSV, or text files, but for single-record documents and forms, use manual extraction.

Clearly delimited files are extracted automatically without a form. It is not necessary (or effective) to use one-click forms for simple delimited files. For example, the example 1.1 Products extracts correctly without a template but does not extract well when using forms.

By default, TextConverter uses forms as its conversion option, and this setting affects several of the options further down the list.

Two of the options unique to automatic mode are described below:

Learning - selects the training mode that TextConverter's artificial intelligence uses to select and create templates (see the Template option below). Learning can be set to:

initial - the artificial intelligence trains only on the current input data source, up to the depth set by the number of lines to preview

continuous - the artificial intelligence keeps training on each new input data file that is opened, up to the depth set by the number of lines to preview

none - no further changes are made to the template(s)

Template - selects one of the templates that the automatic conversion process identified as fitting your input data. The best-fitting template is selected by default; if it does not interpret the source data correctly, use the drop-down menu to try each of the other templates offered. If the template names are long or your window is small, you may need to scroll the pane to the right to see the drop-down menu.

When you are working in "Forms" mode, some lines may be hidden from the input table view; use the Show all lines option to see them. This check box includes lines that do not conform to the selected automatic template in the conversion. By default, the box is cleared and non-conforming lines are skipped. Select this option to see and work with the non-conforming text. For more information on working with lines outside of the template, see the exercise "Step 6. Extract second level data". That exercise explains the VBScript code below in more detail and provides an example that you can run on your desktop.

Script for parsing in Automatic mode

'Global array that keeps values captured from non-delimited lines
Dim val(), nf

nf = DictOut.GetFieldCount()
ReDim val(nf)

'----------------- OnRecord -----------------
Function OnRecord
    If This.IsDelimited() Then 'Parse delimited lines here
        'Set an output field from the current (delimited) line
        DictOut.SetCellValue <outputfieldname>, DictIn.GetCellValue(<inputfieldname>)
        'Set an output field from a value stored earlier in the global array
        DictOut.SetCellValue <outputfieldname>, val(<index>)
    Else 'Parse non-delimited lines here
        'Populate the global array with the data extracted by tags,
        'then skip this record
        val(<index>) = This.GetByTag(<tag>)
        This.SkipRecord
    End If
End Function
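As an illustration only, here is the template above filled in for a hypothetical invoice report in which a non-delimited header line carries an invoice number and date, and the delimited lines below it list items and quantities. The tag names, input field names, and output field names are assumptions, not names TextConverter defines; the sketch also assumes that both tagged values sit on the same header line and that GetByTag and SetCellValue accept names as strings, as the placeholders in the template suggest. It runs only inside a TextConverter project, which supplies the DictIn, DictOut, and This objects.

```vbscript
'Hypothetical example: the tags (InvoiceNo, InvoiceDate), input fields
'(Item, Qty) and output fields are illustrative; use your project's names.
'Assumes both tagged values appear on the same non-delimited header line.
Dim val(), nf

nf = DictOut.GetFieldCount()
ReDim val(nf)

'----------------- OnRecord -----------------
Function OnRecord
    If This.IsDelimited() Then
        'Detail line: copy this line's own fields to the output record
        DictOut.SetCellValue "Item", DictIn.GetCellValue("Item")
        DictOut.SetCellValue "Quantity", DictIn.GetCellValue("Qty")
        'Repeat the header values captured earlier on every detail record
        DictOut.SetCellValue "InvoiceNo", val(0)
        DictOut.SetCellValue "InvoiceDate", val(1)
    Else
        'Header line: remember the tagged values for the detail lines
        'that follow, then skip this record
        val(0) = This.GetByTag("InvoiceNo")
        val(1) = This.GetByTag("InvoiceDate")
        This.SkipRecord
    End If
End Function
```

Because val is declared at script level, its contents persist between OnRecord calls, so every detail record under a header repeats that header's invoice number and date until the next header line overwrites them.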
