Concept Explained

In principle, any file containing text (structured, loosely structured, or unstructured) can be converted into highly structured data such as database tables. TextConverter breaks complex text conversion tasks down into several simple steps. Each step can be observed and manipulated while previewing the result.

TextConverter uses artificial intelligence (AI) to automatically identify the structured data in your file and prepare it for your database. The program's AI and its ability to learn allow most of the conversion to take place with minimal user intervention. However, practically every step of the process can be modified through easy-to-use TextConverter Methods and simple Visual Basic Script.

Terminology

Original input text: A text file to be converted into a database format.
Template View: A template is a parsing overlay that exposes the input document as a set of input dictionary fields and records, generated automatically through artificial intelligence (AI).
Parsing View: The information from the original input document, presented as a set of columns and rows.
Input line: A line of text in the original input text. Input lines are separated by a carriage return <CR>, a line feed <LF>, or both.
Input record: A line of text produced by applying a record delimiter to the original input text. In most simple cases, an input record is the same as an input line.
Input field: An element of an input record after a field delimiter is applied.
Input dictionary: The list of all input fields and their properties.
Tabular input text: The tabular representation of the original input text, that is, the set of all input records.
Output field: The result of transforming an input field according to a pre-built or custom conversion.
Output dictionary: The list of all output fields and their properties.
Output record: The combination of all output fields corresponding to a single input record.
Output table: A set or a subset of all output records.
 

TextConverter Process

When working with templates, TextConverter generates fields from the document's structure. Once the file is loaded into the program, you can generate templates and choose the one that best suits your needs. General changes can be made in the Options pane, while specific modifications can be made through the Script tab in the bottom left pane.

When parsing with script, the process changes slightly because more delimiter options become available. After you load a text file, change a record, or adjust the delimiters, TextConverter analyzes the input and performs the following steps automatically:

- Input records are generated according to the record delimiter

- The input dictionary is initialized according to the field delimiter (see Input for details)

- The output dictionary is initialized; the number of output fields corresponds to the number of input fields (see Output for details)

- The data types of the output fields are initialized using an automatic data type recognition procedure

- A list of the most suitable record delimiters is generated and shown in the Options pane

- A list of the most suitable field delimiters is generated and shown in the Options pane

- A tabular representation of the input text is shown in the top left pane (see Input for details)

- A conversion preview is shown in the preview pane
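The first steps of this analysis can be illustrated with a small conceptual sketch. It is written in Python for illustration only and is not TextConverter's implementation or script API; the function names, the comma field delimiter, and the simple type rules (integer, float, ISO date, text) are all assumptions made for the example.

```python
import re
from datetime import datetime


def split_records(text, record_delimiter="\n"):
    """Apply the record delimiter to the input text; empty records are dropped."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return [r for r in normalized.split(record_delimiter) if r.strip()]


def split_fields(record, field_delimiter=","):
    """Apply the field delimiter to one input record."""
    return [f.strip() for f in record.split(field_delimiter)]


def _is_int(value):
    return re.fullmatch(r"[+-]?\d+", value) is not None


def _is_float(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def guess_type(values):
    """Guess one data type for a column (assumed rules, most specific first)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    for name, check in (("integer", _is_int), ("float", _is_float), ("date", _is_date)):
        if all(check(v) for v in non_empty):
            return name
    return "text"


def analyze(text, record_delimiter="\n", field_delimiter=","):
    """Build records, a tabular view, and one guessed type per column."""
    records = split_records(text, record_delimiter)
    rows = [split_fields(r, field_delimiter) for r in records]
    width = max((len(row) for row in rows), default=0)
    # Pad short rows so every record has the same number of fields.
    rows = [row + [""] * (width - len(row)) for row in rows]
    types = [guess_type(list(column)) for column in zip(*rows)]
    return {"records": records, "rows": rows, "types": types}
```

Calling analyze on a two-line comma-separated text returns the input records, the padded rows of fields, and a guessed type for each column, which loosely mirrors the records, dictionaries, and data types described in the list above.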

Simple Conversion

You can change the properties of input and output fields, the field and record delimiters (when using a text file), and other options. The preview updates immediately to show the effect of each change. Once you are satisfied with the preview, click the Run button to start the conversion process. To view the result of your conversion, click the Browse button located in the output pane.

Set the output database table using the Set Output Data Source button to finish the setup. Click Run to convert the input text into the output database table. To save the project for future use, click the Save button in the main toolbar.

Custom Conversion

Record delimiter customization

Use the IsNewRecord context method if the record delimiter changes from line to line. This approach lets you apply arbitrary logic to decide whether an input line starts a new record.
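The idea of deciding record boundaries with custom logic can be sketched as follows. This is a conceptual Python illustration, not the IsNewRecord signature or TextConverter script; the rule used here (a record starts with a date such as 2024-01-31) and the names is_new_record and group_records are hypothetical.

```python
import re


def is_new_record(line):
    """Hypothetical rule: a line starts a new record if it begins with a date."""
    return re.match(r"\d{4}-\d{2}-\d{2}\b", line) is not None


def group_records(lines, starts_record=is_new_record):
    """Join continuation lines onto the record that precedes them."""
    records = []
    for line in lines:
        if starts_record(line) or not records:
            records.append(line)
        else:
            # Not a new record: append the line to the current record.
            records[-1] += " " + line.strip()
    return records
```

With this rule, a log entry that wraps onto an indented second line is kept as a single input record instead of being split at every line break.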

Field delimiter customization

If constant field delimiters do not resolve the problem, you can use positional, tag-delimited, or logical methods to define the field boundaries.

1. Positional - each field begins at a fixed character position in every input record. See Sample 1.3 for details.

2. Tag-delimited - each field follows a tag associated with that field. See Sample 2.1 for details.

3. Logical - a script algorithm is needed to separate the input fields. See Sample 2.2 for details.
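The first two approaches can be sketched in a few lines. The code below is a conceptual Python illustration rather than TextConverter script, and it does not reproduce Samples 1.3 or 2.1; the function names, the start positions, and the tags are assumptions chosen for the example.

```python
import re


def split_positional(record, starts):
    """Cut a record at fixed 0-based start positions (assumed sorted)."""
    bounds = list(starts) + [len(record)]
    return [record[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def split_tagged(record, tags):
    """Extract the value that follows each tag, e.g. 'Name: Ann Age: 31'."""
    # Try longer tags first so a tag that is a prefix of another does not win.
    ordered = sorted(tags, key=len, reverse=True)
    pattern = "|".join(re.escape(tag) for tag in ordered)
    parts = re.split(f"({pattern})", record)
    # parts alternates: text before first tag, tag, value, tag, value, ...
    return {tag: value.strip() for tag, value in zip(parts[1::2], parts[2::2])}
```

For example, split_positional("1001Ann Lee   2020", [0, 4, 14]) yields the three fields of a fixed-width record, while split_tagged("Name: Ann Lee Age: 31", ["Name:", "Age:"]) returns a value for each tag. The logical approach has no general form, since it depends entirely on the rule the script implements.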

Input field - output field conversion

Automatically generated input and output fields are connected to each other by default. This connection means that the value of an input field in an input record is transferred to the connected output field in the output record. The conversion of each input value depends on the type and format properties of the corresponding input and output fields (see Input and Output for details). The following picture illustrates the default conversion:

There are situations in which the default conversion is not enough. Such cases include, but are not limited to, an output value that is a function of multiple input values, or an input value that the default conversion cannot extract from the input record. Script customization overrides the default conversion if both are in use for the same input/output pair of fields.
 

See the OnRecord context method to learn more.
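The relationship between the default conversion and a custom one can be sketched as follows. This is a conceptual Python illustration, not the OnRecord signature or TextConverter script; the field names (first, last, price, qty) and the functions are hypothetical.

```python
def default_conversion(input_record):
    """Default behaviour in this sketch: copy each input field to the output field of the same name."""
    return dict(input_record)


def on_record(input_record, output_record):
    """Hypothetical custom step that runs once per record, after the default conversion."""
    # An output value computed from several input values.
    output_record["full_name"] = f'{input_record["first"]} {input_record["last"]}'
    output_record["total"] = float(input_record["price"]) * int(input_record["qty"])
    # Overrides the default copy for this input/output pair.
    output_record["qty"] = int(input_record["qty"])
    return output_record


def convert(input_record):
    """Apply the default conversion, then let the custom step override it."""
    return on_record(input_record, default_conversion(input_record))
```

Fields the custom step does not touch keep their default values, which matches the rule that the script takes precedence only for the field pairs it handles.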

Output records' filtering

You may decide that not all of the output records should be inserted into the output database table. In such cases, you can prevent the current output record from being inserted by calling the SkipRecord method from your implementation of the OnRecord context method.
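The filtering pattern can be sketched like this. The code is a conceptual Python illustration, not TextConverter script: RecordContext and skip_record are hypothetical stand-ins for the conversion context and the SkipRecord method, and the status field is an assumption.

```python
class RecordContext:
    """Hypothetical per-record context that remembers whether the record was skipped."""

    def __init__(self):
        self.skipped = False

    def skip_record(self):
        self.skipped = True


def on_record(record, context):
    """Hypothetical handler: keep every record except cancelled ones."""
    if record.get("status") == "cancelled":
        context.skip_record()


def build_output_table(records, handler=on_record):
    """Run the handler on each record and insert only the records it did not skip."""
    table = []
    for record in records:
        context = RecordContext()
        handler(record, context)
        if not context.skipped:
            table.append(record)
    return table
```

The handler only flags the record; the surrounding conversion loop decides what to do with the flag, which keeps the filtering rule separate from the insertion logic.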

Input records merge

When the input text consists of header, body, and footer segments, you might want to merge header and footer data into the body records. The following methods facilitate this task: AddToBuffer, GetBufferCount, GetFromBuffer, and AppendRecord.
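The buffering idea behind such a merge can be sketched as follows. This is a conceptual Python illustration and does not show the signatures or behaviour of AddToBuffer, GetBufferCount, GetFromBuffer, or AppendRecord; the line prefixes H, B, and F that mark header, body, and footer lines are an assumption made for the example.

```python
def merge_segments(lines):
    """Attach header and footer data to every body record of the same segment."""
    output = []
    buffer = []   # body records waiting for their footer
    header = None
    for line in lines:
        kind, _, value = line.partition(" ")
        if kind == "H":
            # A new segment starts: remember its header and reset the buffer.
            header = value
            buffer = []
        elif kind == "B":
            buffer.append(value)
        elif kind == "F":
            # The footer is known now, so the buffered body records can be emitted.
            for body in buffer:
                output.append({"header": header, "body": body, "footer": value})
            buffer = []
    return output
```

Body records have to be held back because the footer arrives after them; the header, by contrast, is already known when each body line is read.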

For a higher-level description, go to TextConverter's Concept.

Related Sections

Setting up a conversion process step by step

Samples and walkthroughs

User Interface

Scripting
