Samples and Walkthroughs

This section offers practical examples and walks through several workflows TextConverter users commonly encounter.

Select "Samples" from the "Help" menu to view the project samples described here.


Working with Templates - These projects demonstrate the productivity enhancements available through the AI pattern-recognition templates that TextConverter generates automatically. AI templates (available in versions 2.2 and higher) work best on well-structured, computer-generated documents with consistent data layout and formatting patterns. TextConverter automatically processes the selected portion of the document, finds recurring layout and formatting patterns, and uses this information to identify the data elements to extract.

Files

  • dataquick.pdf - PDF Foreclosure Detail Report; a flat (non-hierarchical) report with extra text in the header and footer of each page. When working with templates, TextConverter creates, connects, and labels the input and output dictionaries, and sets the type and format of each field.
  • orange_county.pdf - PDF Foreclosure Detail Report; like dataquick.pdf, a flat (non-hierarchical) report with extra text in the header and footer of each page, handled by templates in the same way.
  • california-county-lookup.html - A web page used to demonstrate HTML extraction of ZIP code data in 7.1 ZIP Extraction.
  • ca_zip.dbf - The extracted data from 7.1 ZIP Extraction.
  • ca_zip.fpt - The extracted data from 7.1 ZIP Extraction.
Projects without Script
  • Step 2. Change the field order - In this exercise we change the order of the fields in the output dictionary.  Reordering fields lets you tailor the layout of flat output files such as Microsoft Excel (XLS), comma-separated values (CSV), and plain-text (TXT) files.  If a database is connected as the project's output, reordering generally has no effect on the load itself, but it can still help you visualize the extraction, transformation, and loading process in TextConverter.
  • Step 3. Edit the Output Dictionary - In this exercise we change the names, widths, and types of fields in the output dictionary.  After data is extracted from the input data source, the input and output dictionaries are created and connected automatically.  Editing the output dictionary makes it easier to load the extracted data into the designated output database or output data file.  Changing a field's type can transform the data without altering the intended values.
Projects with Script
  • Step 4. Combine many fields into one - This exercise shows how to take a group of fields automatically extracted from an input data source and combine them into a single field in the output file or database.  In this example we combine several address fields into a single address block.  The same transformation technique can be used to merge first and last names, or any other data you wish to place in a single field.
  • Step 5. Split one field into three - In this exercise we use a single automatically extracted input field to populate three output data fields.  The example separates city, state, and ZIP code from the extracted data.  The techniques shown here apply to many other ETL tasks that require trimming, separating, or identifying data before loading it into an output database or file.
  • Step 6. Extract second level data - In this exercise we extract data from the header and footer of the PDF and add it, along with the input file name, to each extracted record.  Computer-generated documents subject to extraction, transformation, and loading (ETL) often contain multiple hierarchical levels of data; a report with a title header at the top of each section is a typical example.  TextConverter's automatic, artificial-intelligence (AI) driven extraction extracts one level of data at a time.
  • Step 7. Transform data with a lookup - In this exercise we use an external file (or database) to transform data extracted from an input data source; specifically, we add the county name to the addresses in our input data.  First we add a county field to the output dictionary.  Then we use the ZIP code in our data source to look up the county in an external database file.  Finally, we load the county name into the output database or flat file.
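
Steps 4, 5, and 7 above are performed through TextConverter's script customization, but the underlying transformations are language-neutral. The Python sketch below shows the logic of each step under illustrative assumptions: the field names (Address1, City, Zip, County) and the small ZIP-to-county table are made up for the example and are not TextConverter's own names or data.

```python
# Language-neutral sketch of the transformations in Steps 4, 5, and 7.
# Field names and the lookup table are illustrative, not TextConverter's.
import re

def combine_address(record):
    """Step 4: merge several address fields into a single address block."""
    parts = [record.get(k, "").strip()
             for k in ("Address1", "Address2", "City", "State", "Zip")]
    return ", ".join(p for p in parts if p)

def split_city_state_zip(value):
    """Step 5: split 'City, ST 12345' into three output fields."""
    m = re.match(r"\s*(.+?),\s*([A-Z]{2})\s+(\d{5})(?:-\d{4})?\s*$", value)
    return (m.group(1), m.group(2), m.group(3)) if m else (value, "", "")

# Stand-in for an external lookup file such as ca_zip.dbf.
ZIP_TO_COUNTY = {"92868": "Orange", "90012": "Los Angeles"}

def add_county(record):
    """Step 7: look up the county by ZIP code and add it to the record."""
    record["County"] = ZIP_TO_COUNTY.get(record.get("Zip", ""), "")
    return record
```

Each function corresponds to one exercise: combining fields is a join over non-empty values, splitting is a pattern match, and the lookup is a keyed fetch from an external table.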
Parsing with Script - TextConverter supports parsing with script, including VBScript, regular expressions, and a set of power functions and methods developed specifically for data processing and workflow automation.
Well formed text input - These samples illustrate the text conversion procedures for input files with regular field and record delimiters.
  • Products (Sample 1.1) - regular field and record delimiters, first line contains the field names, edit output dictionary, simple script customization of an output value

  • AutoLog (Sample 1.2) - a log-like, space-delimited input file, input format customization, automatic process re-run when the input file changes

  • Positional (Sample 1.3) - OnRecord context method implementation, the VBScript Mid function used to retrieve field values by position, initializing dates and numerics from text values with the InitDate and InitNumeric functions, positional field recognition vs. field delimiters.
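
The positional technique in Sample 1.3 reads fields by character position rather than by delimiter. As a rough Python equivalent of what the sample does with Mid, InitDate, and InitNumeric, the sketch below slices fixed-width columns and converts the text to typed values; the column boundaries, field names, and date format are made up for illustration.

```python
# Illustrative fixed-width parsing; column positions are assumptions.
from datetime import datetime

def parse_fixed_width(line):
    """Slice fields by character position, much as VBScript's Mid
    does with a (start, length) pair."""
    return {
        "Date":   line[0:10].strip(),   # like Mid(line, 1, 10)
        "Item":   line[10:30].strip(),  # like Mid(line, 11, 20)
        "Amount": line[30:40].strip(),  # like Mid(line, 31, 10)
    }

def init_date(text, fmt="%m/%d/%Y"):
    """Rough analogue of InitDate: turn a text value into a date."""
    return datetime.strptime(text, fmt).date()

def init_numeric(text):
    """Rough analogue of InitNumeric: turn a text value into a number."""
    return float(text.replace(",", ""))
```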

Irregular field and record delimiters - These samples show how field and record delimiting rules can be customized for input with irregular field and record delimiters.

  • TagSearch (Sample 2.1) - selecting an appropriate record delimiter, OnRecord context method implementation, handling tagged field values with the GetValueByTag function.

  • QuickBooks (Sample 2.2) - a widely used legacy data format shared by QuickBooks and MS Money, selecting an appropriate record delimiter, OnRecord context method implementation, handling tagged field values with the GetValueByTag function, filtering out unneeded records with the SkipRecord function.

  • Dictionary (Sample 2.3) - no repeatable record delimiter, rule-based record recognition, defining the record-delimiting rule with the IsNewRecord function.
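
Sample 2.3's rule-based recognition decides, line by line, whether a new record begins. The hedged Python sketch below shows the general shape of such an IsNewRecord-style predicate; the rule itself (a record starts at a line beginning with a digit) is invented for the example and would differ for real input.

```python
# Illustrative rule-based record splitting; the rule is an assumption.
import re

def is_new_record(line):
    """Example rule: a record starts at a line beginning with a digit.
    In TextConverter this decision lives in the IsNewRecord function."""
    return bool(re.match(r"\d", line))

def split_records(lines):
    """Group input lines into records using the rule above."""
    records, current = [], []
    for line in lines:
        if is_new_record(line) and current:
            records.append(current)
            current = []
        current.append(line)
    if current:
        records.append(current)
    return records
```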

Heterogeneous input - These samples illustrate advanced techniques for filtering output records, merging records, and creating relations.

  • GroupMerge (Sample 3.1) - record accumulation with the AddToBuffer method, record filtering with the SkipRecord method, manual record insertion with the AppendRecord method.

  • Relations (Sample 3.2) - the main output table stores document-related information, a proxy data object stores account-level data, and the two tables are linked by a primary/foreign key (AccountID) through OnRecord context method customization.
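
The relation in Sample 3.2 amounts to writing account-level data once per key and stamping each document row with that same key. The Python sketch below shows the idea; the AccountID key comes from the sample description, while the per-record hook and the other field names are illustrative assumptions, not TextConverter's API.

```python
# Illustrative primary/foreign-key split; field names are assumptions.
accounts = {}   # account-level table, keyed by AccountID (primary key)
documents = []  # main output table; AccountID acts as the foreign key

def on_record(rec):
    """Per-record hook in the spirit of the OnRecord customization:
    split each input record into an account-level row (written once
    per key) and a document-level row linked by AccountID."""
    acct_id = rec["AccountID"]
    # Store account data only the first time the key appears.
    accounts.setdefault(acct_id, {"AccountID": acct_id,
                                  "Owner": rec["Owner"]})
    # The document row carries the foreign key plus document fields.
    documents.append({"AccountID": acct_id, "Doc": rec["Doc"]})
```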

OFX (Sample 4.1) - This sample shows how to configure TextConverter to convert Open Financial Exchange (OFX) files into a database table. Includes the following files: 4.1 OFX.converterx, OFX.dict, OFX.tscr, Sample.ofx

QIF - These samples show how to configure TextConverter to convert Quicken (QIF) files into a database table.

  • Sample 5.1 - QIF_Bank - demonstrates how to configure TextConverter to convert Quicken (QIF) Bank account transaction (!Type:Bank) files into a database table.
  • Sample 5.2 - QIF_Cat - demonstrates how to configure TextConverter to convert Quicken (QIF) Category list (!Type:Cat) files into a database table.
  • Sample 5.3 - QIF_Invst - demonstrates how to configure TextConverter to convert Quicken (QIF) Investment account transaction (!Type:Invst) files into a database table.
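
A QIF file marks each field with a leading code letter (for example D for date, T for amount, P for payee) and ends each record with a ^ line, after a header such as !Type:Bank. The Python sketch below parses that structure to show what the QIF samples work with; it covers only three common codes and is not how TextConverter itself is implemented.

```python
# Minimal QIF parsing sketch; covers only a subset of field codes.
FIELD_CODES = {"D": "Date", "T": "Amount", "P": "Payee"}

def parse_qif(text):
    """Parse !Type:Bank-style QIF text into a list of record dicts."""
    records, current = [], {}
    for line in text.splitlines():
        if not line or line.startswith("!"):  # skip header, e.g. !Type:Bank
            continue
        if line[0] == "^":                    # ^ terminates a record
            if current:
                records.append(current)
            current = {}
        elif line[0] in FIELD_CODES:
            current[FIELD_CODES[line[0]]] = line[1:].strip()
    return records
```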

Related Sections 

Getting Started

TextConverter's Concept

User Interface, Options

Setting up an Extraction: Step by Step

Programmer's Reference, Scripting