Step by Step

This support page provides an index to some of the lessons and tutorials included with SiMX TextConverter.
 
Templates - see Working with Templates
 
  • Step 1. Load an input file - This step simply shows how to open an input file for extraction with TextConverter.  If you already know how to open a file for ETL in TextConverter, you can skip this step.
  • Step 2. Change the field order - In this exercise we change the order of the fields in the output dictionary.  Reordering the fields allows you to tailor the appearance of output flat files, including Microsoft Excel XLS files, comma-separated values (CSV), text files (TXT), and the like.  If a database is connected as the output for the project, reordering the fields will generally have no effect, but it may still be beneficial in visualizing the extraction, transformation, and loading process in TextConverter.
  • Step 3. Edit the Output Dictionary - In this exercise we change the names, widths, and types of fields in the output dictionary.  After data is extracted from the input data source, the input and output dictionaries are automatically created and connected.  Editing the output dictionary allows the input data to load better into the designated output database or output data file.  Changing a field's type can transform the data without altering the intended values.
  • Step 4. Combine four fields into one - This exercise shows you how to take a group of fields that have been automatically extracted from an input data source and combine them into a single field in the output database or file.  In the example, we combine address fields into a single address block, but the same data transformation technique can be used on first name plus last name or any other data that you wish to concatenate into a single field.  (A rough sketch of this concatenation appears after this list.)
  • Step 5. Split one field into three - In this exercise we use a single automatically extracted input data field to populate three output data fields.  The example separates city, state, and ZIP code, but the techniques shown here are applicable to many other ETL tasks that require trimming data, separating data, or identifying data for extraction and loading to an output database or file.  (A rough sketch of this split appears after this list.)
  • Step 6. Extract second level data - In this exercise we extract data from the header and footer of the PDF and add it to each extracted record along with the input file name.  Computer-generated documents subject to extraction, transformation, and loading (ETL) often contain multiple hierarchical levels of data.  A report with a title header at the top of each section is a typical example of hierarchical data in a computer-generated document.  The premier version of TextConverter's automatic, artificial intelligence (AI) driven extraction extracts just one level of data at a time.  (A rough sketch of carrying header data down to each record appears after this list.)
  • Step 7. Transform data with a look up - In this exercise we use an external file (or database) to transform data extracted from an input data source.  In particular, we will add the county name to addresses in our input data source.  First we will add a field for the county to the output dictionary, then we will use the ZIP code in our data source to look up the county from an external database file.  Finally, we will load the county name into the output database or flat file.  (A rough sketch of this look-up appears after this list.)
  • Step 7.1. Extract data from a webpage - This bonus project extracts data from an HTML page (california-county-lookup.html) to create the ZIP code database employed as the look-up in Step 7 (Transform data with a look up).  (A rough sketch of this HTML extraction appears after this list.)
  • Step 8. Set the output database table - Press the Open Data Source button to connect to an output database table (see Output).  The Set Output DB Table dialog will appear.  You may choose a database or a file for loading.
  • Step 9. Run the process, see the Results
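
The sketches below illustrate, in plain Python, the kind of transformations that Steps 4 through 7.1 perform.  They are rough illustrations only: TextConverter carries out these operations through its visual template interface, and every field name, data value, and file layout shown here (other than california-county-lookup.html) is a hypothetical example, not taken from the tutorial files.

Step 4 combines several extracted fields into one; at its core this is a concatenation:

    # Step 4 illustration: combine four extracted address fields into one.
    # The field names and values are hypothetical examples.
    record = {
        "street": "123 Main St",
        "city": "Sacramento",
        "state": "CA",
        "zip_code": "95814",
    }

    # Concatenate the four address parts into a single address-block field.
    record["address_block"] = ", ".join(
        [record["street"], record["city"], record["state"], record["zip_code"]]
    )
    print(record["address_block"])   # 123 Main St, Sacramento, CA, 95814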
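
Step 5 is the reverse: one extracted field populates three output fields.  Assuming a combined field laid out as "City, ST 95814" (an assumed layout), the split might look like:

    # Step 5 illustration: split one field into city, state, and ZIP code.
    combined = "Sacramento, CA 95814"

    city, rest = combined.split(", ", 1)    # "Sacramento" / "CA 95814"
    state, zip_code = rest.split(" ", 1)    # "CA" / "95814"
    print(city, state, zip_code)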
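
Step 6 carries second-level data (header and footer values, plus the input file name) down into every detail record.  A minimal sketch, assuming a hypothetical header field and file name:

    # Step 6 illustration: add header-level data and the file name to each record.
    input_file = "report.pdf"                       # hypothetical input file name
    header = {"report_title": "Quarterly Sales"}    # second-level data, found once per section
    detail_records = [
        {"item": "Widget A", "qty": 3},
        {"item": "Widget B", "qty": 5},
    ]

    for record in detail_records:
        record.update(header)               # copy the header fields into the record
        record["source_file"] = input_file  # add the input file name
    print(detail_records)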
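
Step 7 enriches each record by looking up a value in an external table.  With a hypothetical ZIP-to-county table, the transformation amounts to:

    # Step 7 illustration: fill a new county field from a ZIP code look-up table.
    # The table contents are hypothetical; the tutorial builds its table from
    # the HTML page used in Step 7.1.
    zip_to_county = {"95814": "Sacramento", "94102": "San Francisco"}

    records = [{"name": "Acme Co.", "zip_code": "95814"}]
    for record in records:
        record["county"] = zip_to_county.get(record["zip_code"], "")
    print(records)   # county field filled in from the look-up table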
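
Step 7.1 reads that look-up data out of an HTML page.  The sketch below uses Python's standard html.parser module and assumes the page holds a simple table of <td> cells (the file name comes from the tutorial; the table layout is a guess for illustration only):

    # Step 7.1 illustration: collect the text of every table cell in an HTML page.
    from html.parser import HTMLParser

    class TableExtractor(HTMLParser):
        """Collects the text of each <td> cell, grouped by table row."""
        def __init__(self):
            super().__init__()
            self.in_cell = False
            self.row = []
            self.rows = []

        def handle_starttag(self, tag, attrs):
            if tag == "td":
                self.in_cell = True

        def handle_endtag(self, tag):
            if tag == "td":
                self.in_cell = False
            elif tag == "tr" and self.row:
                self.rows.append(self.row)
                self.row = []

        def handle_data(self, data):
            if self.in_cell and data.strip():
                self.row.append(data.strip())

    parser = TableExtractor()
    with open("california-county-lookup.html", encoding="utf-8") as f:
        parser.feed(f.read())
    print(parser.rows[:5])   # e.g. the first few ZIP/county rows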
     
Parsing - see Parsing with Script
 

Related Sections

Getting Started

User Interface, Options

Programmer's Reference, Scripting