Step 2. Change the field order

In this exercise we change the order of the output dictionary's fields. Reordering lets you tailor the appearance of output flat files, including Microsoft Excel XLS files, comma-separated values (CSV) files, text files (TXT), and other similar formats. If a database is connected for the project's output, reordering the fields has no effect on the output, but it can still help you visualize the extraction, transformation, and loading (ETL) process in TextConverter's dynamic output preview.
The range of files supported by TextConverter varies from structured delimited files like CSV, to tag-based text like QuickBooks or MS Money, to loosely structured heterogeneous input with multilevel data. Use the Input pane to load a file, or drag and drop a text file from Windows Explorer into the TextConverter frame. TextConverter can perform ETL by extracting data from popular document formats including PDF, DOC/DOCX, RTF, XLS/XLSX, HTML, CSV, and free-format text files.

TextConverter uses Artificial Intelligence (AI) to extract and transform data from the input data source.  Sometimes a single file does not have enough information to allow the AI to set all the fields in the data dictionary correctly.  When this occurs, you can parse the file with script, aided by our high-level functions and methods, and make changes visually to the fields in the output data dictionary.

location:  InstallDir/Samples/TextConverter/Automatic Mode/

project file:  InstallDir/Samples/TextConverter/Automatic Mode/1 Without script/2 ChangeFieldOrder.ConverterX
 
1.  Clear the project by pressing the New button in the toolbar or by selecting "New" from the "File" menu.
 
2.  Make sure the "Open file as" option is set to "Pdf".
 
3.  Open the sample file provided with the software named "dataquick.pdf".

4.  Select "Generate Templates ..." from either the Options or Input Pane. TextConverter selects a template that best fits the input file's structure by default.

5.  Click the "Reset Output Dictionary" icon to create a linked input dictionary using the new template.
 
 
The input dictionary generates automatically based on the data extracted and the options selected.  The input dictionary is initially connected to an output dictionary with the same characteristics.  The output dictionary can be edited by the user.  Any field in the output dictionary can be connected to any field in the input dictionary.  If the output field requires more than a simple one-to-one relationship with the input field, a script can be inserted to transform extracted data into the output field.
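
The difference between a simple one-to-one connection and a scripted connection can be sketched as follows. This is only an illustration written in Python, not TextConverter's own scripting language; the function `output_value`, the input field name `in_25`, and the sample value are all hypothetical.

```python
from typing import Callable, Optional

def output_value(record: dict, source: str,
                 transform: Optional[Callable[[str], str]] = None) -> str:
    """Return the value of an output field linked to the input field `source`.

    With no transform the link is one-to-one; with a transform, the
    extracted value is passed through the script before it is output.
    """
    raw = record[source]
    return transform(raw) if transform else raw

# Hypothetical extracted record: one input field holding city and state.
record = {"in_25": "SAN DIEGO CA"}

# One-to-one link: the output field simply mirrors the input field.
city_state = output_value(record, "in_25")

# Scripted link: keep only the last word (the state abbreviation).
state_only = output_value(record, "in_25", lambda s: s.split()[-1])
```

Here `city_state` holds the extracted text unchanged, while `state_only` holds only the part the script selects.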

In this example, most of the fields were extracted properly by TextConverter's AI, but the fields are not in the correct order.  If this data is for a database, the field order is not important.  If you are going to output the data to a spreadsheet, or want the project to be neat and easy to read, you can change the field order.

In the PDF input data source, the owner and the site address are visually grouped together.  In the extracted data, the owner is in field 12, the site street address is in field 23, the city and state are in field 25, and the ZIP code is in field 26.

6.  Go to the Output Dictionary (upper right pane) and scroll down to field 23 "Site_Address".  Click on the field icon and drag it up to just below field 12 "Owner".  Notice that the output field moves but stays connected to the same input field.  An easy way to find a field is to click on the data in the input preview.  Clicking on a data element in the input preview highlights the field in both the input dictionary and the output dictionary, and also highlights both the field and the record (row) in the output preview.

Drag field 25 to position # 14
Drag field 26 to position # 15
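
The three drags above can be modeled as moves in an ordered list in which each output field carries its link to an input field. The sketch below is a simplified Python model, not TextConverter's internal representation; `OutputField`, `move_field`, the input names `in_N`, and the field names `Site_City_State` and `Site_Zip` are hypothetical. Only "Owner" and "Site_Address" come from the sample.

```python
from dataclasses import dataclass

@dataclass
class OutputField:
    name: str    # output field name
    source: str  # name of the linked input field (hypothetical naming)

def move_field(fields, from_pos, to_pos):
    """Move the field at 1-based from_pos so it ends up at 1-based to_pos."""
    field = fields.pop(from_pos - 1)
    fields.insert(to_pos - 1, field)

# A 26-field output dictionary, initially in extraction order.
fields = [OutputField(f"field_{i}", f"in_{i}") for i in range(1, 27)]
fields[11] = OutputField("Owner", "in_12")
fields[22] = OutputField("Site_Address", "in_23")
fields[24] = OutputField("Site_City_State", "in_25")  # hypothetical name
fields[25] = OutputField("Site_Zip", "in_26")         # hypothetical name

move_field(fields, 23, 13)  # Site_Address to just below Owner
move_field(fields, 25, 14)  # city and state
move_field(fields, 26, 15)  # ZIP code

names = [f.name for f in fields[11:15]]
sources = [f.source for f in fields[11:15]]
```

After the moves, positions 12 to 15 hold the owner and the site address fields together, while `sources` still lists `in_12`, `in_23`, `in_25`, and `in_26`: only the order changed, not the links. Because each field is moved upward, the fields below its original position keep their numbers, which is why fields 25 and 26 can still be picked up at positions 25 and 26.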

Now bring the "Mail Address" together.

Drag field 29 to position # 16
Drag field 33 to position # 17
Drag field 34 to position # 18

Now pull the "Trustee" data together.

Drag field 2 to position # 19 (which will become position 18 as all the fields move up)
Drag "field_8" (in field position 7) to position # 19 (which will become position 18 as all the fields move up)
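
The parenthetical notes above describe what happens when a field is dragged downward: removing it from its old slot shifts every later field up by one, so the field lands one position above the slot it was dropped on. A minimal Python sketch of that arithmetic, assuming the drop position is numbered before the move; `final_position` and `drag` are hypothetical helpers, and the 35-field list is an arbitrary stand-in for the dictionary:

```python
def final_position(from_pos, drop_pos):
    """1-based position a field ends up in when dropped at drop_pos.

    Moving down (from_pos < drop_pos) lands one position higher,
    because the fields in between shift up to fill the gap.
    """
    return drop_pos - 1 if from_pos < drop_pos else drop_pos

def drag(fields, from_pos, drop_pos):
    """Apply a drag to a list of fields, using 1-based positions."""
    field = fields.pop(from_pos - 1)
    fields.insert(final_position(from_pos, drop_pos) - 1, field)
    return fields

# Fields identified by their original numbers (list length is arbitrary).
fields = list(range(1, 36))

drag(fields, 2, 19)  # field 2 lands at position 18
drag(fields, 7, 19)  # "field_8", now in position 7, lands at position 18

trustee_block = fields[16:18]  # positions 17 and 18
```

After both drags, `trustee_block` contains original fields 2 and 8 side by side: the second drag pushes field 2 up to position 17 and places "field_8" at position 18. Upward drags, such as moving field 23 to position 13, need no adjustment.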

Make any additional field position adjustments you feel are necessary.

Save your project - TextConverter stores all of the current settings in a project file.  To save your project, choose "Save Project" from the "File" menu or click the disc icon on the toolbar.  Elements of a project include:

  • path to the input data source
  • path to the output data source
  • paths to all other files and databases used in the project
  • the complete script
  • the mapping of the input dictionary to the output dictionary
  • all output dictionary settings
  • all "options" settings
  • any other ETL settings

Note: the workspace layout is NOT saved as part of the project but is instead retained with TextConverter.
