Step 4. Combine many fields into one

This exercise shows how to take a group of fields that have been automatically extracted from an input data source and combine them into a single field in the output database or file.  In the example, we will use address fields and combine them in to a single address block. The same technique for data transformation can be used on first name plus last name or any other data that you wish to merge into a single field.

The range of files supported by TextConverter varies from structured delimited files like CSV, to tag based text like QuickBooks or MS Money, to loosely structured heterogeneous input with multilevel data. Use the Input pane to load a file or drag and drop a text file from Windows Explorer into the TextConverter frame. TextConverter can perform ETL by extracting data from popular document formats including PDF, DOC/DOCX, RTF, XLS,/XLSX, HTML, CSV and free formatted text files. 

TextConverter uses Artificial Intelligence (AI) to to extract and transform data from the input data source.  Sometimes a single file does not have enough information to allow the AI to set all the fields in the data dictionary correctly.  When this occurs, you can make changes to the fields in the output data dictionary.

location:  InstallDir/Samples/TextConverter/Automatic Mode/

project file:  InstallDir/Samples/TextConverter/Automatic Mode/2 With script/4 CombineFields.ConverterX


1. Clear the project by pressing the New button () in the tool bar or by selecting "New" from the "File" menu.
 
2. Make sure the "Open file as" option is set to "Pdf".
 
3.  Open the sample file provided with the software named "dataquick.pdf".

4.  Select "Generate Templates ..." from either the Options or Input Pane. TextConverter selects a template that best fits the input file's structure by default.

5.  Click the "Reset Output Dictionary" icon to create a linked input dictionary using the new template.


The input dictionary generates automatically based on the data extracted and the options selected.  The input dictionary is initially connected to an output dictionary with the same characteristics.  The output dictionary can be edited by the user.  Any field in the output dictionary can be connected to any field in the input dictionary.  If the output field requires more than a simple one-to-one relationship with the input field, a script can be inserted to transform extracted data into the output field.

6. Let's assume that you are using the extracted data to send out letters to the Trustees listed in these files. Your system used an address block including the street address, city, state, and ZIP code rafter than individual fields for each value.  We will combine the values from these four input fields into a single output field.

a.  Go to the output dictionary (upper right pane) and scroll down the list to the bottom.

b.  Right click in the space below the last field and select "New Field" from the fly menu.

c.  Name the new field "Trustee Address Block", set the type to "string", and set the width to 60.  Note:  Feel free to choose a name and width that fits your particular application.

d.  Right click on the new field and choose "Create script conversion" from the fly menu.  The script tab will open and the following text will be inserted automatically:

'----------------- OnRecord -----------------
Function OnRecord
DictOut.SetFieldValue "Trustee Address Block", value
End Function

e.  Delete the word "value" that was inserted as a placeholder and leave your cursor in this position.  Now click on "Field_8" in the input dictionary.  An easy way to find the field you are looking for is to click the corresponding data in the input preview pane.  Drag "Field_8" to the position in the script where you deleted the word "value".  The script will now look like this:

Function OnRecord
DictOut.SetFieldValue "Trustee Address Block", DictIn.Field_8.value
End Function

f.  Now add the rest of the fields and values. Add & ", " & (to add a comma and space) then drag in the next field "Field_13", then another & ", " & then drag and drop "Field_14", then & "  " and drag in "Field_15".  Your final script is three lines long and should look like this (please ignore any line wrapping caused by your browser):

'----------------- OnRecord -----------------
Function OnRecord
DictOut.SetFieldValue "Trustee Address Block", DictIn.Field_8.value & ", " & DictIn.Field_13.value & ", " & DictIn.Field_14.value & "  " & DictIn.Field_15.value
End Function

g.  Click the "Check Syntax" button in the lower right corner and click the preview tab to see the results.  You will have to scroll all the way to the right of the preview to see the new combined address block field.  You may now delete the four original fields from the output dictionary and you may move the new field to any position in the output dictionary.

7.  Now you are ready to connect the output data source and run the extraction process.

Save your project - TextConverter stores all of the current settings in a project file.  To save your project, choose "Save Project" form the file menu or click the disc icon () on the toolbar.  Elements of a project include:

  • path to the input data source
  • path to the output data source
  • paths to all other files and databases used in the project
  • the complete script
  • the mapping of the input dictionary to the output dictionary
  • All output dictionary settings
  • All "options" settings
  • Any other ETL settings
  • The workspace layout is NOT saved as part of the project but is instead retained with TextConverter
Learn more with Step 5. Split one field into three

Step 4 Video


Comments