Step 7. Transform data with a look up

In this exercise we use an external file (or database) to transform data extracted from an input data source.  In particular, we will add the county name to addresses in our input data source.  First we will add  fields for the countys to the output dictionary, then we will use use the ZIP code in our data source to look up each county from an external database file.  Finally, we will load the county name in the output database or flat file.

For this exercise, in addition to the input data source, we will also need the file "ca_zip.dbf" provided as a sample file with your software.

location:  InstallDir/Samples/TextConverter/Automatic Mode/2 With script/

project file:  InstallDir/Samples/TextConverter/Automatic Mode/2 With script/7 TansformLookup.ConverterX

1.  Clear the project by pressing the New button () in the tool bar or by selecting "New" from the "File" menu.
 
2.  Make sure the "Open file as" option is set to "Pdf".

3.  Open the sample file provided with the software named "dataquick.pdf"

4.  Select "Generate Templates ..." from either the Options or Input Pane. TextConverter selects a template that best fits the input file's structure by default.

5.  Click the "Reset Output Dictionary" icon to create a linked input dictionary using the new template.

6.  Create fields to hold the name of the county for the lender and the site.

a.  Go to the output dictionary (upper right pane) and scroll down the list to the bottom.
b.  Create two new fields by right clicking in the space below the last field and select "New Field" form the fly menu.

c.  Name the first new field "LenderCounty" and the second new field "SiteCounty", set the type to "string", and set the widths to 20.  Note:  You may choose any names that fits your particular naming convention.  You may also choose any widths, but the width of the data source for the lookup is 20 so you can be assured it will not produce wider data.

d.  Right click on the new fields and choose "Create Script Conversion" from the fly menu.  The script tab will open and the following text will be inserted automatically:

'----------------- OnRecord -----------------
Function OnRecord
DictOut.LenderCounty.value = value
DictOut.SiteCounty.value = value
End Function


7. Create a DB Data Object Variable to represent the external look up file.  This file contains the county names that we want to add. Note: If the variable pane is hidden, click the blue pencil icon to restore it.

a.  Right click on the left side of the scripting pane under the word "Variable" and from the fly menu choose "Data Properties" then "DB".

b.  This will create a DB Variable called "DB".  Click on the name and change it to "CA_ZIP"

c.  Double click on the new variable and the "Property Editor" dialog will appear.  On the "Data Source" tab, click the open data source icon and open the file "ca_zip.dbf" provided as a sample file with your software.  location:  InstallDir\Samples\TextConverter\Automatic mode\2 With script

It took about 10 minutes to create the ca_zip.dbf file.  First, we found a web page with city, ZIP, and county for California.  Next we downloaded the web page HTML file and extracted the data with TextConverter.  Finally, we set the output data source to the file ca_zip.dbf and ran the extraction (see Step 7.1 Extract data from webpages for a detailed explanation on how to do this).

8.  Use the new variable to find the county and load the new field.

a.  Add a line to the script to find the record where the ZIP code in Field_15 (the Lender ZIP code) matches the ZIP code in the look up file.  We will use ".scope.Set" and a SQL expression to find each match.  Now go to the end of line two "Function OnRecord", hit return, and type CA_ZIP. (see the fly menu pop up when you add the dot), choose or type "scope.", (menu again) choose "Set", then the expression "zip like '" & Left(DictOut.Field_15.value, 4) & "%'".  You can even save some typing by dragging in Field_15 from the output dictionary.

In Step 4 we combined the values from several input fields into one output field.  Notice how "&" is used in the same way in this exercise to build the SQL expression for the scope set.  This is the ease and power of Visual Basic (VB).

b.  Add another line and type the If Then statement that will load the county data.  It goes like this If CA_ZIP.Select() Then
DictOut.Lender_County.value = Leave this part and replace the placeholder "value" with CA_ZIP.GetCellValue("County")
End If

c.  The whole thing looks like this:

'----------------- OnRecord -----------------
Function OnRecord
CA_ZIP.scope.Set "zip like '" & Left(DictOut.Field_15.value, 4) & "%'"

If CA_ZIP.Select() Then
    DictOut.LenderCounty.value = CA_ZIP.GetCellValue("COUNTY")
End If
End Function


e.  Now copy and paste to create a second set of the four lines between "Function OnRecord" and "End Function", then edit these new lines using "Field_34" and "SiteCounty".  The final version of the script should look just like the sample provided.  Here it is:

'----------------- OnRecord -----------------
Function OnRecord
CA_ZIP.scope.Set "zip like '" & Left(DictOut.Field_15.value, 4) & "%'"

If CA_ZIP.Select() Then
    DictOut.LenderCounty.value = CA_ZIP.GetCellValue("COUNTY")
End If

CA_ZIP.scope.Set "zip like '" & Left(DictOut.Field_34.value, 4) & "%'"

If CA_ZIP.Select() Then
    DictOut.SiteCounty.value = CA_ZIP.GetCellValue("COUNTY")
End If

End Function


d.  Click the "Check Syntax" button in the lower right corner to check for errors in syntax.  Click the preview tab to see the results (you will have to scroll all the way to the right to see the new county field).

9.  Now you are ready to connect the output data source and run the extraction process.

Save your project - TextConverter stores all of the current settings in a project file.  To save your project, choose "Save Project" form the file menu or click the disc icon () on the toolbar.  Elements of a project include:

  • path to the input data source
  • path to the output data source
  • paths to all other files and databases used in the project
  • the complete script
  • the mapping of the input dictionary to the output dictionary
  • All output dictionary settings
  • All "options" settings
  • Any other ETL settings
  • The workspace layout is NOT saved as part of the project but is instead retained with TextConverter
Even though these are just sample extractions and transformations designed to help teach TextConverter users about scripting, it was interesting for me to see that none of the lenders on these properties in foreclosure were located in the same county as the properties.

Learn more with Step 7.1 Extract data from a webpage
Comments