Step 3. Edit the Output Dictionary

In this exercise we change the names, widths, and types of fields in the output dictionary.  After data is extracted from the input data source, input and output dictionaries are automatically created and connected.  Editing the output dictionary allows the extracted data to load more cleanly into the designated output database or output data file.  Changing the type can transform the data without altering the intended values.

The files supported by TextConverter range from structured delimited files like CSV, to tag-based text like QuickBooks or MS Money, to loosely structured heterogeneous input with multilevel data. Use the Input pane to load a file, or drag and drop a text file from Windows Explorer into the TextConverter frame. TextConverter can perform ETL by extracting data from popular document formats including PDF, DOC/DOCX, RTF, XLS/XLSX, HTML, CSV, and free-format text files.

TextConverter uses Artificial Intelligence (AI) to extract and transform data from the input data source.  Sometimes a single file does not contain enough information for the AI to set all the fields in the data dictionary correctly.  When this occurs, you can make changes to the fields in the output data dictionary.

location:  InstallDir/Samples/TextConverter/Automatic Mode/

project file:  InstallDir/Samples/TextConverter/Automatic Mode/1 Without script/3 EditOutputDictionary.ConverterX


1. Clear the project by pressing the New button in the toolbar or by selecting "New" from the "File" menu.

2. Make sure the "Open file as" option is set to "Pdf".

3.  Open the sample file provided with the software named "dataquick.pdf".

4.  Select "Generate Templates ..." from either the Options or Input Pane. TextConverter selects a template that best fits the input file's structure by default.

5.  Click the "Reset Output Dictionary" icon to create a linked input dictionary using the new template.

 

 

The input dictionary is generated automatically based on the data being extracted and the options selected.  Initially, the input dictionary is connected to an output dictionary with the same characteristics.  The output dictionary can be edited by the user.  Any field in the output dictionary can be connected to any field in the input dictionary.  If the output field does not have a simple one-to-one relationship with the input field, a script can be inserted to transform the extracted data into the output field.
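
The relationship between the two dictionaries can be pictured with a small Python sketch. This is only an illustrative model, not TextConverter's API or project format: the function name build_output, the mapping layout, the "Trustee Location" field, and the sample values are all hypothetical. Each output field is either connected directly to one input field or filled by a small transform, which stands in for a script.

```python
# Illustrative model only; this is not TextConverter's API or file format.
# All names and sample values below are hypothetical.

def build_output(record, mapping):
    """Build one output record from one input record.

    mapping: output field name -> (input field name, transform or None).
    A transform receives the whole input record and returns the output value.
    """
    out = {}
    for out_name, (in_name, transform) in mapping.items():
        if transform is None:
            # Simple one-to-one connection: copy the input value as-is.
            out[out_name] = record[in_name]
        else:
            # Not one-to-one: a script-like transform computes the value.
            out[out_name] = transform(record)
    return out


input_record = {"Field_13": "Denver", "Field_14": "CO"}

mapping = {
    "Trustee City": ("Field_13", None),
    "Trustee State": ("Field_14", None),
    # Hypothetical scripted field that combines two input fields.
    "Trustee Location": (None, lambda r: r["Field_13"] + ", " + r["Field_14"]),
}

output_record = build_output(input_record, mapping)
print(output_record)
```

In this model, renaming an output field only changes a key in the mapping; the connection back to the input field stays the same.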

Most of the field names in this example are extracted properly by TextConverter's AI but a few of the names and types need to be adjusted.  The specific adjustments depend on how you plan to use the extracted data in your output database.

6.  Go to the Output Dictionary (upper right pane) and scroll down to "Field_13".  Click on the name of the field to edit it.  Change the name of this field to "Trustee City" (or any name you feel is appropriate).  Next change "Field_14" to "Trustee State".  Make the following additional changes to the Output Dictionary:

Change Field_15 to Trustee ZIP
Change Field_19 to Trustee Ph
Change Field_25 to Site CityState - In Step 5. Split one field into three, we will split a field like this into its parts.
Change Field_26 to Site ZIP
Change Field_27 to Loan Number and change the type from "Numeric" to "String"
Change Field_33 to Mail CityState - In the next step, we will split this into two fields for City and State.
Change Field_34 to Mail ZIP
Change Field_35 to Lender Street
Change Field_38 to Lender CityState - Notice how this field inherited a name from a blank label to its left in the original document.  You can use the Continuous Learning Mode option to train the AI extraction on a larger set of documents.  This training helps to avoid automatic extraction mistakes related to fields that are only occasionally filled in.
Change Field_39 to Lender ZIP
Change the type on "1. Parcel" from "numeric" to "string".
Change the type on "3. Default" from "numeric" to "string".
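
The effect of these edits can be sketched in a few lines of Python. This is a hedged illustration, not how TextConverter works internally: the function apply_output_dictionary and the sample values are hypothetical, and only a few of the renames above are shown. It also shows one common reason for storing identifiers such as loan or parcel numbers as strings: a numeric type drops leading zeros and cannot hold separators.

```python
# Illustrative only; in TextConverter these edits are made in the
# Output Dictionary pane. Function name and sample values are hypothetical.

renames = {
    "Field_13": "Trustee City",
    "Field_14": "Trustee State",
    "Field_27": "Loan Number",
}

# Output fields whose type was changed from numeric to string.
string_fields = {"Loan Number", "1. Parcel", "3. Default"}


def apply_output_dictionary(record, renames, string_fields):
    """Rename fields and keep the listed fields as text."""
    out = {}
    for name, value in record.items():
        new_name = renames.get(name, name)  # unchanged if no rename is listed
        out[new_name] = str(value) if new_name in string_fields else value
    return out


raw = {
    "Field_13": "Denver",
    "Field_27": "0012345",       # made-up loan number with leading zeros
    "1. Parcel": "0071-22-003",  # made-up parcel identifier with separators
}

edited = apply_output_dictionary(raw, renames, string_fields)

# Treating the loan number as numeric would lose the leading zeros.
as_numeric = int(raw["Field_27"])
print(edited["Loan Number"], as_numeric)
```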

    7.  Now you are ready to connect the output data source and run the extraction process.

    Save your project - TextConverter stores all of the current settings in a project file.  To save your project, choose "Save Project" from the File menu or click the disk icon on the toolbar.  Elements of a project include:

    • path to the input data source
    • path to the output data source
    • paths to all other files and databases used in the project
    • the complete script
    • the mapping of the input dictionary to the output dictionary
    • all output dictionary settings
    • all "options" settings
    • any other ETL settings

    Note: The workspace layout is NOT saved as part of the project; it is retained with TextConverter instead.

    Learn more with Step 4. Combine four fields into one


