Step 1. Load an input file

This step shows how to open an input file for extraction with TextConverter. If you already know how to open a file for ETL in TextConverter, you can skip this step.

TextConverter supports a range of files, from well-structured delimited files such as CSV, to tag-based text such as QuickBooks or MS Money, to loosely structured heterogeneous input containing multi-level data. To load a file, use the Input pane or drag and drop a text file from Windows Explorer into the TextConverter frame. TextConverter 2.2 can perform ETL by extracting data from popular document formats including PDF, DOC, RTF, XLS, HTML, CSV and free-format text files.

1.  Clear the project by pressing the New button in the toolbar or by selecting "New" from the "File" menu.
 
2.  Make sure the "Open file as" option is set to "Csv".
 
3.  Open the sample file provided with the software, named "1.1 products.csv".
 
 

As soon as the text is extracted from the input file and loaded in TextConverter, the following work is done automatically:

- the Input records are generated according to the current record delimiter
- the Input dictionary is initialized according to the current field delimiter (see input for details)
- the Output dictionary is created; the number of output fields corresponds to the number of input fields (see output for details)
- the data types for the Output fields are assigned by the automatic data type recognition procedure
- a list of the most suitable record delimiters is collected and shown in the options pane
- a list of the most suitable field delimiters is collected and shown in the options pane
- a tabular representation of the input text is shown in the top left pane (see input for details)
- a conversion preview is shown in the preview pane
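
As a rough illustration of the automatic work listed above, the following Python sketch guesses a field delimiter and assigns a data type to each column. It is not TextConverter's algorithm, which is not described in this tutorial. It relies on the standard library's csv.Sniffer, and the sample data, the function names load and infer_type, and the type labels are all assumptions made for this example.

```python
import csv
import io

# Hypothetical sample data; NOT the contents of "1.1 products.csv".
SAMPLE = "id;name;price\n1;Widget;9.99\n2;Gadget;12.50\n"


def infer_type(values):
    """Return the narrowest type label that fits every value in a column."""
    for type_name, cast in (("integer", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue  # at least one value did not fit; try the next type
    return "text"


def load(text):
    """Split text into records and fields and guess a type per column."""
    # Guess the field delimiter from a small set of candidates (assumed set).
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
    rows = list(csv.reader(io.StringIO(text), dialect))
    header, records = rows[0], rows[1:]
    # Transpose the records so each tuple holds one column's values.
    columns = list(zip(*records))
    types = [infer_type(column) for column in columns]
    return dialect.delimiter, header, records, types


if __name__ == "__main__":
    delimiter, header, records, types = load(SAMPLE)
    print("field delimiter:", repr(delimiter))
    for name, type_name in zip(header, types):
        print(name, "->", type_name)
```

Calling load(SAMPLE) on the sample above yields the ";" delimiter and the column types integer, text and float. A real tool would also have to rank several candidate record delimiters and cope with inconsistent rows, which this sketch does not attempt.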

Here is the screen you will see after the file is loaded:


In the example provided, the relevant delimiter pattern was easily recognized by TextConverter's built-in AI. If you have a file where these settings must be made manually, proceed to Step 2 (of the parsing with script tutorial) - Setting up a record delimiter.

Each time you change a setting in the conversion options, the input data source reloads, but the output dictionary is NOT reset. To reset the output dictionary, click the reset dictionary icon on the toolbar.

Save your project: TextConverter stores all of the current settings in a project file. To save your project, choose "Save Project" from the "File" menu or click the disc icon on the toolbar. Elements of a project include:

  • path to the input data source
  • path to the output data source
  • paths to all other files and databases used in the project
  • the complete script
  • the mapping of the input dictionary to the output dictionary
  • all output dictionary settings
  • all "options" settings
  • any other ETL settings

The workspace layout is NOT saved as part of the project; it is instead retained with TextConverter.

Learn more with Step 2. Change the field order

For an example of automatic extraction from a PDF file, watch the video.
