May 2009 Tip of the month: How to use automatic extraction to load PDF data to a database

Visit SiMX.com to download and install TextConverter. TextConverter can perform Extraction, Transformation and Loading (ETL) from popular document formats including PDF, DOC, RTF, XLS, HTML, CSV and TXT. It uses Artificial Intelligence (AI) to extract and transform data from the input data source.

Use either of the sample input PDFs provided. Set "Open File As" to "Pdf". Load the sample file by dragging and dropping it into TextConverter.

Under Templates (either in the Options Pane or the Input Pane), select Generate templates... The template that best suits the input data will be chosen by default. In the example provided, the pattern was easily recognized by TextConverter's AI.

Sometimes a single file does not have enough information to allow the AI to set all the fields in the data dictionary correctly. When this occurs, you can make changes to the fields in the output data dictionary, or extract all or any portion of the data manually.

Each time you change a setting in the conversion options, the input data source will reload, but the output dictionary is NOT reset. To reset the output dictionary, click the reset dictionary icon on the toolbar.

Save your project - TextConverter stores all of the current settings in a project file. To save your project, choose "Save Project" from the File menu or click the disc icon (