TextConverter's options are located in the lower right pane of the user interface (UI). Each option can be adjusted through a drop-down list (click the small triangle), by typing into the space provided, or through a combo box that offers both a drop-down list and the option to type. The triangle for a drop-down list or combo box is shown only when that option's row is selected. The options displayed are adjusted based on the up line conversion option settings and the format of the input file. The image below shows typical options; the options for a text input file are described under Options for a Text Input File.
The blue vertical pencil in the upper right of the pane is used to show/hide the pane. This allows you to hide the options pane if you require more workspace for the output preview.
Open file as - Can be set to By File Extension, Txt, Csv, Html, Pdf, Doc/DocX/Rtf, or Xls.
Template - A template is a parsing overlay exposing the input document as a set of input dictionary fields and record(s) automatically generated through artificial intelligence (AI). When inserting a new input file, click the triangle and select Generate Templates... for a list of templates to use with your data.
Manual Mode - Used to optimize parsing with a script. The document APIs that allow TextConverter to extract text from PDFs, HTML files, Word documents, and Excel spreadsheets allow the text and associated spaces to be interpreted in more than one way. The default Manual Mode is usually the best mode for viewing the extracted input text, but if the spaces are not in the correct place for your parsing task or the text does not line up properly, try each of the other modes until you find the one that works best for your input document. As shown in the image above, "compact" is the default manual mode for PDFs.
Learning - Selects the mode of training used by TextConverter's artificial intelligence to select and create templates (see the Template option above). Learning can be set to:
Initial Single-form - The artificial intelligence trains to the depth set by the number of lines to preview, for the current input data source only.
Initial Multi-form - The artificial intelligence trains to the depth set by the number of lines to preview, for multiple records.
Continuous - The artificial intelligence continues to train against each new input data file that is opened, to the depth set by the number of lines to preview.
Off - No further changes are made to the template(s).
Exclude irrelevant fields - This check box prevents irrelevant fields from appearing in the data and the input dictionary. Irrelevant fields are typically formatting characters that are part of the recurring pattern but are neither labels nor values. For example: "********" or "-----------".
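As a rough illustration of what "formatting characters that are neither labels nor values" can mean, the sketch below flags fields that contain no letters or digits at all. This is a hypothetical heuristic written for this page, not TextConverter's actual detection logic; the function name `is_irrelevant` and the sample fields are assumptions.

```python
import re

# Hypothetical heuristic: a field made up only of punctuation or
# underscores (no letters, no digits) is treated as formatting.
_FORMATTING_ONLY = re.compile(r"[\W_]+")


def is_irrelevant(field: str) -> bool:
    """Return True when the field carries no label or value text."""
    text = field.strip()
    return bool(text) and _FORMATTING_ONLY.fullmatch(text) is not None


# Made-up sample fields, including the two examples from the text above.
fields = ["Name:", "********", "John Smith", "-----------", "Total"]
kept = [f for f in fields if not is_irrelevant(f)]
print(kept)
```

Empty fields are deliberately not flagged here; whether they count as irrelevant is a separate decision.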
None - All data are considered values.
All Constants - Data that recur are considered field names.
Constants With : - Data that recur and are followed by a colon (:) are considered field names.
Scanned image - Check this box if the source of your input Pdf is a scanned image. The Scanned image setting determines whether or not font variations are considered for pattern detection during automatic extraction. Fonts are not considered for scanned images but may be used if this box is not checked.
Trim spaces - When checked, this tells
TextConverter to automatically trim spaces from the source data. This
option is available for Pdf, Xls, HTML, and Doc/Docx/Rtf.
Remove Header/Footer - Toggles the visibility of headers and/or footers in the input preview pane.
Exclude Head Pages - Removes head pages that contain no data being used from the preview.
Exclude Tail Pages - Removes tail pages that contain no data being used from the preview.
Show all lines - This check box allows
you to include lines that do not conform to the selected automatic
template in the conversion. By default, this box is not checked and
lines that do not conform to the template are skipped.
Exclude lists - This check box allows you to exclude certain portions of the input data source document that would otherwise be extracted when working with templates. When working with templates, a list is usually identified as a sequence of lines that varies in number and shares the same fields. By default, this box is unchecked.
First Page - Enter the page number at which to start the conversion. For example, to skip a cover page, start on page 2 (available only for Pdf).
Last Page - Enter the page number at which to end the conversion. Leave this blank if your documents vary in length or if you wish to continue conversion to the end of the document (available only for Pdf).
Pages for Preview/Learning - The words displayed for this option change based on the file format. This option selects the number of pages (or lines) displayed in the input and preview panes and, when working with templates, the depth of analysis for learning mode template creation and selection. This may impact performance noticeably depending on your computing platform and the complexity of your project. Pages for Preview/Learning is displayed for formats other than Pdf. "/Learning" is omitted for text files. Preview Records # and Number of lines to preview were used in prior versions.
Number of lines to skip - As some files may begin with additional unnecessary information, this option allows you to specify the number of records to skip in the input file before starting the conversion.
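The effect of skipping leading records can be pictured with a short Python sketch. It is only an illustration of the idea, not how TextConverter implements the option; the helper name `records_after_skip` and the sample records are made up.

```python
from itertools import islice


def records_after_skip(records, skip_count):
    """Return an iterator over the records with the first skip_count dropped."""
    return islice(records, skip_count, None)


# Made-up input: two lines of preamble followed by the real data.
sample = [
    "Report generated by the billing system",
    "Confidential",
    "id,name",
    "1,Ann",
    "2,Bob",
]
converted = list(records_after_skip(sample, 2))
print(converted)
```

With a skip count of 2, conversion starts at the third record, which in this sample is the header row.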
Max Record size, KBytes - Used for safety purposes to handle situations when a record delimiter is incorrectly set and an input record can be too long. In general, this option does not need to be adjusted except in situations when records are expected to exceed 32 kilobytes in length.
Preview auto update - When this box is checked, the preview automatically updates with every change that affects the conversion. This may impact performance noticeably depending on your computing platform and the complexity of your project.
Append to the existing table - When the output data table already exists, check this option to append new records. Unchecked, loading will replace values in a preexisting table.
SQL compatibility - When this box is checked, output data is transformed to avoid any SQL syntactical incompatibility. The single quotes used in SQL are replaced with a similar character that is compatible with SQL use.
Autobuffer - Automatically prepares the data going into the output to increase efficiency and lower the amount of system resources being used. If selected, the number of lines to be buffered can be adjusted in the following option, "Buffer size."

Options for a Text Input File

Open file as - Can be set to By File Extension, Txt, Csv, Html, Pdf, Doc/DocX/Rtf, or Xls.
Template - A template is a parsing overlay exposing the input document as a set of input dictionary fields and record(s) automatically generated through artificial intelligence (AI). When inserting a new input file, click the triangle and select Generate Templates... for a list of templates to use with your data.
Record Delimiter - Defines how the input text lines are divided into input records. Click the drop-down arrow button to see the list of the most suitable delimiters. If the drop-down list does not contain the delimiter you are looking for, you can input any delimiter by typing in the box. You can have an arbitrary text string as a record delimiter. After you make changes to the conversion options, it may be necessary to reset the output dictionary using the reset button on the tool bar. The following special delimiters are also supported:
Field Delimiter - Divides an input record into input fields. Click the drop-down arrow button to see the list of the most suitable delimiters. You can have an arbitrary text string as a field delimiter. If the drop-down list does not contain the delimiter you are looking for, you can input any delimiter by typing in the box. After you make changes to the conversion options, it may be necessary to reset the output dictionary using the reset button on the tool bar. The following special delimiters are also supported:
To use multiple field delimiters, use the following syntax: (;)(,)(:) (all three characters in this sample will be used as field delimiters). Use syntax like (<Tab>+) to indicate that one or more consecutive occurrences of the <Tab> symbol should be treated as a field delimiter. To use the '(', ')', or '+' characters as field delimiters, prepend them with the '%' character (%(%)). To use the '%' character as a field delimiter, enter it twice: (%%).
Text Qualifier - A character separating blocks of text, within which field and record delimiters are ignored. For example, "Dear John, How are you?" should be treated as a single field even if the comma (,) is used as a field delimiter. For that example, the text qualifier should be set to Double Quotes (") to qualify that particular comma as text and not a delimiter. When commas are used as a delimiter (such as in a CSV file) and a comma appears in the actual data, TextConverter will automatically insert text qualifiers even if you do not select them.
Skip Empty Lines - This check box automatically removes empty lines from the conversion. Uncheck this box if you wish to include empty lines from the original input data. In some versions of TextConverter this option is called "Keep original lines."
Get Field Names from the Line - Enter the number of the input record that contains field names.
Lines for Preview - Enter the number of lines to be shown in the output preview pane.
Number of lines to skip - As some files may have additional information at the beginning, this option allows you to specify the number of records to skip in the input file before starting the conversion.
Max Record Size, KBytes - Used for safety purposes to handle situations when a record delimiter is set incorrectly and an input record can be too long. In general, this option need not be adjusted except in situations when records are expected to exceed 32 kilobytes in length.
Preview Auto Update - When this box is checked, the preview is automatically updated with every change that affects the conversion. This may impact performance noticeably depending on your computing platform and the complexity of your project.
SQL compatibility - When this box is checked, output data is transformed to avoid any SQL syntactical incompatibility. The single quotes used in SQL are replaced with a similar character that is compatible with SQL use.
Autobuffer - Automatically prepares the data going into the output to increase efficiency and lower the amount of system resources being used. If selected, the number of lines to be buffered can be adjusted in the following option, "Buffer size."
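The multiple field delimiter syntax described above ((;)(,)(:), (<Tab>+), and the '%' escapes) can be read as a small grammar. The Python sketch below is one possible interpretation of that grammar, translated into a regular expression; it is not TextConverter's parser. The function name `parse_delimiter_spec` is hypothetical, and the treatment of `<Tab>` as a tab character and of a trailing `+` as "one or more occurrences" is inferred from the examples in the text.

```python
import re


def parse_delimiter_spec(spec: str) -> re.Pattern:
    """Translate a spec such as '(;)(,)(:)' into a compiled regex.

    Assumed rules: each (...) group is one delimiter, '%' escapes the
    next character, '<Tab>' stands for a tab, and an unescaped '+'
    just before ')' means one or more consecutive occurrences.
    """
    alternatives = []
    i = 0
    while i < len(spec):
        if spec[i] != "(":
            raise ValueError(f"expected '(' at position {i}")
        i += 1
        literal, repeat, closed = [], False, False
        while i < len(spec):
            ch = spec[i]
            if ch == "%":  # escaped character, taken literally
                if i + 1 >= len(spec):
                    raise ValueError("dangling '%' at end of spec")
                literal.append(spec[i + 1])
                i += 2
            elif ch == "+":  # repetition marker, only valid before ')'
                if i + 1 >= len(spec) or spec[i + 1] != ")":
                    raise ValueError(f"unescaped '+' at position {i}")
                repeat = True
                i += 1
            elif ch == ")":  # end of this delimiter group
                closed = True
                i += 1
                break
            elif spec.startswith("<Tab>", i):
                literal.append("\t")
                i += len("<Tab>")
            else:
                literal.append(ch)
                i += 1
        if not closed or not literal:
            raise ValueError("empty or unterminated delimiter group")
        piece = re.escape("".join(literal))
        alternatives.append(f"(?:{piece})+" if repeat else piece)
    if not alternatives:
        raise ValueError("no delimiter groups found")
    return re.compile("|".join(alternatives))


print(parse_delimiter_spec("(;)(,)(:)").split("a;b,c:d"))
print(parse_delimiter_spec("(<Tab>+)").split("a\t\t\tb\tc"))
print(parse_delimiter_spec("(%%)").split("a%b"))
```

Alternatives are tried in the order given, so if one delimiter is a prefix of another, list the longer one first. The sketch does not handle text qualifiers; quoted fields would need a separate pass.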
