Step 2. Setting up a record delimiter

A list of potential record delimiters is automatically generated and shown in the options pane. The Input pane shows a tabular representation of the input text. Each line in that view is called an input record. The purpose of record delimiters is to separate data in the original text into input records. It is convenient to see the original input text and its tabular representation at the same time while working with the record and field delimiters. Click on the Open Input File icon (button) to see the original input text.


You would use the <End Line> record delimiter in most cases when a data segment corresponding to a single record occupies a single line of the input text file. Use the options pane to assign a record delimiter.

You can have an arbitrary text string as a record delimiter.  The following special delimiters are also supported:

<End Of Line> -  <CR>, <LF> or  their combination 
<Empty Line> -  Empty Line
<Tab> -  Horizontal tab
<VTab> -  Vertical tab
<CR> -  Carriage return
<LF> -  Line feed
<FF> - Form feed

The cases when a data segment corresponding to a single record occupies multiple lines in the input text fall into one of the following three categories:

1. A special character delimiter

A special character indicates a new data segment corresponding to a record.  In the following sample, the @ symbol would be an appropriate record delimiter:

@  Author="Aberer, Karl",
   Title="The Use of Object-oriented Data Models in Biomolecular Databases",
   BookTitle="Conf. on Object-Oriented Computing in the Natural Sciences",
   Address="Heidelberg, Germany",
   Year=1994
@  Author="Aberer, Karl and Hemm, Klemens",
   Title="Semantic optimisation for biomolecular queries in object-oriented database systems",
   BookTitle="2nd Meeting on Interconnection of Molecular Biology Databases",
   Address="Cambridge, UK",
   Year=1995

2. Tag based.

QuickBooks, MS Money and many other systems use this type of format.  A pattern analysis of this sample suggests that <STMTTRN> as a record delimiter will break the input text into segments to form input records.

<STMTTRN>
<TRNTYPE>DEBIT</TRNTYPE>
<DTPOSTED>20051031</DTPOSTED>
<TRNAMT>-108.40</TRNAMT>
<FITID>1</FITID>
<CHECKNUM>0</CHECKNUM>
<NAME>CUSTOMER CHEQUE 88</NAME>
<MEMO>CUSTOMER CHEQUE 88</MEMO>
</STMTTRN>
<STMTTRN>
<TRNTYPE>CREDIT</TRNTYPE>
<DTPOSTED>20051028</DTPOSTED>
<TRNAMT>88.55</TRNAMT>
<FITID>2</FITID>
<CHECKNUM>0</CHECKNUM>
<NAME>BRANCH DEPOSIT ASHMORE</NAME>
<MEMO>BRANCH DEPOSIT ASHMORE</MEMO>
</STMTTRN>

3. A variable record delimiter.

A variable record delimiter might change from one record to the next.

113279 ABRAMSON, IVAN MOTHER OF TRUTH  
Used in Good Condition. Literary Press Publishers. n.p.: (1929).
126778 ADDAMS, CHARLES, ILLUSTRATED BY CHAS ADDAMS CREATURE COMFORTS  
Used in Good Condition. Simon and Schuster, New York, 1981. first printing.  
122144 AESOP FABLES OF AESOP 
Used in Good Condition - has some general wear. Philadelphia: Henry Altemus, 1900.

For the more complex cases like this one, a record separation criterion must be identified. We can see, that in this sample, a numeric value begins each new segment of data (a book description). To correctly solve this problem, we should implement the IsNewRecord context method:

Function IsNewRecord( text_line )
Dim words
words = target.SplitString( text_line )
If IsArray( words ) And IsNumeric( words(0) ) Then IsNewRecord = true Else IsNewRecord = false
End Function 

 

Now we can proceed to Step 3 - Setting up field delimiter(s)

(or go back to Step 5. Modify the field formats.)

Related Sections

Samples and Walkthroughs

Setting up an ETL process:  Step by Step

Comments