Chapter 2: File Organizations

This chapter describes the file organizations supported by your COBOL system.

Overview

A file is a collection of data that is, typically, stored on disk. As a logical entity, a file enables you to divide your data into meaningful groups. For example, you can use one file to hold all of a company's product information and another file to hold all personnel information. As a physical entity, a file should be considered in terms of its file organization, which refers to the way in which data is stored physically in a file. This determines the way that you access the data subsequently.

This COBOL system supports three file organizations: sequential, relative and indexed. Depending upon the file organization, you have up to three ways of accessing the data:

File Organization    Sequential Access    Random Access    Dynamic Access
Sequential           Yes                  No               No
Relative             Yes                  Yes              Yes
Indexed              Yes                  Yes              Yes

Sequential Files

A sequential file is one in which the individual records can only be accessed sequentially, that is, in the same order as they were originally written to the file. New records are always added to the end of the file.

Three types of sequential file are supported by this COBOL system: record sequential, line sequential and printer sequential.

Record Sequential Files

Record sequential files are nearly always referred to simply as sequential files because when you create a file and specify the organization as sequential, a record sequential file is created by default.

To define a file as record sequential, specify ORGANIZATION IS RECORD SEQUENTIAL in the SELECT clause for the file in your COBOL program, for example:

 select recseq assign to "recseq.dat"
     organization is record sequential.

Because record sequential is the default for sequential files, you do not need to specify ORGANIZATION IS RECORD SEQUENTIAL. As long as you do not set the SEQUENTIAL Compiler directive, you can simply use ORGANIZATION IS SEQUENTIAL.

Line Sequential Files

The primary use of line sequential files (which are also known as "text files" or "ASCII files") is for display-only data. Most PC editors, for example Notepad, produce line sequential files.

In a line sequential file, each record in the file is separated from the next by a record delimiter. The record delimiter, which is the line feed (x"0A") character, is inserted after the last non-space character in each record. A WRITE statement removes trailing spaces from the data record and appends the record delimiter. A READ statement removes the record delimiter and, if necessary, pads the data record (with trailing spaces) to the record size defined by the program reading the data.

If a record in a line sequential file is longer than the record length defined by the program reading it, a READ returns only as much data as fills the record length, and the next READ returns more data from the same record. In other words, a READ ends at either the record length or the record delimiter, whichever is reached first.

To define a file as line sequential, specify ORGANIZATION IS LINE SEQUENTIAL in the SELECT clause for the file in your COBOL program, for example:

 select lineseq assign to "lineseq.dat"
     organization is line sequential.
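
The WRITE and READ behavior described above can be seen with a small program. This is a minimal sketch rather than a definitive example: the program name, the 20-character record layout and the lineseq-status data item are assumptions made for illustration.

```cobol
       identification division.
       program-id. lineseqdemo.
       environment division.
       input-output section.
       file-control.
           select lineseq assign to "lineseq.dat"
               organization is line sequential
               file status is lineseq-status.
       data division.
       file section.
       fd  lineseq.
      * Assumed record layout: 20 characters of text.
       01  lineseq-record        pic x(20).
       working-storage section.
       01  lineseq-status        pic xx.
       procedure division.
      * WRITE removes the trailing spaces and appends the delimiter.
           open output lineseq
           move "short text" to lineseq-record
           write lineseq-record
           close lineseq
      * READ removes the delimiter and pads the record with spaces
      * to the 20 characters defined in the FD.
           open input lineseq
           read lineseq
               at end
                   display "end of file"
               not at end
                   display "[" lineseq-record "]"
           end-read
           close lineseq
           stop run.
```

The brackets in the DISPLAY statement make the padding visible: the text should be followed by trailing spaces up to the record size, even though those spaces are not stored in the file.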

Printer Sequential Files

Printer sequential files are files which are destined for a printer, either directly or by spooling to a disk file. They consist of a sequence of print records with zero or more vertical positioning characters (such as line-feed) between records. A print record consists of zero or more printable characters and is terminated by a carriage return (x"0D").

With a printer sequential file, the OPEN statement causes a x"0D" to be written to the file to ensure that the printer is located at the first character position before printing the first print record. The WRITE statement causes trailing spaces to be removed from the print record before it is written to the printer with a terminating carriage return (x"0D"). The BEFORE or AFTER clause can be specified in the WRITE statement to cause one or more line-feed characters (x"0A"), a form-feed character (x"0C") or a vertical tab character (x"0B") to be sent to the printer before or after writing the print record.

Printer sequential files should not be opened for INPUT or I/O.

You can define a file as printer sequential by specifying ASSIGN TO LINE ADVANCING FILE or ASSIGN TO PRINTER in the SELECT clause, for example:

 select printseq
     assign to line advancing file "printseq.dat".
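
As an illustration of the BEFORE and AFTER clauses described above, the following sketch writes three print records with different vertical positioning. The program name, the 80-character print record and the text written are assumptions; the ASSIGN clause is taken from the example above, so the program only compiles on a system that supports that syntax.

```cobol
       identification division.
       program-id. printdemo.
       environment division.
       input-output section.
       file-control.
           select printseq
               assign to line advancing file "printseq.dat".
       data division.
       file section.
       fd  printseq.
      * Assumed print record: one 80-column line.
       01  print-record          pic x(80).
       procedure division.
           open output printseq
      * Form feed first, then the print record.
           move "Report title" to print-record
           write print-record after advancing page
      * Two line feeds first, then the print record.
           move "First detail line" to print-record
           write print-record after advancing 2 lines
      * The print record first, then one line feed.
           move "Second detail line" to print-record
           write print-record before advancing 1 line
           close printseq
           stop run.
```

The file is opened for OUTPUT only, in line with the restriction on INPUT and I/O noted above.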

Relative Files

A relative file is a file in which each record is identified by its ordinal position in the file (record 1, record 2 and so on). This means that records can be accessed randomly as well as sequentially.

Because records can be accessed randomly, access to relative files is fast.

Although you can declare variable length records for a relative file, this can be wasteful of disk space because the system assumes the maximum record length for all WRITE statements to the file and pads the unused character positions. This is done so that the COBOL file handling routines can quickly calculate the physical location of any record, given the record's record number in the file.

As relative files always contain fixed length records, no space is saved by specifying data compression. In fact, if data compression is specified for a relative file, it is ignored by the File Handler.

Each record in a relative file is followed by a two-byte record marker which indicates the current status of the record. The status of a record can be:

x"0A" - record present

x"00" - record deleted or never written

When you delete a record from a relative file, the record's contents are not removed immediately. The record's record marker is updated to show that it has been deleted, but the contents of the deleted record remain physically in the file until a new record is written. If you need to remove the data from the file for security reasons, follow the procedure below:

  1. Use REWRITE to overwrite the record, for example with space characters.
  2. Delete the record.
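
The two steps above might be coded as follows. This is a sketch under stated assumptions: the file relfil.dat already exists and contains record number 3, the record is 30 characters long, and the program name and the relfil-status data item are hypothetical.

```cobol
       identification division.
       program-id. reldelete.
       environment division.
       input-output section.
       file-control.
           select relfil assign to "relfil.dat"
               organization is relative
               access mode is random
               relative key is relfil-key
               file status is relfil-status.
       data division.
       file section.
       fd  relfil.
      * Assumed record layout: 30 characters.
       01  relfil-record         pic x(30).
       working-storage section.
       01  relfil-key            pic 9(8) comp-x.
       01  relfil-status         pic xx.
       procedure division.
           open i-o relfil
           if relfil-status not = "00"
               display "open failed " relfil-status
               stop run
           end-if
      * Record number 3 is an arbitrary choice for this sketch.
           move 3 to relfil-key
      * Step 1: overwrite the record contents with spaces.
           move spaces to relfil-record
           rewrite relfil-record
               invalid key display "rewrite failed " relfil-status
           end-rewrite
      * Step 2: delete the record, which updates its marker.
           delete relfil record
               invalid key display "delete failed " relfil-status
           end-delete
           close relfil
           stop run.
```

With random access, the record to be rewritten or deleted is the one identified by the relative key, so no READ is needed before the REWRITE.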

To define a relative file, specify ORGANIZATION IS RELATIVE in the SELECT clause for the file in your COBOL program.

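To access records randomly, you must also specify ACCESS MODE IS RANDOM (or ACCESS MODE IS DYNAMIC) and name a relative key in the RELATIVE KEY clause. For example: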

 select relfil assign to "relfil.dat"
     organization is relative
     access mode is random
     relative key is relfil-key.
 ...
 working-storage section.
 01 relfil-key   pic 9(8) comp-x.

The example code above defines a relative file. The access mode is random, so a relative key, relfil-key, is defined. For random access, you must always supply a record number in the relative key before trying to read a record from the file.

If you specify ACCESS MODE IS DYNAMIC, you can access the file both sequentially and randomly.
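
To show the relative key in use, the following sketch writes one record directly to position 3 of a new file and then reads it back. The SELECT clause and relfil-key come from the example above; the program name, the 30-character record and the relfil-status data item are assumptions.

```cobol
       identification division.
       program-id. reldemo.
       environment division.
       input-output section.
       file-control.
           select relfil assign to "relfil.dat"
               organization is relative
               access mode is random
               relative key is relfil-key
               file status is relfil-status.
       data division.
       file section.
       fd  relfil.
      * Assumed record layout: 30 characters.
       01  relfil-record         pic x(30).
       working-storage section.
       01  relfil-key            pic 9(8) comp-x.
       01  relfil-status         pic xx.
       procedure division.
      * Create the file and write record number 3 directly.
           open output relfil
           move 3 to relfil-key
           move "third record" to relfil-record
           write relfil-record
               invalid key display "write failed " relfil-status
           end-write
           close relfil
      * Read it back by supplying the record number first.
           open input relfil
           move 3 to relfil-key
           read relfil
               invalid key display "no record " relfil-key
               not invalid key display relfil-record
           end-read
           close relfil
           stop run.
```

Moving a different value to relfil-key before the READ, for example 2, should take the INVALID KEY branch, because no record was ever written at that position.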

Indexed Files

An indexed file is a file in which each record includes a primary key. To distinguish one record from another, the value of the primary key must be unique for each record. Records can then be accessed randomly by specifying the value of the record's primary key. Indexed file records can also be accessed sequentially.

As well as a primary key, indexed files can contain one or more additional keys known as alternate keys. The value of a record's alternate key(s) does not have to be unique.

To define a file as indexed, specify ORGANIZATION IS INDEXED in the SELECT clause for the file in your COBOL program. You must also specify a primary key using the RECORD KEY clause:

 select idxfile assign to "idx.dat"
     organization is indexed
     record key is idxfile-record-key.

Most types of indexed file actually comprise two separate files: the data file (containing the record data) and the index file (containing the index structure). Where this is the case, the name that you specify in your COBOL program is given to the data file and the name of the associated index file is produced by adding an .idx extension to the data file name. You should avoid using the .idx extension in other contexts.

The index is built up as an inverted tree structure that grows as records are added.

With indexed files, the number of disk accesses required to locate a randomly selected record depends primarily on the number of records in the file and the length of the record key. File I/O is faster when reading the file sequentially.

We strongly recommend that you take regular backups of files of all types. However, with indexed files there are events such as media corruption that can result in only one of the two files becoming unusable. If you do lose an index file, use the Rebuild utility to recover the index from the data file and so reduce the time lost due to a failure. For more information, see the chapter Rebuild.

Primary Keys

To define the primary key of an indexed file use the RECORD KEY IS clause in the SELECT clause:

 select idxfile assign to "idx.dat"
     organization is indexed
     record key is idxfile-record-key.

Alternate Keys

As well as the primary key, each record can have any number of additional keys, known as alternate keys. To define an alternate key use the ALTERNATE RECORD KEY IS clause in the SELECT clause:

 select idxfile assign to "idx.dat"
     organization is indexed
     record key is idxfile-record-key
     alternate record key is idxfile-alt-key.

Duplicate Keys

You can define keys which allow duplicate values. However, do not allow duplicates on primary keys as the value of a record's primary key must be unique.

When you use duplicate keys, be aware that there is a limit on the number of times you can specify the same value for an individual key. Each time you specify the same value for a duplicate key, the key's occurrence number is incremented by one. The maximum number of duplicate values permitted for an individual key varies according to the type of indexed file. For a full list of indexed file types and their characteristics, see the topic Types of Indexed Files.

The duplicate occurrence value is a unique identifier that is added to a key that has duplicates, in order to make it unique. When a new key is added and keys with the same value already exist, the occurrence value of the last added key is incremented by one and used for the new key. Deleting keys does not change the keys that remain, so if the highest occurrence in a series of duplicates is not deleted, the highest value does not change. If the highest occurrence is deleted, the highest occurrence value drops to the next lower occurrence value that still exists.

Your COBOL system uses the occurrence number to ensure that duplicate key records are read in the order in which they were created. Because of this, you cannot reuse an occurrence number whose record you have deleted. Therefore, you can reach the maximum number of duplicate values, even if some of those keys have already been deleted.

Some types of indexed file contain a duplicate occurrence record in the data file. Where an indexed file contains a duplicate occurrence record, each record in the data file is followed by a system record. This system record holds, for each duplicate key in that record, the occurrence number of the key. This number is just a counter of the number of times that key value has been used during the history of the file. The presence of the duplicate occurrence record makes REWRITE and DELETE operations on a record with many duplicates much faster but causes the data records of such files to be larger than those of a standard file.

To enable duplicate values to be specified for alternate keys, use WITH DUPLICATES in the ALTERNATE RECORD KEY clause in the SELECT clause:

 file-control.
     select idxfile assign to "idx.dat"
         organization is indexed
         record key is idxfile-record-key
         alternate record key is idxfile-alt-key 
                                 with duplicates.
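
The following sketch writes two records that share an alternate key value and then reads them back along the alternate key. The SELECT clause extends the example above; the program name, the record layout, the key values, the ACCESS MODE IS DYNAMIC clause and the idxfile-status data item are assumptions made for illustration.

```cobol
       identification division.
       program-id. dupdemo.
       environment division.
       input-output section.
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               access mode is dynamic
               record key is idxfile-record-key
               alternate record key is idxfile-alt-key
                                       with duplicates
               file status is idxfile-status.
       data division.
       file section.
       fd  idxfile.
      * Assumed record layout for this sketch.
       01  idxfile-record.
           03  idxfile-record-key    pic x(4).
           03  idxfile-alt-key       pic x(10).
           03  idxfile-data          pic x(20).
       working-storage section.
       01  idxfile-status            pic xx.
       procedure division.
      * Two records with unique primary keys, same alternate key.
           open output idxfile
           move "0002" to idxfile-record-key
           move "SMITH" to idxfile-alt-key
           move "written first" to idxfile-data
           write idxfile-record
           move "0001" to idxfile-record-key
           move "SMITH" to idxfile-alt-key
           move "written second" to idxfile-data
           write idxfile-record
           close idxfile
      * Read along the alternate key: duplicates are expected to
      * come back in the order in which they were written.
           open input idxfile
           move "SMITH" to idxfile-alt-key
           start idxfile key is equal to idxfile-alt-key
               invalid key display "key not found"
           end-start
           perform until idxfile-status not = "00" and "02"
               read idxfile next record
                   at end continue
                   not at end display idxfile-record
               end-read
           end-perform
           close idxfile
           stop run.
```

The loop accepts file status "02" as well as "00", because a successful operation on a key that has duplicates can report "02".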

Sparse Keys

A sparse key is a key for which no index entry is stored for a given value of that key. For example, if a key is defined as sparse when it contains all spaces, index entries for the key are not included when the part of the record it occupies contains only space characters.

Only alternate keys can be sparse.

Using this feature results in smaller index files. The larger your key(s) and the more records you have for which the alternate key has the given value, the larger your saving of disk space.

To enable sparse keys, use SUPPRESS WHEN ALL in the ALTERNATE RECORD KEY clause in the SELECT clause:

 file-control.
     select idxfile assign to "idx.dat"
         organization is indexed
         record key is idxfile-record-key
         alternate record key is idxfile-alt-key
                                 with duplicates 
                                 suppress when all "A".

In this example, if a record is written for which the value of the alternate key is all A's, the actual key value is not stored in the index file.

Indexed File Access

You can use both the primary and alternate keys to read records from an indexed file, either directly (random access) or in key sequence (sequential access). The access mode can be sequential, random or dynamic.

The method of accessing an indexed file is defined using the ACCESS MODE IS clause in the SELECT clause, for example:

 file-control.
     select idxfile assign to "idx.dat"
         organization is indexed
         access mode is dynamic
         record key is idxfile-record-key
         alternate record key is idxfile-alt-key.
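
With dynamic access, the same program can mix random and sequential reads. The following is a minimal sketch: the SELECT clause is the one above with a FILE STATUS clause added, and the program name, the record layout, the key values and the idxfile-status data item are assumptions.

```cobol
       identification division.
       program-id. idxaccess.
       environment division.
       input-output section.
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               access mode is dynamic
               record key is idxfile-record-key
               alternate record key is idxfile-alt-key
               file status is idxfile-status.
       data division.
       file section.
       fd  idxfile.
      * Assumed record layout for this sketch.
       01  idxfile-record.
           03  idxfile-record-key    pic x(4).
           03  idxfile-alt-key       pic x(10).
       working-storage section.
       01  idxfile-status            pic xx.
       procedure division.
           open output idxfile
           move "0001" to idxfile-record-key
           move "ALPHA" to idxfile-alt-key
           write idxfile-record
           move "0002" to idxfile-record-key
           move "BETA" to idxfile-alt-key
           write idxfile-record
           close idxfile
           open input idxfile
      * Random access: supply the primary key, then READ.
           move "0002" to idxfile-record-key
           read idxfile
               invalid key display "not found"
               not invalid key display "random: " idxfile-record
           end-read
      * Sequential access: position on a key, then READ NEXT.
           move "0001" to idxfile-record-key
           start idxfile key is not less than idxfile-record-key
               invalid key display "start failed"
           end-start
           read idxfile next record
               at end display "end of file"
               not at end display "next:   " idxfile-record
           end-read
           close idxfile
           stop run.
```

To read in alternate key sequence instead, name idxfile-alt-key in the START statement after moving a value to it.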

Fixed-length and Variable-length Records

A file can contain either fixed-length records or variable-length records.

Using variable-length records might enable you to save disk space. When you use fixed-length records, you need to make the record length equal to the length of the longest record. If your application generates many short records with occasional long ones, using fixed-length records wastes a lot of disk space, so variable-length records would be a better choice.
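
As a sketch of the short-record case, the following program writes one 5-character record and one 35-character record to the same file, using the RECORD IS VARYING clause listed later in this section. The file name varfile.dat, the program name, the size range of 5 to 80 characters and the varfile-length data item are all assumptions made for illustration.

```cobol
       identification division.
       program-id. vardemo.
       environment division.
       input-output section.
       file-control.
           select varfile assign to "varfile.dat"
               organization is sequential.
       data division.
       file section.
      * Assumed size range: records of 5 to 80 characters.
       fd  varfile
           record is varying in size from 5 to 80 characters
           depending on varfile-length.
       01  varfile-record        pic x(80).
       working-storage section.
       01  varfile-length        pic 9(4) comp.
       procedure division.
           open output varfile
      * Set the length before each WRITE.
           move "short" to varfile-record
           move 5 to varfile-length
           write varfile-record
           move "a much longer record than the first" to
               varfile-record
           move 35 to varfile-length
           write varfile-record
           close varfile
           stop run.
```

With fixed-length records, both of these records would occupy the full 80 characters.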

The type of record is determined as follows:

To use:                     Specify the clause:
Variable-length records     RECORDING MODE IS V
Fixed-length records        RECORDING MODE IS F

Otherwise:

To use:                     Specify the clause:
Variable-length records     RECORD IS VARYING
Fixed-length records        RECORD CONTAINS n CHARACTERS

Otherwise:

To use:                     Specify the Compiler directive:
Variable-length records     RECMODE"V"
Fixed-length records        RECMODE"F"

Otherwise, to use variable length records, specify the RECMODE"OSVS" Compiler directive plus one of the following:

File Headers

A file header is a block of 128 bytes at the start of the file. Files with the following file organizations contain file headers:

In addition, each record in these files is preceded by a two- or four-byte record header.

Further detail on file and record headers, and on the structure of files with headers, is available in the reference help topic Files with Headers.


Copyright © 2009 Micro Focus (IP) Ltd. All rights reserved.