This chapter explains the concepts of the COBOL language.
The most basic and indivisible unit of the language is the character. The set of characters used to form COBOL character-strings and separators includes the letters of the alphabet, digits and special characters, and is defined below:
| Character | Meaning |
|---|---|
| 0 to 9 | Digits |
| A to Z | Upper-case letters |
| a to z | Lower-case letters |
| (space) | Space |
| + | Plus sign |
| - | Minus sign or hyphen |
| * | Asterisk |
| / | Oblique stroke/slash |
| = | Equal sign |
| $ | Dollar sign |
| . | Period or decimal point |
| , | Comma or decimal point |
| ; | Semicolon |
| " | Quotation mark |
| ' | Apostrophe |
| ( | Left parenthesis |
| ) | Right parenthesis |
| > | Greater than symbol |
| < | Less than symbol |
| : | Colon |
| & | Ampersand |
| _ | Underscore |
Lower-case letters can be used in character strings and text words; except when used in nonnumeric literals and except for some picture symbols, each lower-case letter is equivalent to the corresponding upper-case letter.
This COBOL implementation is restricted to the above character set, but the content of nonnumeric literals, comment lines, comment entries and data can include any of the characters available under the character encoding scheme used for the COBOL compilation group. (See the topic Character Sets and Collating Sequences.)
The individual characters of the language are concatenated to form character-strings and separators. A separator can be concatenated with another separator or with a character-string. A character-string can be concatenated only with a separator. The concatenation of character-strings and separators forms your source text.
A separator is a string of one or more punctuation characters. The rules for formation of separators are:
a list of function arguments, reference modifiers, arithmetic expressions, or conditions.
Either an apostrophe or a quotation mark may be used as the quotation symbol character in opening and closing delimiters.
The opening delimiters of literals are:
The closing delimiters of literals are:
The opening delimiter must be immediately preceded by a space, left parenthesis or opening pseudo-text delimiter. The closing delimiter must be immediately followed by one of the separators space, comma, semicolon, period, right parenthesis or closing pseudo-text delimiter. Separators immediately preceding the opening delimiter are not part of the opening delimiter. Separators immediately following the closing delimiter are not part of the closing delimiter.
The space immediately preceding the opening pseudo-text delimiter can be omitted. Pseudo-text delimiters can appear only in balanced pairs delimiting pseudo-text and verb-signatures. (See the topic Source Text Manipulation and the topic Method Interface Definition.)
Any punctuation character which appears as part of the specification of a PICTURE character-string (see the topic The PICTURE Clause) or numeric literal is not considered to be a punctuation character, but rather a symbol. PICTURE character-strings are delimited only by the separators space, comma, semicolon, or period.
The rules established for the formation of separators do not apply to the characters which comprise the contents of nonnumeric literals, comment-entries, or comment lines.
A character-string is a character or a sequence of contiguous characters forming a COBOL word, a literal, a PICTURE character-string, or a comment-entry. A character-string is delimited by separators.
A COBOL word is a character-string of not more than 30 characters which forms a compiler-directive word, a context-sensitive word, a user-defined word, a system-name, a reserved word, or an intrinsic-function-name. Each character of a COBOL word that is not a special character word is selected from the set of letters, digits, the hyphen and the underscore. The hyphen or the underscore may not appear as the first or last character in such words. Each lower-case letter is considered to be equivalent to its corresponding upper-case letter.
The character-string may contain 31 characters.
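The word-formation rules above can be sketched as a small checker. The sketch is written in Python rather than COBOL so that it can be run on its own. The names `is_cobol_word` and `same_word` are hypothetical, the `max_length` parameter is an assumption that lets the 31-character variant be tried, and the sketch does not check reserved words or special character words.

```python
import re

# Letters, digits, hyphen and underscore; the hyphen and the underscore
# may not be the first or last character.
_WORD_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?")


def is_cobol_word(text, max_length=30):
    """Return True if `text` satisfies the formation rules sketched above."""
    return len(text) <= max_length and _WORD_RE.fullmatch(text) is not None


def same_word(a, b):
    """Lower-case letters are equivalent to the corresponding upper-case letters."""
    return a.upper() == b.upper()
```

For instance, `is_cobol_word("CUSTOMER-NAME")` is accepted, while `is_cobol_word("-TOTAL")` is rejected because the hyphen is the first character.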
Within a source element the following apply:
User-Defined Words: A user-defined word is a COBOL word that must be supplied by the user to satisfy the format of a clause or statement.
The types of user-defined words are:
Within a given source element the following user-defined words are grouped into the following disjoint sets:
constant-names,
data-names,
property-names,
record-names,
split-key-names
typedef-names
All user-defined words, except segment-numbers and level-numbers, can belong to one and only one of these disjoint sets. Furthermore, all user-defined words within a given disjoint set must be unique, except as specified in the section Uniqueness Of Reference.
With the exception of paragraph-name, section-name, level-number and segment-number, all user-defined words must contain at least one alphabetic character or one occurrence of the hyphen character.
Segment-numbers and level-numbers need not be unique; a given specification of a segment-number or level-number can be identical to any other segment-number or level-number and can even be identical to a paragraph-name or section-name.
The following user-defined words are externalized to the operating environment:
class-names, function-prototype-names, interface-names, method-names, program-prototype-names, user-function-names.
If a literal is specified in place of or in addition to one of these names, the content of the literal is the name that is externalized to the operating environment in a case-sensitive manner. If no literal is specified, the externalized name is created by folding the name to upper case. The Compiler directive FOLD-CALL-NAME can be used to control the case of externalized class-names, interface-names and program-names. The F and U options of the Compiler directive OOCTRL can be used to control the case of externalized method-names. +F and -U cause method-names to be folded to lower case, which is the default.
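A minimal sketch of the externalization rule just described, written in Python for illustration. The function name `externalized_name` and its parameters are hypothetical. The sketch assumes default directive settings as stated above (method-names folded to lower case, other names folded to upper case) and does not model FOLD-CALL-NAME or the other OOCTRL settings.

```python
def externalized_name(user_word, literal=None, is_method=False):
    """Return the name passed to the operating environment.

    Assumes default directive settings; FOLD-CALL-NAME and non-default
    OOCTRL options are not modelled in this sketch.
    """
    if literal is not None:
        # A literal is externalized exactly as written (case-sensitive).
        return literal
    if is_method:
        # Per the text above, method-names fold to lower case by default.
        return user_word.lower()
    # Otherwise the name is folded to upper case.
    return user_word.upper()
```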
| Word type | Definition |
|---|---|
| Condition-name: | A condition-name is a name which is assigned to a specific value, set of values, or range of values, within a complete set of values that a data item can assume. The data item itself is called a conditional variable. Condition-names can be defined in the Data Division or in the Special-Names paragraph within the Environment Division where a condition-name must be assigned to one or both of the ON STATUS or OFF STATUS of the run-time switches. A condition-name is used only as follows: |
| Constant-name: | A constant-name is a name which is assigned as the name of a fixed value. |
| Mnemonic-name: | A mnemonic-name assigns a user-defined word to an implementor-name. These associations are established in the Special-Names paragraph of the Environment Division. (See the topic The Special-Names Paragraph.) |
| Paragraph-name: | A paragraph-name is a word that names a paragraph in the Procedure Division. Paragraph-names are equivalent if, and only if, they are composed of the same sequence of the same number of characters. |
| Section-name: | A section-name is a word that names a section in the Procedure Division. Section-names are equivalent if, and only if, they are composed of the same sequence of the same number of characters. |
| Other user-defined words: | See the Glossary for definitions of all other types of user-defined words. |
System-names: A system-name is a COBOL word that is used to communicate with the operating environment.
System-names must contain at least one alphabetic character or one occurrence of the hyphen character.
There are three types of system-names:
Within a given implementation these three types of system-names form disjoint sets; a given system-name can belong to one and only one of them.
The system-names listed above are individually defined in the Glossary.
Intrinsic-function-names: An intrinsic-function-name is a word that is one of a specified list of words which can be used in COBOL source elements. The same word, with the exception of LENGTH, RANDOM and SUM, in a different context, can appear in a source element as a user-defined word or a system-name. (See the topic Definitions of Functions.)
Reserved words: A reserved word is a COBOL word that is one of a specified list of words which can be used in COBOL compilation groups, but which must not appear in the compilation groups as user-defined words or system-names. Reserved words can be used only as specified in the general formats. (See the topic Reserved Words.)
The types of reserved words are:
| Type | Description |
|---|---|
| Key words: | A key word is a word whose presence is required when the format in which it appears is used in a compilation group. Within each format, such words are upper-case and underlined. Key words are of three types: |
| Optional words: | Within each format, upper-case words that are not underlined are called optional words and can appear at the user's discretion. The presence or absence of an optional word does not alter the semantics of the COBOL source element in which it appears. |
| Special registers: | Certain words are used to name and reference special registers: special registers are certain storage areas created by your COBOL system, whose primary use is to store information produced in conjunction with the use of specific COBOL features. They are specified in the section Special Registers. |
| Figurative constants: | Certain reserved words are used to name and reference specific constant values. These reserved words are specified in the section Figurative Constant Values. |
| Special character words: | The arithmetic operators and relation characters are reserved words. |
Certain reserved words are used as predefined object identifiers. The predefined object identifiers are:
Context-sensitive Words: A context-sensitive word is a COBOL word that is reserved only in the general formats in which it is specified. The same word may also be used as an intrinsic-function-name, a user-defined word or a system-name. Context-sensitive words and the contexts in which they are reserved are specified in the section Context-sensitive Words Table in the appendix Reserved Words.
When source elements are directly or indirectly contained within other source elements, each source element can use identical user-defined words to name items independent of the use of these user-defined words by other elements. (See the discussion of user-defined words in the section COBOL Words.) When identically named items exist, a source element's reference to such a name, even when it is a different type of user-defined word, is to the item which that source element describes rather than to the item possessing the same name, described in another source element.
Delegate-names, enum-names and valuetype-names are special cases of class-names for object orientation.
The following types of user-defined words can be referenced only by statements and entries in the source element in which the user-defined word is declared:
The following types of user-defined words can be referenced throughout a compilation group:
The following types of names, when declared in a Configuration Section, can be referenced only by statements and entries either in the source element that contains that Configuration Section or in any source element contained within that source element:
Specific conventions, for declarations and references, apply to the following types of user-defined words when the conditions listed above do not apply:
The program-name of a program is declared in the Program-ID paragraph of the program's Identification Division. A program-name can be referenced only by the CALL statement, the CHAIN statement, the CANCEL statement, the SET statement and the END PROGRAM header. If two programs in a run unit are identically named, at least one of those two programs must be directly or indirectly contained within a separate program which does not contain the other of those two programs.
The following rules regulate the scope of a program-name:
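If the program does not possess the common attribute and is directly contained within another program, its program-name can be referenced only by statements in that containing program or, if the program possesses the recursive attribute, in the program itself.
If the program possesses the common attribute and is directly contained within another program, its program-name can be referenced only by statements in that containing program and in any programs directly or indirectly contained within that containing program, except that the program possessing the common attribute and any programs contained within it may reference the program-name only if the program possesses the recursive attribute.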
For example, suppose that ProgA contains ProgB and ProgC, ProgC contains ProgD and ProgF, and ProgD contains ProgE (see Figure 1).

Figure 1: Example of Scope of Program-Names
If ProgD does not possess the COMMON attribute, then ProgD can only be referenced by the program that directly contains ProgD, that is, ProgC.
If ProgD does possess the COMMON attribute, then ProgD can be referenced by ProgC since it contains ProgD and by any programs contained in ProgC except for programs contained in ProgD, that is, by ProgF but not by ProgE. Also it cannot be referenced by ProgA or ProgB.
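The ProgA to ProgF example can be checked with a short model of the containment tree. This is only a sketch in Python, not part of the COBOL system: `CONTAINED_IN`, `is_within` and `can_reference` are hypothetical names, the `common` and `recursive` flags stand for the attributes of the referenced program, and separately compiled (outermost) programs are outside its scope.

```python
# Containment from the example: each program maps to the program
# that directly contains it. ProgA is the outermost program.
CONTAINED_IN = {
    "ProgB": "ProgA",
    "ProgC": "ProgA",
    "ProgD": "ProgC",
    "ProgF": "ProgC",
    "ProgE": "ProgD",
}


def is_within(program, ancestor):
    """True if `program` is `ancestor` or is directly or indirectly contained in it."""
    while program is not None:
        if program == ancestor:
            return True
        program = CONTAINED_IN.get(program)
    return False


def can_reference(caller, target, common=False, recursive=False):
    """Can `caller` reference the program-name of the contained program `target`?"""
    container = CONTAINED_IN.get(target)
    if container is None:
        raise ValueError("only contained programs are modelled in this sketch")
    if caller == target:
        # The program itself needs the recursive attribute.
        return recursive
    if is_within(caller, target):
        # Programs inside a common program need it to be recursive as well.
        return common and recursive
    if caller == container:
        # The directly containing program can always reference it.
        return True
    # Other programs inside the container need the common attribute.
    return common and is_within(caller, container)
```

With `common=True`, `can_reference("ProgF", "ProgD", common=True)` holds while the same call for ProgE, ProgA or ProgB does not, matching the description above.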
When condition-names, data-names, file-names, record-names, report-names and typedef-names are declared in a source element, they can be referenced only by that source element unless one or more of the names is global and the source element contains other source elements.
The requirements governing the uniqueness of the names declared by a single source element to be condition-names, data-names, file-names, record-names, report-names and typedef-names are explained in the discussion of user-defined words in the section COBOL Words.
A source element cannot reference any condition-name, data-name, file-name, record-name, report-name or typedef-name declared in any source element it contains.
A global name can be referenced in the source element in which it is declared or in any source elements which are directly or indirectly contained within that source element.
When a source element, source element B, is directly contained within another source element, source element A, both source elements can define a condition-name, a data-name, a file-name, a record-name, a report-name or a typedef-name using the same user-defined word. When such a duplicate name is referenced in source element B, the following rules are used to determine the referenced item:
If a data item possessing the global attribute includes a table described with an index-name, that index-name also possesses the global attribute. Therefore, the scope of an index-name is identical to that of the data-name which names the table whose index is named by that index-name and the scope of name rules for data-names apply.
Index-names cannot be qualified.
Index-names can be qualified.
The class-name of a class referenced within a source element must be either the name of the containing class definition or declared in the Repository paragraph or the Class-Control paragraph of that or a containing source element.
Within a compilation group, there must be at most one class definition for a given class-name.
The interface-name of an interface referenced within a source element must be either the name of the containing interface definition or declared in the Repository paragraph of that or a containing source element.
Within a compilation group, there must be at most one interface definition for a given interface-name.
A class-name or interface-name declared in the Repository paragraph of a source element may be used in that source element and any nested source element.
A class-name or interface-name declared in the Class-Control paragraph of a source element may be used in that source element and any nested source element.
A method-name of a method is declared in the Method-ID paragraph. A method-name must be referenced only by the INVOKE statement, an inline method invocation and the end method header.
The methods declared in a class definition must have unique method-names within that class definition. The methods declared in a child class may have the same name as a method in the parent class, subject to the conditions for the Method-ID paragraph.
The methods declared in an interface definition must have unique method-names within that interface definition. The methods declared in an inheriting interface can have the same name as a method in the inherited interface, subject to the conditions stated for the Method-ID paragraph.
Function-prototype-names referenced within a source element must be either the name of the containing function definition or declared in the Repository paragraph of that or a containing source element.
If a function prototype is specified in a Repository paragraph and the function prototype declaring the same function-prototype-name is also specified within the same compilation group, the function prototype specification is used and the information in the external repository for this prototype is ignored.
Program-prototype-names referenced within a source element must be either the program-name of a containing program definition or a program-prototype-name declared in the Repository paragraph
If a program prototype is specified in a Repository paragraph and the program prototype declaring the same program-prototype-name is also specified within the same compilation group, the program prototype specification is used and the information in the external repository for this prototype is ignored.
A literal is any of:
Every literal belongs to one of three types: nonnumeric, numeric and national.
A nonnumeric literal is a character-string delimited at both ends by quotation marks or apostrophes and consisting of any allowable character in the computer's character set. Nonnumeric literals may be of 1 to 160 characters in length. Whether quotation marks or apostrophes are used as delimiters, the presence of that delimiter within a nonnumeric literal can be represented by two contiguous occurrences. The presence of the character that is not serving as the delimiter is represented by a single occurrence. The value of a nonnumeric literal in the run-time element is the string of characters itself, except:
All other punctuation characters are part of the value of the nonnumeric literal rather than separators; all nonnumeric literals are category alphanumeric. (See the topic The PICTURE Clause.)
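The delimiter-doubling rule can be illustrated with a short Python sketch that derives the run-time value from the source form of a literal. The function name `literal_value` is hypothetical, and the sketch models only the delimiter rule and the 1 to 160 character limit stated above, not the other exceptions.

```python
def literal_value(source_text):
    """Return the value of a nonnumeric literal written with " or ' delimiters."""
    if (len(source_text) < 2 or source_text[0] not in "\"'"
            or source_text[-1] != source_text[0]):
        raise ValueError("not a delimited nonnumeric literal")
    delimiter = source_text[0]
    body = source_text[1:-1]
    # Inside the literal, the delimiter may only appear as a doubled pair.
    if delimiter in body.replace(delimiter * 2, ""):
        raise ValueError("unpaired delimiter inside literal")
    # Two contiguous delimiters stand for one occurrence in the value.
    value = body.replace(delimiter * 2, delimiter)
    if not 1 <= len(value) <= 160:
        raise ValueError("literal must be 1 to 160 characters long")
    return value
```

The other quotation symbol needs no doubling: both `'It''s'` and `"It's"` yield the same four-character value.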
In addition, hexadecimal binary values can be attributed to nonnumeric literals by expressing literals as: X"nn", where each n is a hexadecimal digit in the set 0 through 9, A through F; nn can be repeated up to 160 times, but the number of hexadecimal digits must be even.
The number of hexadecimal digits may be odd.
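As an illustration of the X"nn" form, the following Python sketch converts such a literal to the bytes it denotes. The name `hex_literal_value` is hypothetical. The sketch enforces the even-digit rule and the 160-repetition limit, and it assumes that lower-case x and a through f are accepted as equivalents and that only quotation-mark delimiters are used.

```python
import re

# X"nn..." with hexadecimal digits between quotation marks.
_HEX_LITERAL_RE = re.compile(r'[Xx]"([0-9A-Fa-f]*)"')


def hex_literal_value(source_text):
    """Return the bytes denoted by an X"nn" literal (even digit count assumed)."""
    match = _HEX_LITERAL_RE.fullmatch(source_text)
    if match is None:
        raise ValueError("not a hexadecimal nonnumeric literal")
    digits = match.group(1)
    if len(digits) == 0 or len(digits) % 2 != 0:
        raise ValueError("an even, non-zero number of hexadecimal digits is required")
    if len(digits) // 2 > 160:
        raise ValueError("nn can be repeated at most 160 times")
    # Each pair of hexadecimal digits gives one byte of the value.
    return bytes.fromhex(digits)
```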
Numeric literals can be either fixed-point or floating-point numbers.
A numeric literal is a character-string whose characters are selected from the digits 0 through 9, the plus sign, the minus sign, and the decimal point. This implementation allows for numeric literals of 1 to 18 digits in length. The rules for the formation of numeric literals are as follows:
If a literal conforms to the rules for the formation of numeric literals, but is enclosed in quotation marks, it is a nonnumeric literal and is treated as such by your COBOL system.
The size of a numeric literal in standard data format characters is equal to the number of digits specified by the user.
In addition, hexadecimal binary values can be attributed to numeric literals by expressing literals as: H"nn", where each n is a hexadecimal digit in the set 0-9, A-F; nn can be repeated up to 8 times, but the number of hexadecimal digits must be even.
Floating-Point Numeric Literals
A floating-point literal is written in the form:

If you omit a sign, the system assumes a positive number.
The significand can contain between 1 and 16 digits. A decimal point must be included in the significand.
The exponent is represented by an E followed by an optional sign and one or two digits.
The magnitude of a floating-point literal value must fall between 0.54E-78 and 0.72E+76. For values outside this range, an error message is produced and the value is replaced by 0 or 0.72E+76 respectively. You must not use a floating-point literal when an integer literal is required.
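The syntax described above (optional sign, a significand of 1 to 16 digits containing a decimal point, then E, an optional sign and one or two exponent digits) can be approximated with a regular expression. This Python sketch is an assumption-laden reading of that description: the name `is_floating_point_literal` is hypothetical, no spaces are allowed inside the literal, the decimal point may sit anywhere in the significand, and the magnitude limits are not checked.

```python
import re

# [sign] significand-with-decimal-point E [sign] one-or-two-digit exponent
_FLOAT_LITERAL_RE = re.compile(r"[+-]?([0-9]*\.[0-9]*)E[+-]?[0-9]{1,2}")


def is_floating_point_literal(text):
    """Check the written form only; the permitted magnitude range is not checked."""
    match = _FLOAT_LITERAL_RE.fullmatch(text)
    if match is None:
        return False
    # The significand holds between 1 and 16 digits besides the decimal point.
    digit_count = len(match.group(1)) - 1
    return 1 <= digit_count <= 16
```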
A national literal is a string of national characters represented in the storage of the computer as characters of a uniform size. See your COBOL system documentation on national data (Unicode) for further information.
The value of the literal at runtime is the string of national characters that results from converting the compile-time value of the literal to its runtime equivalent.
Figurative constant values are generated by your COBOL system and referenced through the use of the reserved words given below. These words must not be bounded by quotation marks when used as figurative constants. The singular and plural forms of figurative constants are equivalent and can be used interchangeably.
The figurative constant values and the reserved words used to reference them are shown in Table 1.
| Constant | Representation |
|---|---|
| ZERO ZEROS ZEROES | Represents the value "0", or one or more of the character "0" depending on the context. |
| SPACE SPACES | Represents one or more of the character space from the computer's character set. |
| HIGH-VALUE HIGH-VALUES | Represents one or more of the character that has the highest ordinal position in the program collating sequence. (x"FF" for the extended ASCII character set.) |
| LOW-VALUE LOW-VALUES | Represents one or more of the character that has the lowest ordinal position in the program collating sequence. (x"00" for the ASCII character set.) |
| QUOTE QUOTES | Represents one or more of the character ". The word QUOTE or QUOTES cannot be used in place of a quotation mark in a source program to bound a nonnumeric literal. Thus QUOTE ABD QUOTE is incorrect as a way of stating "ABD". |
| ALL literal | Represents one or more characters of the string of characters comprising the literal. The literal must be either a nonnumeric literal or a figurative constant other than ALL literal. |
| NULL NULLS | Represents one or more unset pointer values. A data item with USAGE POINTER and with a value of NULL is guaranteed not to represent the address of any data item. The NULL value varies between environments and is generally consistent with the equivalent value used in non-COBOL languages for each environment. |
When a figurative constant represents a string of one or more characters, the length of the string is determined by your COBOL system from context by applying the following rules in order:
Use of figurative constants in Format 3 DISPLAY statements has specific effects, described in the General Rules for that statement.
A figurative constant can be used wherever a literal appears in a format, except that whenever the literal is restricted to having only numeric characters in it, the only figurative constant permitted is ZERO (ZEROS, ZEROES).
When the figurative constants HIGH-VALUE(S) or LOW-VALUE(S) are used, the actual character associated with each figurative constant depends upon the program collating sequence specified. (See the topics The Object-Computer Paragraph and The Special-Names Paragraph.)
Each reserved word that is used to reference a figurative constant value is a distinct character-string, with the exception of the construction "ALL literal" which is composed of two distinct character-strings.
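To make the "one or more characters" wording concrete, the following Python sketch expands a figurative constant to a given length. It is a simplified illustration: the name `expand` is hypothetical, the length is passed in explicitly rather than derived from context, HIGH-VALUE, LOW-VALUE and NULL are left out because they depend on the collating sequence or environment, and ALL literal is assumed to repeat the literal from the left and truncate on the right.

```python
# Figurative constants that stand for a single repeated character.
_SINGLE_CHARACTERS = {
    "ZERO": "0", "ZEROS": "0", "ZEROES": "0",
    "SPACE": " ", "SPACES": " ",
    "QUOTE": '"', "QUOTES": '"',
}


def expand(constant, length, literal=None):
    """Return `length` characters for a figurative constant (illustrative only)."""
    word = constant.upper()
    if word == "ALL":
        if not literal:
            raise ValueError("ALL requires a non-empty literal")
        # Repeat the literal often enough, then cut to the requested length.
        repeats = -(-length // len(literal))  # ceiling division
        return (literal * repeats)[:length]
    # Singular and plural forms are interchangeable.
    return _SINGLE_CHARACTERS[word] * length
```

Under these assumptions, `expand("ALL", 5, "AB")` gives `ABABA` and `expand("ZEROS", 3)` gives `000`.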
The value associated with the QUOTE/QUOTES figurative constant is sensitive to the APOST and QUOTE directives.
The figurative constant ALL literal, when associated with a numeric or numeric edited item, and when its length is greater than one, is classed as an obsolete element in the ANSI'85 standard and is scheduled to be deleted from the next full revision of the ANSI Standard.
All dialects within this COBOL implementation fully support this obsolete ALL literal syntax. The FLAGSTD directive can be used to detect all occurrences of this syntax.
Although this obsolete ALL literal syntax is a part of the standard COBOL definition, this syntax is explicitly excluded from the X/Open COBOL language definitions and should not be used in a conforming X/Open COBOL source program.
Constant-names are user-defined words described in the Data Division in level-78 data description entries. A constant-name may be used wherever a literal appears in a format. Its effect is as if the literal in the VALUE clause of its data description had been written instead. A constant-name with an integer value can also be used wherever a format requires an integer; for example, as a level number or segment number, or in a PICTURE character-string.
A constant-name can only be used after it has been described; that is, it cannot be the object of a forward reference.
A concatenation expression consists of two operands separated by the concatenation operator.
Special registers are data items or transient values generated by your COBOL system and referenced through the use of their associated names or expressions (see Table 2). These special registers are subject to special rules of reference and have implicit data descriptions (PICTUREs), as individually described.
| Special Register Name or Expression | Implicit Data Description Picture | Usage |
|---|---|---|
| ADDRESS OF data-name-1 | USAGE IS POINTER | The expression generates a pointer value representing the address of data-name-1. The expression is explicitly shown in the general format for statements in which it can be used. Data-name-1 must be a data item declared in the Linkage Section with a level number of 01 or 77. |
| CURRENT-DATE | X(8) | The CURRENT-DATE special register contains the value of the current date (as supplied by the COBOL execution environment), in the form: MM/DD/YY where MM is the month number, DD is the day of the month, and YY is the year number (from 1900). CURRENT-DATE is valid only as the sending area of a MOVE statement. |
| DEBUG-ITEM | A group item of variable size | The DEBUG-ITEM special register provides information about the conditions that caused the execution of a Debugging Section. For further information see the section Debug Module. |
| LENGTH OF data-name-2 | 9(9) | The expression generates a value representing the current number of bytes of storage used by data-name-2. The expression can be used wherever a numeric data item can be used except as a subscript or a reference modifier. |
| LINAGE-COUNTER | | The LINAGE-COUNTER special register is generated by the presence of a LINAGE clause in a file description entry for a record sequential file. The implicit description is that of an unsigned integer whose size is equal to the size of integer-1 or the data item referenced by data-name-1 in the LINAGE clause. |
| RETURN-CODE | S9(4) COMP | The RETURN-CODE special register can: A program's RETURN-CODE special register is set to zero when that program is first entered. RETURN-CODE is valid as a data-name in a Procedure Division statement wherever an elementary data item can be referenced. |
| SHIFT-IN | X(1) | Used to switch the character representation from double-byte characters (DBCS) back to single-byte characters (SBCS) in environments where this is applicable. |
| SHIFT-OUT | X(1) | Used to switch the character representation from single-byte characters (SBCS) to double-byte characters (DBCS) in environments where this is applicable. |
| SORT-CONTROL | X(8) | Used only during sort and merge operations. You can reference it in the Procedure Division but it will contain spaces. |
| SORT-CORE-SIZE | S9(8) COMP | Used only during sort and merge operations. You can reference it in the Procedure Division but it will contain zeros. |
| SORT-FILE-SIZE | S9(8) COMP | Used only during sort and merge operations. You can reference it in the Procedure Division but it will contain zeros. |
| SORT-MESSAGE | X(8) | Used only during sort and merge operations. You can reference it in the Procedure Division but it will contain spaces. |
| SORT-MODE-SIZE | S9(5) COMP | Used only during sort and merge operations. You can reference it in the Procedure Division but it will contain zeros. |
| SORT-RETURN | S9(4) COMP | SORT-RETURN can be used to cause an abnormal termination of a SORT procedure. If a value of 16 is moved into this field, the SORT operation is terminated after the next RELEASE or RETURN. |
| TALLY | 9(5) COMP | The TALLY special register contains information produced by the EXAMINE...TALLYING statement. It is valid as a data-name in a Procedure Division statement wherever an elementary data item can be referenced. |
| TIME-OF-DAY | 9(6) DISPLAY | The TIME-OF-DAY special register contains the value of the current time of day (24-hour clock) (as supplied by the COBOL execution environment), in the form: hhmmss where hh=hour, mm=minutes, and ss=seconds. TIME-OF-DAY is valid only as the sending area of a MOVE statement. |
| WHEN-COMPILED | X(20) | The WHEN-COMPILED special register contains the time and date that the COBOL compilation group was submitted to your COBOL system, in the form: hh.mm.ssMMM DD, YYYY where hh=hours (24-hour clock), mm=minutes, ss=seconds, MMM=month name (first 3 characters), DD=day of month, and YYYY=year. WHEN-COMPILED is valid only as the sending area of a MOVE statement. |
| WHEN-COMPILED | X(20) | The WHEN-COMPILED special register contains the time and date that the COBOL compilation group was submitted to your COBOL system, in the form: MM/DD/YYhh.mm.ss where DD, hh, mm and ss are as above, YY=year in century and MM=month in year. WHEN-COMPILED is valid only as the sending area of a MOVE statement. |
| XML-CODE | S9(9) COMP | The XML-CODE special register is used to communicate status between the XML parser and the processing procedure identified in the XML PARSE statement. The XML parser sets XML-CODE for each event and at parser termination. You can reset XML-CODE in the processing procedure to -1 after a normal event, to indicate that the parser is to terminate with a user-initiated exception, which is not an EXCEPTION XML event, indicated by the returned XML-CODE value of -1. |
| XML-EVENT | X(30) | The XML-EVENT special register is used to communicate event information from the XML parser to the processing procedure that was identified in the XML PARSE statement. Before passing control to the processing procedure, the XML parser sets the XML-EVENT special register to the name of the XML event. XML-EVENT cannot be used as a receiving data item. |
| XML-NTEXT | | The XML-NTEXT special register is defined during XML parsing to contain document fragments that are USAGE NATIONAL. XML-NTEXT is an elementary national data item of the length of the contained XML document fragment. The length of XML-NTEXT varies dynamically at run time. When the operand of the XML PARSE statement is a national data item, and for the ATTRIBUTE-NATIONAL-CHARACTER and CONTENT-NATIONAL-CHARACTER events, the XML parser sets XML-NTEXT to the document fragment associated with an event before transferring control to the processing procedure. When XML-NTEXT is set, the XML-TEXT special register has a length of zero. At any given time, only one of the two special registers XML-NTEXT and XML-TEXT has a non-zero length. Use the LENGTH function to determine the number of national characters that XML-NTEXT contains. XML-NTEXT cannot be used as a receiving item. |
| XML-TEXT | | The XML-TEXT special register is defined during XML parsing to contain document fragments that are of class alphanumeric. XML-TEXT is an elementary alphanumeric data item of the length of the contained XML document fragment. The length of XML-TEXT varies dynamically at run time. When the operand of the XML PARSE statement is an alphanumeric data item, except for the ATTRIBUTE-NATIONAL-CHARACTER event and the CONTENT-NATIONAL-CHARACTER event, the parser sets XML-TEXT to the document fragment associated with an event before transferring control to the processing procedure. When XML-TEXT is set, the XML-NTEXT special register has a length of zero. At any given time, only one of the two special registers XML-NTEXT and XML-TEXT has a non-zero length. Use the LENGTH function or the LENGTH OF special register for XML-TEXT to determine the number of bytes that XML-TEXT contains. XML-TEXT cannot be used as a receiving item. |
Footnotes:
The format of the contents of the CURRENT-DATE special register is sensitive to the CURRENT-DATE directive.
The LENGTH OF special register may be followed by an alphanumeric literal when using the Micro Focus dialect.
The size of the RETURN-CODE special register is sensitive to the XOPEN and RTNCODE-SIZE directives.
For a list of supported exception codes, see the topic XML-CODE Exception Codes.
The contents of XML-TEXT and XML-NTEXT vary depending on the contents of XML-EVENT. See Table 3 for additional information.
| Contents of XML-EVENT | Contents of XML-TEXT or XML-NTEXT |
|---|---|
| ATTRIBUTE-CHARACTER | The single character corresponding with the predefined entity reference in the attribute value. |
| ATTRIBUTE-CHARACTERS | The value within quotes or apostrophes. This can be a sub-string of the attribute value if the value includes an entity reference. |
| ATTRIBUTE-NAME | The attribute name, the string to the left of =. |
| ATTRIBUTE-NATIONAL-CHARACTER | Regardless of the type of the XML document specified by identifier-1 in the XML PARSE statement, XML-TEXT is empty and XML-NTEXT contains the single national character corresponding with the (numeric) character reference. |
| COMMENT | The text of the comment between the opening character sequence "<!--" and the closing character sequence "-->". |
| CONTENT-CHARACTER | The single character corresponding to the predefined entity reference in the element content. |
| CONTENT-CHARACTERS | The element content between start and end tags. This can be a substring of the element content if the content contains an entity reference to another element. |
| CONTENT-NATIONAL-CHARACTER | Regardless of the type of the XML document specified by identifier-1 in the XML PARSE statement, XML-TEXT is empty and XML-NTEXT contains the single national character corresponding with the (numeric) character reference.6 |
| DOCUMENT-TYPE-DECLARATION | The entire document type declaration including the opening and closing character sequences, "<!DOCTYPE" and ">". |
| ENCODING-DECLARATION | The value, between quotes or apostrophes, of the encoding declaration in the XML declaration. |
| END-OF-CDATA-SECTION | Always contains the string "]]>". |
| END-OF-DOCUMENT | Null, zero-length. |
| END-OF-ELEMENT | The name of the end element tag or empty element tag. |
| EXCEPTION | The part of the document successfully scanned, up to and including the point at which the exception was detected. 7 Special register XML-CODE contains the unique error code identifying the exception.8 |
| PROCESSING-INSTRUCTION-DATA | The rest of the processing instruction, not including the closing sequence, "?>", but including trailing, and not leading, white space characters. |
| PROCESSING-INSTRUCTION-TARGET | The processing instruction target name, which occurs immediately after the processing instruction opening sequence, "<?". |
| STANDALONE-DECLARATION | The value, between quotes or apostrophes, of the standalone declaration in the XML declaration. |
| START-OF-CDATA-SECTION | Always contains the string "<![CDATA[". |
| START-OF-DOCUMENT | The entire document. |
| START-OF-ELEMENT | The name of the start element tag or empty element tag, also known as the element type. |
| UNKNOWN-REFERENCE-IN-CONTENT | The entity reference name, not including the "&" and ";" delimiters. |
| UNKNOWN-REFERENCE-IN-ATTRIBUTE | The entity reference name, not including the "&" and ";" delimiters. |
| VERSION-INFORMATION | The value, between quotes or apostrophes, of the version declaration in the XML declaration. This is always "1.0". |
Footnotes:
National characters with scalar values larger than 65,535 (NX"FFFF") are represented using two encoding units (a surrogate pair). You should ensure that operations on the content of XML-NTEXT do not split the pair of encoding units that together form a graphic character, thereby forming invalid data.
Exceptions for encoding conflicts are signaled before parsing begins. For these exceptions, XML-TEXT is either zero in length or contains just the encoding declaration value from the document.
See the IBM Enterprise COBOL Programming Guide for information on XML exception codes. Any exception not documented in the IBM Enterprise COBOL Programming Guide is returned with a value of 201.
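The way XML-EVENT and XML-TEXT work together can be sketched with a small processing procedure. This is a minimal illustration, not a definitive implementation: the program name, the sample document and the paragraph name EVENT-HANDLER are hypothetical, and the sketch assumes a compiler that supports the XML PARSE statement. Because the document is held in an alphanumeric item, the fragments are expected in XML-TEXT rather than XML-NTEXT.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample document held in an alphanumeric item,
      * so fragments are delivered in XML-TEXT, not XML-NTEXT.
       01  XML-DOC             PIC X(23)
                               VALUE "<msg id='7'>Hello</msg>".
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE XML-DOC
               PROCESSING PROCEDURE EVENT-HANDLER
               ON EXCEPTION
                   DISPLAY "XML-CODE: " XML-CODE
           END-XML
           STOP RUN.
      * Receives control once per event. XML-EVENT names the
      * event; XML-TEXT holds the associated document fragment.
       EVENT-HANDLER.
           EVALUATE XML-EVENT
               WHEN "START-OF-ELEMENT"
                   DISPLAY "ELEMENT:   " XML-TEXT
               WHEN "ATTRIBUTE-NAME"
                   DISPLAY "ATTR NAME: " XML-TEXT
               WHEN "ATTRIBUTE-CHARACTERS"
                   DISPLAY "ATTR VAL:  " XML-TEXT
               WHEN "CONTENT-CHARACTERS"
                   DISPLAY "CONTENT:   " XML-TEXT
               WHEN "EXCEPTION"
                   DISPLAY "FAILED AT: " XML-TEXT
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

The handler only reads the special registers, since XML-EVENT, XML-TEXT and XML-NTEXT cannot be used as receiving items.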
The predefined object identifiers are:
| Predefined Object Identifier | Usage |
|---|---|
| SELF | References the object on which the current method is executing. May be used in the Procedure Division of a method. References the object that was used to invoke the method in which SELF appears. If SELF is specified for a method invocation, the search for the method includes all methods declared for the object. |
| SELFCLASS | References the object that is the class object of the current object (SELF). If SELF is itself a class object, SELFCLASS is the system class BEHAVIOR. The class object BEHAVIOR terminates this self-reference. (i.e., If SELF is the BEHAVIOR of class, so is SELFCLASS.) |
| SUPER | References the object on which the current method is executing. May be used in the Procedure Division of a method. May be the object used to invoke a method with the INVOKE statement or inline invocation. References the object that was used to invoke the method in which SELF appears. If SUPER is specified for a method invocation, the search for the method ignores all the methods defined in the same class as the executing method. |
| NULL | References the null object reference value, which is a unique value that is guaranteed to never reference an object. NULL is implicitly described as class object and category object reference, and is not a universal object reference. NULL must not be specified as a receiving operand. |
A PICTURE character-string consists of certain combinations of characters in the COBOL character set, used as symbols. See the topic The PICTURE Clause for the PICTURE character-string and for the rules that govern its use.
Any punctuation character that appears as part of the specification of a PICTURE character-string is not considered to be a punctuation character, but a symbol used in the specification of that PICTURE character-string.
A comment-entry is an entry in the Identification Division that can be any combination of characters from the computer's character set. A comment-entry is for documentary purposes only, may extend over more than one line and is terminated upon encountering a division, section or paragraph name that is a reserved word or encountering any character in area A of a line. The continuation of a comment-entry by the use of the hyphen in the indicator area is not permitted.
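A short sketch of a comment-entry that spans two lines follows. It assumes fixed-format source and a compiler that still accepts the AUTHOR and DATE-WRITTEN paragraphs (some compilers flag them as obsolete); the program name and the entry text are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CMTDEMO.
      * The comment-entry below starts after "AUTHOR." and runs
      * on to the next line because that line begins in area B.
       AUTHOR. A PROGRAMMER OF THE HYPOTHETICAL PAYROLL TEAM
           WHO CONTINUES THE ENTRY ON A SECOND LINE.
      * DATE-WRITTEN begins in area A, which ends the entry.
       DATE-WRITTEN. 1 JANUARY 2000.
       PROCEDURE DIVISION.
           DISPLAY "COMMENT-ENTRIES HAVE NO EFFECT AT RUN TIME"
           STOP RUN.
```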
A general format is the specific arrangement of the elements of a clause or a statement. Throughout this document a format is shown adjacent to information defining the clause or statement. When more than one specific arrangement is permitted, the general format is separated into numbered formats. Clauses must be written in the sequence given in the general formats. (Clauses that are optional must appear in the sequence shown if they are used.) In certain cases, stated explicitly in the rules associated with a given format, the clauses can appear in sequences other than that shown. Applications, requirements or restrictions are shown as rules.
Syntax rules are those rules that define or clarify the order in which words or elements are arranged to form larger elements, such as phrases, clauses, or statements. Syntax rules also impose restrictions on individual words or elements.
These rules are used to define or clarify how the statement must be written; that is, the order of the elements of the statement and restrictions on what each element may represent.
General rules are those rules that define or clarify the meaning or relationship of meanings of an element or set of elements. They are used to define or clarify the semantics of the statement and the effect that it has on either execution or on the way intermediate code is produced.
Elements which make up a clause or a statement consist of upper-case words, lower-case words, level-numbers, brackets, braces, connectives and special characters.
To make data as computer-independent as possible, the characteristics or properties of the data are described in relation to a standard data format rather than to an equipment-oriented format. This standard data format is oriented to general data processing applications and uses the decimal system to represent numbers (regardless of the radix used by the computer) and the remaining characters in the COBOL character set to describe nonnumeric data items.
A level concept or hierarchy is inherent in the structure of a logical data record. This concept arises from the need to specify subdivisions of a record for the purpose of data reference. Once a subdivision has been specified, it can be further subdivided to permit more detailed data referral.
The most basic subdivisions of a record, that is, those not further subdivided, are called elementary items; consequently, a record is said to consist of a sequence of elementary items, or the record itself can be an elementary item.
In order to refer to a set of elementary items, the elementary items are combined into groups. Each group consists of a named sequence of one or more elementary items. Groups, in turn, can be combined into groups of two or more groups, and so on. Thus, an elementary item can belong to more than one group.
A system of level-numbers shows the organization of elementary items and group items. Since records are the most inclusive data items, level-numbers for records start at 01. Less inclusive data items are assigned higher (not necessarily successive) level-numbers not greater in value than 49. A maximum of 49 levels in a record is allowed. There are special level-numbers, 66, 77, 78 and 88 which are exceptions to this rule (see below). Separate entries are written for each level-number used.
A group includes all group and elementary items following it until a level-number less than or equal to the level-number of that group is encountered. All items which are immediately subordinate to a given group item should be described using identical level-numbers greater than the level-number used to describe that group item; this rule is not insisted upon.
Example
Correct:

    01 A.
        05 C-1.
            10 D PICTURE X.
            10 E PICTURE X.
        05 C-2.
Four types of entries exist for which there is no true concept of level. These are:
Entries describing items by means of RENAMES clauses for the purpose of regrouping data items have been assigned the special level-number 66.
Entries that specify noncontiguous data items, which are not subdivisions of other items, and are not themselves subdivided, have been assigned the special level-number 77.
Entries that specify condition-names, to be associated with particular values of a conditional variable, have been assigned the special level-number 88.
Entries that specify constant-names, to be associated with the value of a particular literal, have been assigned the special level-number 78.
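The four special level-numbers can be seen together in one small program. This is a sketch only: all data-names are hypothetical, and level 78 is assumed to be available in the dialect in use.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LVLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Level 78: constant-name bound to the value of a literal.
       78  MAX-ITEMS           VALUE 10.
      * Level 77: noncontiguous item, not part of any record.
       77  ITEM-COUNT          PIC 9(2) VALUE 0.
       01  CUSTOMER-REC.
           05  CUST-STATUS     PIC X VALUE "A".
      * Level 88: condition-names for values of CUST-STATUS.
               88  CUST-ACTIVE     VALUE "A".
               88  CUST-CLOSED     VALUE "C".
           05  FIRST-NAME      PIC X(10) VALUE "ADA".
           05  LAST-NAME       PIC X(10) VALUE "LOVELACE".
      * Level 66: regroups FIRST-NAME through LAST-NAME.
       66  FULL-NAME RENAMES FIRST-NAME THRU LAST-NAME.
       PROCEDURE DIVISION.
           MOVE MAX-ITEMS TO ITEM-COUNT
           IF CUST-ACTIVE
               DISPLAY "ACTIVE: " FULL-NAME
           END-IF
           DISPLAY "COUNT:  " ITEM-COUNT
           STOP RUN.
```

Note that the level 66 entry follows the last data description entry of the record it regroups, and that the level 88 entries define no storage of their own.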

Figure 2: Example of Level-numbers Representing a Data Hierarchy
Note that indentation of COBOL source code is a readability convention only and is not part of the language.
Elementary items are by definition those items without any subordinate entries (entries without numerically greater level-numbers) following, and must have a storage definition associated with them (see the topics The PICTURE Clause and The USAGE Clause).
Note that only elementary items (marked with an asterisk, "*", above) and FILLER items (marked with a "#" sign above) have storage explicitly reserved for them (in accordance with the associated PICTURE clause); non-elementary items have implicit storage associated with them of size determined by their subordinate items plus any FILLER bytes needed for synchronization (see the topic The SYNCHRONIZED Clause).
Level-numbers need not be consecutively ascending or descending as shown above for clarity; thus, the next subordinate level after 01 could be 05, and the next level 10, and so on.
The above data descriptions would produce storage allocation in the following manner:

Figure 3: Data Record Storage Allocation
where:
| R-E-I | is Record-Entry-Item |
| M-G-I | is Major-Group-Item |
| R-G-I | is Regular-Group-Item |
| S-G | is Sub-Group |
| EI | is Elementary-Item |
| NEI | is Noncontiguous Elementary-Item |
Every elementary data item, every literal, and every function has a class and a category. The class and category of a data item are defined by its picture character string, by the BLANK WHEN ZERO clause, or by its usage; the class and category of a literal are defined in the section Literals; and the class and category of an intrinsic function are specified by the definition of that intrinsic function (see the topic Intrinsic Functions).
The category of a group item is alphanumeric.
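The practical effect of a group item being alphanumeric shows up in a MOVE. The following sketch, with hypothetical data-names, moves the same four digits once as an elementary numeric item and once as a group.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. GRPDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  AMOUNT-GROUP.
           05  AMOUNT          PIC 9(4) VALUE 12.
       01  NUM-TARGET          PIC 9(6).
       01  TXT-TARGET          PIC X(6).
       PROCEDURE DIVISION.
      * Elementary numeric move: the value is aligned on the
      * decimal point and zero-filled on the left.
           MOVE AMOUNT TO NUM-TARGET
      * Group move: AMOUNT-GROUP is treated as alphanumeric, so
      * its four characters are copied left-justified and the
      * rest of the receiving item is space-filled.
           MOVE AMOUNT-GROUP TO TXT-TARGET
           DISPLAY "[" NUM-TARGET "]"
           DISPLAY "[" TXT-TARGET "]"
           STOP RUN.
```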
The following table depicts the relationship of categories to classes of data for elementary items.
| Class | Category |
|---|---|
| Alphabetic | |
| Alphanumeric | |
| Index | |
| National | |
| Numeric | |
| Object | |
| Pointer | |
Algebraic signs fall into two categories: operational signs and editing signs.
The SIGN clause permits you to state explicitly the location of the operational sign. The clause is optional; if it is not used, operational signs are represented as described in the section Selection Of Character Representation And Radix.
Editing signs are inserted into a data item through the use of the sign control symbols of the PICTURE clause.
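The difference between the two kinds of sign can be sketched as follows. The data-names are hypothetical; the first item carries an operational sign whose location is stated by the SIGN clause, and the second receives an editing sign through its PICTURE.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SIGNDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Operational sign held as a separate leading character.
       01  BALANCE             PIC S9(3)
                               SIGN IS LEADING SEPARATE CHARACTER
                               VALUE -42.
      * Editing sign: "-" is a sign control symbol in the PICTURE.
       01  BALANCE-OUT         PIC -ZZ9.
       PROCEDURE DIVISION.
           MOVE BALANCE TO BALANCE-OUT
           DISPLAY "STORED: " BALANCE
           DISPLAY "EDITED: " BALANCE-OUT
           STOP RUN.
```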
The standard rules for positioning data within an elementary item depend on the category of the receiving item. These rules are:
If the JUSTIFIED clause is specified for the receiving item, these standard rules are modified as described in the section The JUSTIFIED Clause in the chapter Data Division - File and Data Description.
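As a rough illustration of standard alignment and of how JUSTIFIED modifies it, the sketch below moves the same two-character literal to two alphanumeric items and a numeric literal to a numeric item. The data-names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. JUSTDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  LEFT-FIELD          PIC X(6).
       01  RIGHT-FIELD         PIC X(6) JUSTIFIED RIGHT.
       01  NUM-FIELD           PIC 9(3)V99.
       01  NUM-EDITED          PIC ZZ9.99.
       PROCEDURE DIVISION.
      * Standard alphanumeric alignment: data starts at the
      * leftmost position and is space-filled on the right.
           MOVE "AB" TO LEFT-FIELD
      * JUSTIFIED RIGHT places the data at the rightmost
      * positions and space-fills on the left.
           MOVE "AB" TO RIGHT-FIELD
      * Numeric receiving item: aligned on the decimal point,
      * with zero fill at either end as needed.
           MOVE 1.5 TO NUM-FIELD
           MOVE NUM-FIELD TO NUM-EDITED
           DISPLAY "[" LEFT-FIELD "]"
           DISPLAY "[" RIGHT-FIELD "]"
           DISPLAY "[" NUM-EDITED "]"
           STOP RUN.
```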
Some computer memories are organized so that natural addressing boundaries exist in the computer memory (for example, word boundaries, half-word boundaries, byte boundaries). The way in which data is stored need not respect these natural boundaries.
However, certain uses of data (for example, in arithmetic operations or in subscripting) can be facilitated if the data is stored so as to be aligned on these boundaries. Specifically, additional machine operations in the run-time element can be required for the accessing and storage of data if portions of two or more data items appear between adjacent natural boundaries, or if certain natural boundaries divide a single data item.
Data items which are aligned on these natural boundaries in such a way as to avoid additional machine operations are defined to be synchronized. A synchronized item is assumed to be introduced and carried in that form; conversion to synchronized form occurs only during the execution of a statement (other than READ or WRITE) which stores data in the item.
Synchronization can be accomplished in two ways:
By use of the SYNCHRONIZED clause, the use of special types of alignment within a group can affect the results of statements in which the group is used as an operand. The effect of the implicit FILLER and the semantics of any statement referencing these groups is described later in this chapter.
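One way to observe the effect of the SYNCHRONIZED clause on a group is to compare the lengths of two otherwise identical records. This is a sketch with hypothetical data-names; whether any implicit FILLER is actually inserted, and how much, depends on the platform and on the compiler directives in effect, so the two lengths may or may not differ.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SYNCDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * No alignment requested: items are stored contiguously.
       01  PLAIN-REC.
           05  FLAG-A          PIC X.
           05  COUNT-A         PIC S9(9) COMP.
      * SYNC asks for COUNT-B to start on a natural boundary,
      * which may introduce implicit FILLER after FLAG-B.
       01  SYNC-REC.
           05  FLAG-B          PIC X.
           05  COUNT-B         PIC S9(9) COMP SYNC.
       PROCEDURE DIVISION.
           DISPLAY "PLAIN-REC LENGTH: " LENGTH OF PLAIN-REC
           DISPLAY "SYNC-REC LENGTH:  " LENGTH OF SYNC-REC
           STOP RUN.
```

Because a group is handled as an alphanumeric item, any implicit FILLER takes part in group moves and comparisons involving SYNC-REC.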
The value of a numeric item (defined as numeric by its PICTURE, see the topic The PICTURE Clause) can be represented in the computer's storage in either binary or decimal form depending on the USAGE clause of the declaration (see the topic The USAGE Clause). These numeric formats are:
BINARY, COMPUTATIONAL-4 or COMP-4
PACKED-DECIMAL
FLOAT-SHORT or FLOAT-LONG
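The following sketch declares the same signed five-digit value under several of the usages named above and displays the storage each occupies. The data-names are hypothetical, and the sizes reported depend on the compiler and the directives in effect, so none are stated here.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. USGDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Same PICTURE, different internal representations.
       01  N-DISPLAY           PIC S9(5).
       01  N-BINARY            PIC S9(5) USAGE BINARY.
       01  N-COMP-4            PIC S9(5) USAGE COMP-4.
       01  N-PACKED            PIC S9(5) USAGE PACKED-DECIMAL.
      * Floating-point items take no PICTURE clause.
       01  N-FLOAT-SHORT       USAGE FLOAT-SHORT.
       01  N-FLOAT-LONG        USAGE FLOAT-LONG.
       PROCEDURE DIVISION.
           DISPLAY "DISPLAY:        " LENGTH OF N-DISPLAY
           DISPLAY "BINARY:         " LENGTH OF N-BINARY
           DISPLAY "COMP-4:         " LENGTH OF N-COMP-4
           DISPLAY "PACKED-DECIMAL: " LENGTH OF N-PACKED
           DISPLAY "FLOAT-SHORT:    " LENGTH OF N-FLOAT-SHORT
           DISPLAY "FLOAT-LONG:     " LENGTH OF N-FLOAT-LONG
           STOP RUN.
```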
An alphanumeric function is always represented in the standard data format. Its size is determined by the definition of the function.
The representation of integer and numeric functions is as follows:
Integer and numeric functions can be used only in arithmetic expressions, and represent the value resulting from the evaluation of the function without the restriction on composite of operands and/or receiving data items.
When a computer provides more than one means of representing data, the standard data format must be used for data items other than integer and numeric functions, if not otherwise specified by the data description.
The COBOL digit characters from 0 through 9 that represent the number value are held in radix 10, one digit character per byte of computer storage. This is the standard data format of the COBOL language. If the data item is signed and the sign is not specified as SEPARATE (see the topic The SIGN Clause and the rules for the NUMERIC SIGN clause in the topic The Special-Names Paragraph) the numeric sign is incorporated into either the leading or trailing digit, according to the LEADING or TRAILING phrase in the SIGN clause. Signed data is incorporated into the requisite digit as shown in Table 4 below. (Effectively, bit 6 (hexadecimal value "40") of the character is set from 0 to 1 if the number has a negative value.) If the data item is signed and the sign is specified as SEPARATE, then the sign is held as a separate single COBOL character, additional to the digits, either plus (+) or minus (-) as necessary. If the data item is signed and no SIGN clause applies, the numeric sign is incorporated into the trailing digit, unless the NUMERIC SIGN clause is specified in the Special-Names paragraph. If the SIGN clause is specified in a data description entry, the NUMERIC SIGN clause, if specified, is ignored for that entry.
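Sign incorporation can be inspected by redefining a signed item as alphanumeric and displaying its characters. This is a sketch with hypothetical data-names; the character that appears in the signed digit position depends on the character set and sign conventions in effect, so no specific output is claimed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OVPDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * No SIGN clause: the sign is incorporated into the
      * trailing digit, so the item occupies three bytes.
       01  SIGNED-NUM          PIC S9(3) VALUE -120.
       01  SIGNED-CHARS REDEFINES SIGNED-NUM
                               PIC X(3).
      * SEPARATE: the sign is an extra character, so the item
      * occupies four bytes.
       01  SEPARATE-NUM        PIC S9(3)
                               SIGN IS TRAILING SEPARATE CHARACTER
                               VALUE -120.
       01  SEPARATE-CHARS REDEFINES SEPARATE-NUM
                               PIC X(4).
       PROCEDURE DIVISION.
      * The last character of SIGNED-CHARS is the digit 0 with
      * the negative sign incorporated into it.
           DISPLAY "INCORPORATED: [" SIGNED-CHARS "]"
      * The last character of SEPARATE-CHARS is the sign itself.
           DISPLAY "SEPARATE:     [" SEPARATE-CHARS "]"
           STOP RUN.
```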
In the following table, the numbers in brackets represent the hexadecimal encoding for the COBOL character. On some systems, the encoding can be varied by the CHARSET and SIGN Compiler directives.
| Leading or trailing value digit before sign incorporation | Sign Digit Character for: | |||||
|---|---|---|---|---|---|---|
| Positively-signed values | Negatively-signed values | |||||
| Charset (ASCII) | Charset (EBCDIC) | Charset (ASCII) | Charset (EBCDIC) | |||
| Sign (ASCII) | Sign (EBCDIC) | Sign (EBCDIC) | Sign (ASCII) | Sign (EBCDIC) | Sign (EBCDIC) | |
| 0 | 0(30) | {(7B) | {(C0) | p(70) | ||