THE PROGRAMMING LANGUAGE CLANG

1 Introduction

Clang is a toy programming language used in teaching compiler construction (Terry, 1997). This report describes "level 4" of Clang, and is not intended as a programmer's tutorial. It is intentionally kept concise. Its function is to serve as a reference for programmers, implementors, and manual writers. What remains unsaid is mostly left so intentionally, either because it is derivable from stated rules of the language, or because it would require a general commitment in the definition when a general commitment appears as unwise.

This specification of Clang is modelled on the Modula-2 and Oberon specifications (Wirth, 1980, 1990).


2 Syntax

A language is an infinite set of sentences, namely the sentences well formed according to its syntax. Each sentence is a finite sequence of symbols from a finite vocabulary. The vocabulary of Clang consists of identifiers, numbers, strings, operators, delimiters, and comments. These are called lexical symbols and are composed of sequences of characters. (Note the distinction between symbols and characters.)


3 Vocabulary and representation

Symbols are identifiers, numbers, string literals, operators, delimiters and comments.

The following lexical rules must be observed:

The representation of symbols in terms of characters is defined using the ASCII set.

Blanks and line breaks must not occur within symbols (except that line breaks are allowed in comments, and blanks are allowed within string literals).

Blanks and line breaks are ignored unless they are essential for separating two consecutive symbols.

Capital and lower-case letters are considered as being equivalent (the language is not "case sensitive").

Comments may be inserted between any two symbols in a program. They are arbitrary character sequences opened by the bracket (* and closed by *). Comments do not affect the meaning of a program.

Identifiers are sequences of letters and digits. The first character must be a letter.

    Examples:   x   scan   Clang4   GetSymbol   firstLetter

Restriction: An implementation is free to restrict the number of significant characters in an identifier. Typically this will be at least 10.

Numbers are (unsigned) integers. Integers are sequences of digits, and have their usual decimal interpretation.

Restriction: An implementation is free to restrict the range of numbers that it can handle. Typically this will be 0

String literals are sequences of one or more characters or escape character sequences enclosed in apostrophes ('). The number of characters in a string is called the length of the string. Strings can only be used within output statements (see 11.7). A string literal may not extend over a line break in the source text.

Within a string the escape character sequence '' denotes the apostrophe character itself.

   Examples:   'Clang'    'He said "hell''s bells" and ran away'

Operators and delimiters are the special characters, character pairs, or reserved words listed below. Reserved words cannot be used in the role of identifiers. Like identifiers, they are not case sensitive.

   (    >    [     BEGIN      FUNCTION    SIGNAL
   )    <    ]     COBEGIN    IF          THEN
   *    >=   .     COEND      PROCEDURE   VAR
   /    <=   :=    CONST      PROGRAM     WAIT
   +    =    ,     DO         READ        WHILE
   -    <>   ;     END        RETURN      WRITE


4 Programs

A Program is a collection of declarations of procedures, constants and variables (whose values are said to constitute the program state), and a sequence of statements for the purpose of altering the program state. A program typically constitutes a text that is compilable as a unit.

Program

     -----------> PROGRAM ----> identifier -----> ; ----> Block ------> .

Block

     -----------> Declarations --------> CompoundStatement ----------->

Declarations

     ---------------------------->------------------------------------>
                     |                           |
                     |----- ConstDeclarations <--|
                     |                           |
                     |----- VarDeclarations <----|
                     |                           |
                     `----- ProcDeclaration <----'


5 Declarations and scope rules

Every identifier occurring in a program must be introduced by a declaration. Declarations also serve to specify certain permanent properties of an entity, such as whether it is a function, procedure, constant, variable, or an array.

The identifier is then used to refer to the associated entity. This is possible only in those parts of a program that are within the scope of the declaration. No identifier may denote more than one entity within a given scope. The scope extends textually from the point of the declaration to the end of the Block in which the declaration has been made, and to which the entity is said to be local. It extends to blocks themselves declared local to that Block, with the exception that when an identifier is redeclared in one or more nested blocks, the innermost accessible declaration applies to each particular use of any identifier.


6 Type

A data type determines the set of values which variables of that type may assume, and the operators that are applicable to such variables.

Clang supports only one fundamental type for variables, representing the set of integer values {min ... max}, where min and max are implementation defined (typically the values {-32768 ... 32767}). Expressions may be of integer type, or of Boolean type. The latter are known as Conditions (see 10).


7 Constant declarations

A constant declaration permanently associates an identifier with a constant value.

ConstDeclarations

     ---> CONST -------> identifier -----> = ----> number ----> ; ------>
                   |                                                |
                   `-----------------------<------------------------'



Examples:

    CONST
      min = 0;
      max = 100;


8 Variable declarations

Variables are those data items whose values may be changed by execution of the program. Variable declarations serve to introduce variables and to associate them with identifiers that must be unique within the given scope.

VarDeclarations

     ------> VAR -------> identifier ----------------->----------------> ; -->
                    |                 |                             |
                    ,                 |                             |
                                      `----> [ -> UpperBound--> ] --,
                    |                                               |
                    `----------------<------------------------------'

UpperBound

      --------> number ------->

Each variable identifier denotes either a simple integer variable, or an array structure of integer elements. Each array structure has a fixed size; the number specified in its UpperBound specifies the maximum index that may be used to access elements in that array. Thus the size of an array is one greater than the value of its UpperBound.

When the CompoundStatement that is defined by a Block is activated, all variables declared locally to that Block are deemed to have initially undefined values.

Examples:

    VAR
      i, j, List[10];

Variables used in future examples are assumed to have been declared as indicated above.


9 Procedure declarations

Procedure declarations consist of a procedure heading and a procedure body. The heading specifies the procedure identifier, and the formal parameters (if any). The body contains declarations and statements.

ProcDeclaration

     ----> PROCEDURE -----> identifier ----> FormalParameters ------,
       |               |                 |                     |    |
       `-> FUNCTION ---'                 `---------->----------'    |
                                                                    |
               ,----------------------------------------------------'
               |
               `-- ; ---> Block --> ; ---->

There are two kinds of procedures, namely regular procedures (introduced by the key word PROCEDURE) and function procedures (introduced by FUNCTION). The latter are activated by a function designator as a constituent of an Expression, and yield a result that is an operand in the expression. Regular procedures are activated by a ProcedureCall. The CompoundStatement within the Block of a function procedure must contain a ReturnStatement that defines the result of the function procedure.

Note: The use of a procedure identifier in a call within its declaration implies recursive activation of the procedure.

9.1 Formal parameters

Formal parameters are identifiers which denote actual parameters specified in a procedure or function call; the correspondence between formal and actual parameters is only established when the procedure or function is activated and called.

FormalParameters

     ----> ( ------> identifier -------------------------> ) ----->
                |                       `--> [ ] --'  |
                `------------------ , <---------------'

The identifiers denoted in a formal parameter list are local to a procedure; their scope is the program text that constitutes the procedure declaration.

There are two kinds of formal parameters: value and variable (reference) parameters. Value parameters stand for local variables to which the result of the evaluation of the corresponding actual parameter is assigned as an initial value. Variable parameters correspond to actual parameters that are complete arrays, and they stand for these arrays. A variable formal parameter is denoted by the presence of the brackets after the identifier in the formal parameter list.

Examples:

    FUNCTION Max (A, B);
      BEGIN
        IF A > B THEN RETURN A;
        RETURN B
      END;

    PROCEDURE WriteBackwards (List[], N);
      BEGIN
        WHILE N >= 0 DO BEGIN
          WRITE(List[N]); N := N - 1
        END
      END;

9.2 Procedure activation

A procedure or function call may contain a list of actual parameters.

ActualParameters

     ---> ( ---------> Expression ---------> ) ----->
               |                       |
               `----------- , <--------'

Procedure activation associates each of the actual parameters with the corresponding formal parameter defined in the procedure declaration. The association is established by the positions of the parameters in the lists of actual and formal parameters respectively, and the mode of parameter passing must match the mode of formal parameter declaration.

In the case of variable formal parameters, the actual parameter must be a designator denoting an entire (unsubscripted) array variable. If the formal parameter is a value parameter, the corresponding actual parameter must be an expression. This expression is evaluated during the procedure activation, and the resulting value is assigned to the formal parameter, which then constitutes a local variable of that procedure.


10 Expressions and Conditions

Expressions are constructs denoting rules of computation whereby constants and current values of variables are combined to derive other values by the application of operators. Expressions consist of operands and operators. Parentheses may be used to express specific associations of operators and operands.

Condition

     --> Expression ----------------------------------------------,
                         |    |   |    |   |    |                 |
                         =   <>   <   <=   >   >=                 |
                         |    |   |    |   |    |                 |
                         `-------------------------> Expression ------->

Expression

          ,-> + --,
          |       |
     -----+-------+---------> Term -------------->-------------------->
          |       |                   |                    |     |
          `-> - --'                   |                    +     -
                                      |                    |     |
                                      `--- Term <----------------'

Term

     --------> Factor  -------------->--------------------------->
                           |                     |      |
                           |                     *      /
                           |                     |      |
                           `---  Factor <---------------'

Factor

     ----------------> number ------------------------------,
            |                                               |
            |--------> Designator --------------------------,
            |                       |                       |
            |                       `--> ActualParameters --,
            |                                               |
            `------> ( ---> Expression -----> ) ----------------->

10.1 Operands

With the exception of numbers, operands are denoted by designators. A Designator consists of an identifier referring to the function, constant or variable to be designated. This identifier must be followed by a selector if the identifier denotes an array. A designator may be followed by a (possibly empty) list of actual parameters if the identifier denotes a function.

Designator

     -------> identifier -----------------------------------,
                           |                                |
                           `----> [ --> Expression --> ] ---------->

If A designates an array, then A[E] denotes that element of A whose index is the current value of the expression E. E must be of integer type, and the value of E must lie within the range of possible values for the index of A. This range is from 0 ... UpperBound, as specified when A was declared.

If A designates a function and is followed by a (possibly empty) parameter list, the designator implies an activation and call of that function, and stands for the value resulting from its execution.

An operand that is a literal constant denotes a value that is the value of that constant.

10.2 Operators

The syntax of expressions distinguishes between four classes of operators with different precedences (binding strengths). Parentheses have the highest precedence, followed by multiplication operators, addition operators, and then relational operators. Operators of the same precedence associate from left to right. Thus, for example, x - y - z * w stands for (x - y) - (z * w).

The available operators are listed in the following tables.

10.2.1 Arithmetic operators

    symbol  result

    +       sum 
    -       difference 
    *       product 
    /       quotient 

These operators apply only to operands of integer type, and yield an integer type. When used as operators with a single operand, - denotes sign inversion and + denotes the identity operation.

10.2.2 Relational operators

    symbol  relation

    =       equal
    <>      unequal
    <       less 
    <=      less or equal 
    >       greater 
    >=      greater or equal 

These operators yield a result of Boolean type. The ordering operators apply to operands of integer type.

Examples of expressions and conditions:

    1996                    (Integer) 
    i / 3                   (Integer) 
    i < 3                   (Boolean)
    (i+j) * (i-j)           (Integer) 


11 Statements

Statements denote actions. There are elementary and structured statements.

Elementary statements are not composed of any parts that are themselves statements. They are the assignment statement, the input statement, the output statement, the return statement, the semaphore statement and the empty statement.

Structured statements are composed of parts that are themselves statements. They are used to express sequencing and conditional, selective, and repetitive execution, and to specify concurrent execution of processes.

Statement

     --------------------->-------------,
              |                         |
              |--> CompoundStatement ---|
              |                         |
              |--> Assignment ----------|
              |                         |
              |--> IfStatement ---------|
              |                         |
              |--> WhileStatement ------|
              |                         |
              |--> ReadStatement -------|
              |                         |
              |--> WriteStatement ------|
              |                         |
              |--> ProcedureCall -------|
              |                         |
              |--> ReturnStatement -----|
              |                         |
              |--> CobeginStatement  ---|
              |                         |
              `--> SemaphoreStatement ------->

11.1 Statement sequences

A CompoundStatement may be used to group several statements into one indivisible unit. A CompoundStatement is frequently used as a part of a structured statement.

CompoundStatement

     ------> BEGIN -----------> Statement -------------> END ---->
                          |                    |
                          `-------- ; <--------'

Statement sequences denote the sequence of actions specified by the component statements, executed in the order specified by the sequence.

11.2 Empty statements

An empty statement causes no change to the program state; such statements are included in order to relax punctuation rules in statement sequences.

Note: Within statement sequences, a semicolon is a separator, not a terminator. Care must be taken not to insert semicolons immediately after the THEN in an IfStatement, or immediately after the DO in a WhileStatement.

11.3 Assignments

An Assignment denotes the replacement of the value of a variable with a new value, and results in a change to the state of a program.

Assignment

     ---------> Variable ---> := ----> Expression -->

Variable

     ------> Designator ----->

An assignment serves to replace the current value of the designated Variable by the value specified by the Expression. The assignment operator is written as ":=" and pronounced as "becomes". The Variable must designate a scalar variable, or an array element.

Examples:

    i := 0
    j := i + List[j + 1]

11.4 If statements

An IfStatement specifies the conditional execution of a guarded statement.

IfStatement

     ----> IF ---> Condition --> THEN ---> Statement ---->

The Condition is an Expression (guard) that yields a Boolean result. Only if the guard evaluates to true is the guarded Statement executed.

Examples:

   IF i > max THEN BEGIN WRITE('limit exceeded'); RETURN END
   IF (j / 3 * 3 = j) THEN WRITE(j, ' is divisible by 3')

11.5 While statements

A WhileStatement specifies the possible repetition of a statement.

WhileStatement

     -------> WHILE ---> Condition ------> DO -----> Statement --->

If the Condition yields true, the Statement is executed in its entirety. Both the Condition evaluation and the Statement execution are then repeated, this process continuing as long as the Condition yields true.

Example:

   WHILE J > 0 DO BEGIN J := J / 2; I := I + 1 END

11.6 Input statements

ReadStatement

     ------> READ -------> ( --------> Variable -------------> ) ---->
                                 |                     |
                                 `------- , <----------'

A ReadStatement specifies a list of variables that are to be assigned new values from an input source external to the program. Each Variable must designate a scalar variable, or an element of an array.

Example:
        
  READ(i, j, List[i])

11.7 Output statements

WriteStatement

                                 ,----> Expression -------,
     -----> WRITE -------> ( ----,                        |------> ) -->
                              |  `------> STRING ---------'   |
                              |                               |
                              `----------------- , <----------'

A WriteStatement specifies a list of strings and expressions whose values are to be computed and then transferred to an output sink external to the program. A WriteStatement concludes by appending a line mark to this sink.

Example:

  WRITE('The result is ', i + j)

11.8 The stack dump statement

Implementations of Clang may optionally provide a statement STACKDUMP that will allow for debugging operations to be be invoked.

11.9 Procedure calls

A ProcedureCall serves to activate a regular procedure, and then to transfer control to that procedure. After the procedure returns, control is transferred to the statement following the procedure call.

ProcedureCall

     ---> Identifier ----------------->----------,
                          |                      |
                          `--> ActualParameters ------->


Example:

  WriteBackwards(MyList, Max(I, N) / 2)

Note: the identifier in a procedure call must denote a previously declared regular procedure (see 9).

11.10 Return statements

A ReturnStatement causes immediate termination of a function, procedure or program block, returning control to the invoking environment.

ReturnStatement

     ---> RETURN ---------------------,
                    |                 |
                    `-> Expression ---------->

Function procedures require the presence of a ReturnStatement, with a mandatory Expression whose value specifies the result value. There may be several, although only one will be executed. In programs and regular procedures, a ReturnStatement is implied by the end of the procedure body. In this context an explicit ReturnStatement thus appears as an additional (probably exceptional) termination point, and no Expression may be specified.

11.11 Concurrent statements

Clang supports the execution of simple concurrent processes.

CobeginStatement

     -----> COBEGIN ----------> ProcessCall ------> COEND ------>
                        |                      |
                        `----------- ; <-------'
ProcessCall

     ----> ProcedureCall  ------>

A process is statically defined by a regular procedure, which may be parameterized. A CobeginStatement may only occur within the context of the main program Block; it follows that the procedures that define processes must be declared at level 1 (although they may, during execution, invoke further procedures, either at level 1, or nested within themselves).

When a CobeginStatement is executed, each of the procedures that is defined in the sequence of ProcessCalls is activated (its actual parameters are evaluated; this is done within the context of the main program). Only once all these activations have been completed is the execution of the main program block suspended, and control is then transferred to one of the activated processes. After all the processes have run to conclusion, control returns once more to the statement following the CobeginStatement. While the processes are active, processor time is distributed in a round-robin fashion between those that are not waiting on semaphores and that have not yet run to completion.

Notes:

Example:

    COBEGIN
      Process1(x, y);
      Process1(3 * x, Min(a,b));
      Process2
    COEND

11.12 Semaphore statements

Clang permits concurrent processes to synchronize or protect shared resources by means of semaphores.

SemaphoreStatement

     ----------> SIGNAL -----,
        |                    |
        `------> WAIT -------------> ( ------> Variable -------> ) ---->

A SemaphoreStatement may only be executed within the context of a statement sequence that forms part of a concurrent sequential process. The semantics of these statements may be described by

      WAIT(S)       WHILE S < 0 DO (* nothing *);
                    S := S - 1;

      SIGNAL(S)     S := S + 1;

However, it is likely that an implementation of WAIT will not employ busy-waiting, but will suspend the invoking process if necessary, and transfer that process to a queue of waiting processes. The implementation must guarantee that other context switching is disabled while a SemaphoreStatement is executed.

Variables used as semaphores are declared as integers. No protection is provided by the implementation against their corruption by other statements; it is the programmer's responsibility to ensure that, after initialization, the only guaranteed operations on semaphores are WAIT and SIGNAL.


12 Complete example

No language manual would be complete without at least one complete example!

   PROGRAM Sieve;
   (* Sieve of Eratosthenes for finding primes 2 <= N <= 4000
      P.D. Terry, Rhodes University, 1997 *)
     CONST
       FALSE = 0;
       TRUE  = 1           (* no specific BOOLEAN type in CLANG *);
       Max   = 4000        (* largest number that can be tested *);
     VAR
       I , N , K ,         (* counters *)
       Uncrossed [4000]    (* BOOLEAN sieve *);
     BEGIN
       READ(N);
       IF N > Max THEN BEGIN WRITE('Too large, sorry'); RETURN END;
       WRITE('Prime numbers between 2 and ', N);
       WRITE('------------------------------------'); WRITE;
       I := 2;
       WHILE I <= N DO (* clear sieve *)
         BEGIN Uncrossed[I] := TRUE; I := I + 1 END;
       I := 2;
       WHILE I <= N DO (* the passes over the sieve *)
         BEGIN
           IF Uncrossed[I] = TRUE THEN
             BEGIN
               WRITE(I, ' '); K := I  (* now cross out multiples of I *);
               WHILE K <= N DO BEGIN Uncrossed[K] := FALSE; K := K + I END
             END;
           I := I + 1
         END
     END (* Sieve *).