THE PROGRAMMING LANGUAGE CS301-1


1 Introduction

CS301-1 is a toy programming language used in teaching compiler construction (Terry, 1997). This report describes "level 1" of CS301-1, and is not intended as a programmer's tutorial. It is intentionally kept concise. Its function is to serve as a reference for programmers, implementors, and manual writers. What remains unsaid is mostly left so intentionally, either because it is derivable from stated rules of the language, or because it would require a general commitment in the definition when a general commitment appears as unwise.

This specification of CS301-1 is modelled on the Modula-2 and Oberon specifications (Wirth, 1980, 1990). In particular, use is made of the specification language Cocol (Mössenböck, 1990) to express rules of syntax.


2 Syntax

A language is an infinite set of sentences, namely the sentences well formed according to its syntax. Each sentence is a finite sequence of symbols from a finite vocabulary. The vocabulary of CS301-1 consists of identifiers, numbers, strings, operators, delimiters, and comments. These are called lexical symbols and are composed of sequences of characters. (Note the distinction between symbols and characters.)


3 Vocabulary and representation

  IGNORE CASE
  IGNORE CHR(9) .. CHR(13)
  COMMENTS FROM "{" TO "}"

  CHARACTERS
    cr         = CHR(13) .
    lf         = CHR(10) .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    digit      = "0123456789" .
    instring   = ANY - "'" - cr - lf - CHR(0) .

  TOKENS
    identifier = letter { letter | digit } .
    number     = digit { digit } .
    string     = "'" (instring | "''") { instring | "''" } "'" .

Symbols are identifiers, numbers, string literals, operators, delimiters and comments.

The following lexical rules must be observed:

The representation of symbols in terms of characters is defined using the ASCII set.

Blanks and line breaks must not occur within symbols (except that line breaks are allowed in comments, and blanks are allowed within string literals).

Blanks and line breaks are ignored unless they are essential for separating two consecutive symbols.

Capital and lower-case letters are considered as being equivalent (the language is not "case sensitive").

Comments may be inserted between any two symbols in a program. They are arbitrary character sequences opened by the brace { and closed by }. Comments do not affect the meaning of a program.

Identifiers are sequences of letters and digits. The first character must be a letter.

Examples: x scan CS301-14 GetSymbol firstLetter

Restriction: An implementation is free to restrict the number of significant characters in an identifier. Typically this will be at least 10.

Numbers are (unsigned) integers. Integers are sequences of digits, and have their usual decimal interpretation.

An implementation is free to restrict the range of numbers that it can handle. Typically this will be 0 ... 32767.

String literals are sequences of one or more characters or escape character sequences enclosed in apostrophes ('). The number of characters in a string is called the length of the string. Strings can only be used within output statements (see 10.7). A string literal may not extend over a line break in the source text. Within a string the escape character sequence '' denotes the apostrophe character itself.

Examples: 'CS301-1' 'He said "hell''s bells" and ran away'

Operators and delimiters are the special characters, character pairs, or reserved words listed below. Reserved words cannot be used in the role of identifiers. Like identifiers, they are not case sensitive.

       (    >    [     AND         FALSE       READ
       )    <    ]     BEGIN       IF          RETURN
       *    >=   .     BOOL        INT         THEN
       /    <=   :=    CONST       NOT         TRUE
       +    =    ,     DO          OR          WHILE
       -    <>   ;     END         PROGRAM     WRITE


4 Programs

A Program is a collection of declarations of constants and variables (whose values are said to constitute the program state), and a sequence of statements for the purpose of altering the program state. A program typically constitutes a text that is compilable as a unit.

  Program           = "PROGRAM" identifier ";" Block "." .
  Block             = { ConstDeclarations | VarDeclarations }
                      CompoundStatement .


5 Declarations and scope rules

Every identifier occurring in a program must be introduced by a declaration. Declarations also serve to specify certain permanent properties of an entity, such as whether it is a constant, variable, or an array.

The identifier is then used to refer to the associated entity. This is possible only in those parts of a program that are within the scope of the declaration. No identifier may denote more than one entity within a given scope. The scope extends textually from the point of the declaration to the end of the Block in which the declaration has been made, and to which the entity is said to be local.


6 Type

A data type determines the set of values which variables of that type may assume, and the operators that are applicable to such variables.

CS301-1 supports two fundamental types for variables, representing the set of integer values {min ... max}, where min and max are implementation defined (typically the values {-32768 ... 32767}) and the set of Boolean values {FALSE, TRUE}. Expressions may be of integer type, or of Boolean type. The latter are known as Conditions (see 9).


7 Constant declarations

A constant declaration permanently associates an identifier with a constant value.

  ConstDeclarations = "CONST" OneConst { OneConst } .
  OneConst          = identifier "=" number ";" .

Examples:

    CONST
      min = 0;
      max = 100;


8 Variable declarations

Variables are those data items whose values may be changed by execution of the program. Variable declarations serve to introduce variables and to associate them with identifiers that must be unique within the given scope.

  VarDeclarations   = ( "INT" | "BOOL" ) OneVar { "," OneVar } ";" .
  OneVar            = identifier [ UpperBound ] .
  UpperBound        = "[" number "]" .

Each variable identifier denotes either a scalar variable, or an array structure of scalar elements. Each array structure has a fixed size; the number specified in its UpperBound specifies the maximum index that may be used to access elements in that array. Thus the size of an array is one greater than the value of its UpperBound.

When the CompoundStatement that is defined by a Block is activated, all variables declared locally to that Block are deemed to have initially undefined values.

Examples:

    INT
      i, j, List[10];
    BOOL
      Sieve[4000], VeryPerplexed;


9 Expressions and Conditions

Expressions are constructs denoting rules of computation whereby constants and current values of variables are combined to derive other values by the application of operators. Expressions consist of operands and operators. Parentheses may be used to express specific associations of operators and operands.

  Condition         = Expression .
  Expression        = AndExp { "OR" AndExp } .
  AndExp            = RelExp { "AND" RelExp } .
  RelExp            = AddExp [ RelOp AddExp ] .
  AddExp            = MultExp { AddOp MultExp } .
  MultExp           = UnaryExp { MulOp UnaryExp } .
  UnaryExp          = Factor | UnaryOp UnaryExp .
  Factor            = Designator | number | "TRUE" | "FALSE" | "(" Expression ")" .
  UnaryOp           = "+" | "-" | "NOT" .
  AddOp             = "+" | "-" .
  MulOp             = "*" | "/" .
  RelOp             = "=" | "<>" | "<" | "<=" | ">" | ">=" .


9.1 Operands

With the exception of numbers and the Boolean constants TRUE and FALSE, operands are denoted by designators. A Designator consists of an identifier referring to the constant or variable to be designated. This identifier must be followed by a selector if the identifier denotes an array.

  Designator        = identifier [ "[" Expression "]" ] .

If A designates an array, then A[E] denotes that element of A whose index is the current value of the expression E. E must be of integer type, and the value of E must lie within the range of possible values for the index of A. This range is from 0 ... UpperBound, as specified when A was declared.

An operand that is a literal constant denotes a value that is the value of that constant.


9.2 Operators

The syntax of expressions distinguishes between several classes of operators with different precedences (binding strengths). Parentheses have the highest precedence, followed by unary operators, multiplication operators, addition operators, relational operators, logical conjunction (AND) and finally logical disjunction (OR). Operators of the same precedence associate from left to right. Thus, for example, x - y - z * w stands for (x - y) - (z * w).

The available operators are listed in the following tables.

9.2.1 Arithmetic operators

    symbol  result

    +       sum
    -       difference
    *       product
    /       quotient

These operators apply only to operands of integer type, and yield a result of integer type. When used as operators with a single operand, - denotes sign inversion and + denotes the identity operation.

9.2.2 Relational operators

    symbol  relation

    =       equal
    <>      unequal
    <       less
    <=      less or equal
    >       greater
    >=      greater or equal

These operators yield a result of Boolean type. The ordering operators apply to operands of integer type.

9.2.3 Boolean operators

    symbol  relation

    AND     logical conjunction
    OR      logical disjunction
    NOT     logical inversion

These operators yield a result of Boolean type. The Boolean operators apply to operands of Boolean type.

Examples of expressions and conditions:

    1996                    (Integer)
    i < 3                   (Boolean)
    (i+j) * (i-j) / i       (Integer)


10 Statements

Statements denote actions. There are elementary and structured statements.

Elementary statements are not composed of any parts that are themselves statements. They are the assignment statement, the input statement, the output statement, the return statement and the empty statement.

Structured statements are composed of parts that are themselves statements. They are used to express sequencing and conditional, selective, and repetitive execution.

  Statement         = [   CompoundStatement | Assignment      | ReturnStatement
                        | IfStatement       | WhileStatement
                        | ReadStatement     | WriteStatement ] .


10.1 Statement sequences

A CompoundStatement may be used to group several statements into one indivisible unit. A CompoundStatement is frequently used as a part of a structured statement.

  CompoundStatement = "BEGIN" Statement { ";" Statement } "END" .

Statement sequences denote the sequence of actions specified by the component statements, executed in the order specified by the sequence.


10.2 Empty statements

An empty statement causes no change to the program state; such statements are included in order to relax punctuation rules in statement sequences.

Note: Within statement sequences, a semicolon is a separator, not a terminator. Care must be taken not to insert semicolons immediately after the THEN in an IfStatement, or immediately after the DO in a WhileStatement.


10.3 Assignments

An Assignment denotes the replacement of the value of a variable with a new value, and results in a change to the state of a program.

  Assignment        = Variable ":=" Expression .
  Variable          = Designator .

An assignment serves to replace the current value of the designated Variable by the value specified by the Expression. The assignment operator is written as ":=" and pronounced as "becomes". The Variable must designate a scalar variable, or an array element.

Examples:

    i := 0
    j := i + List[j + 1]


10.4 If statements

An IfStatement specifies the conditional execution of a guarded statement.

  IfStatement       = "IF" Condition "THEN" Statement .

The Condition is an Expression (guard) that yields a Boolean result. Only if the guard evaluates to true is the guarded Statement executed.

Examples:

   IF i > max THEN BEGIN WRITE('limit exceeded'); RETURN END
   IF (j / 3 * 3 = j) THEN WRITE(j, ' is divisible by 3')


10.5 While statements

A WhileStatement specifies the possible repetition of a statement.

  WhileStatement    = "WHILE" Condition "DO" Statement .
  Condition         = Expression .

If the Condition yields true, the Statement is executed in its entirety. Both the Condition evaluation and the Statement execution are then repeated, this process continuing as long as the Condition yields true.

Example:

   WHILE J > 0 DO BEGIN J := J / 2; I := I + 1 END


10.6 Input statements

  ReadStatement     = "READ" "(" Variable { "," Variable } ")" .
  Variable          = Designator .

A ReadStatement specifies a list of variables that are to be assigned new values from an input source external to the program. Each Variable must designate a scalar variable, or an element of an array.

Example:

  READ(i, j, List[i])


10.7 Output statements

  WriteStatement    = "WRITE" [ "(" WriteElement { "," WriteElement }  ")" ] .
  WriteElement      = string | Expression .

A WriteStatement specifies a list of strings and expressions whose values are to be computed and then transferred to an output sink external to the program. A WriteStatement concludes by appending a line mark to this sink.

Example:

  WRITE('The result is ', i + j)


10.8 Return statements

A ReturnStatement causes immediate termination of a program block, returning control to the invoking environment.

  ReturnStatement   = "RETURN" .

There may be several ReturnStatements in a program, although only one will be executed. In programs a ReturnStatement is implied by the end of the program body. In this context, an explicit ReturnStatement thus appears as an additional (probably exceptional) termination point.


10.9 The stack dump statement

Implementations of CS301-1 may optionally provide a statement STACKDUMP that will allow for debugging operations to be be invoked.


11 Complete example

No language manual would be complete without at least one complete example!

   PROGRAM Sieve;
   { Sieve of Eratosthenes for finding primes 2 <= N <= 4000
      P.D. Terry, Rhodes University, 2001 }
     CONST
       Max = 4000        { largest number that can be tested };
     INT
       I , N , K         { counters };
     BOOL
       Uncrossed [4000]  { BOOLEAN sieve };
     BEGIN
       READ(N);
       IF N > Max THEN BEGIN WRITE('Too large, sorry'); RETURN END;
       WRITE('Prime numbers between 2 and ', N);
       WRITE('------------------------------------'); WRITE;
       I := 2;
       WHILE I <= N DO { clear sieve }
         BEGIN Uncrossed[I] := TRUE; I := I + 1 END;
       I := 2;
       WHILE I <= N DO { the passes over the sieve }
         BEGIN
           IF Uncrossed[I] THEN
             BEGIN
               WRITE(I, ' '); K := I  { now cross out multiples of I };
               WHILE K <= N DO BEGIN Uncrossed[K] := FALSE; K := K + I END
             END;
           I := I + 1
         END
     END { Sieve }.


Home  © P.D. Terry