THE PROGRAMMING LANGUAGE TOPSY++

1 Introduction

Topsy++ is a toy programming language that evolved from Clang (Terry, 1996) and C++ (Stroustrup, 1990).

This report is not intended as a programmer's tutorial. It is intentionally kept concise. Its function is to serve as a reference for programmers, implementors, and manual writers. What remains unsaid is mostly left so intentionally, either because it is derivable from stated rules of the language, or because it would require a general commitment in the definition when a general commitment appears as unwise.

This specification of Topsy++ is modelled on the Modula-2 and Oberon specifications (Wirth, 1980, 1990).


2 Syntax

A language is an infinite set of sentences, namely the sentences well formed according to its syntax. Each sentence is a finite sequence of symbols from a finite vocabulary. The vocabulary of Topsy++ consists of identifiers, numbers, strings, operators, delimiters, and comments. These are called lexical symbols and are composed of sequences of characters. (Note the distinction between symbols and characters.)

To describe the syntax the variant of extended Backus-Naur formalism called Cocol/R is used. This is described in full detail elsewhere (Terry, 1996). Brackets [ and ] denote optionality of the enclosed sentential form, and braces { and } denote its repetition (possibly 0 times). Syntactic entities (non-terminal symbols) are denoted by English words expressing their intuitive meaning. Symbols of the language vocabulary (terminal symbols) are denoted by strings enclosed in quote marks (these include words written in lower case letters, so-called reserved key words).


3 Vocabulary and representation

The representation of symbols in terms of characters is defined using the ASCII set. Symbols are identifiers, numbers, string literals, character literals, operators, delimiters, and comments. The following lexical rules must be observed. Blanks and line breaks must not occur within symbols (except that line breaks are allowed in comments, and blanks are allowed within string and character literals). They are ignored unless they are essential for separating two consecutive symbols. Capital and lower-case letters are considered as being distinct.

  CHARACTERS
    cr         = CHR(13) . 
    lf         = CHR(10) . 
    back       = CHR(92) .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . 
    digit      = "0123456789" . 
    graphic    = ANY - '"' - "'" - back - cr - lf - CHR(0) .

  IGNORE CHR(9) .. CHR(13) 
  COMMENTS FROM "//" TO lf 
  COMMENTS FROM "/*" TO "*/" 

Comments may be inserted between any two symbols in a program. They are arbitrary character sequences either opened by the bracket /* and closed by */, or opened by the bracket // and closed at the end of that line. Comments do not affect the meaning of a program.

  TOKENS
    identifier = letter { letter | digit | "_" } . 
    number     = digit { digit } . 
    string     = '"' { graphic | back graphic | "\'" | back back | '\"' } '"' .
    character  = "'" ( graphic | back graphic | "\'" | back back | '\"' ) "'" .

Identifiers are sequences of letters and digits. The first character must be a letter.

Examples:

    x   scan   Topsy++   Get_Symbol   firstLetter

Restriction: An implementation is free to restrict the number of significant characters in an identifier. Typically this will be at least 10.

Numbers are (unsigned) integers. Integers are sequences of digits, and have their usual decimal interpretation.

Restriction: An implementation is free to restrict the range of numbers that it can handle. Typically this will be 0 .. 32767.

String literals are sequences of zero or more characters or escaped character sequences enclosed in quote marks ("). The number of characters in a string is called the length of the string. Strings can only be used within output statements (see 10.8). A string literal may not extend over a line break in the source text.

Character literals are denoted by a single graphic character or a single escape character sequence between single quote marks ('). Character literals denote the integer value of the corresponding ASCII character.

Within a string or character literal the following escaped character sequences denote non-graphical characters

      \a         Alert            (CHR(7))
      \b         Backspace        (CHR(8)) 
      \t         Horizontal tab   (CHR(9)) 
      \n         Line feed        (CHR(10)) 
      \f         Form feed        (CHR(12)) 
      \r         Carriage return  (CHR(13)) 
      \"         Quotation mark   (CHR(34)) 
      \'         Apostrophe       (CHR(39)) 
      \x         (where x is not a, b, t, n, r or f) denotes x itself 

(Within a string \n denotes the sequence Carriage return, Line feed, on MS-DOS systems.)

Examples:

     "Topsy++"    "He said\"hello\" and rang the bell\a"

Operators and delimiters are the special characters, character pairs, or reserved words listed below. The reserved words consist exclusively of lower case letters and cannot be used in the role of identifiers.

    !     ++    <=     bool    if
    !=    ,     =      char    int 
    %     -     ==     cin     main 
    &&    --    >      const   return 
    (     /     >=     cout    true 
    )     ;     >>     do      until 
    *     <     [      else    void 
    +     <<    ]      false   while 
    {     }     ||     for 


4 Programs

A program is a collection of declarations of constants and variables (whose values are said to constitute the program state), and a sequence of statements for the purpose of altering the program state. A program typically constitutes a text that is compilable as a unit.

  Topsy =  "void" "main" "(" "void" ")" Block .

  Block =  "{" { Declaration | Statement } "}" . 


5 Declarations and scope rules

Every identifier occurring in a program must be introduced by a declaration. Declarations also serve to specify certain permanent properties of an object, such as its type, and whether it is a constant, a variable, or an array.

The identifier is then used to refer to the associated object. This is possible only in those parts of a program that are within the scope of the declaration. No identifier may denote more than one object within a given scope. The scope extends textually from the point of the declaration to the end of the block in which the declaration has been made and to which the object is said to be local.

  Declaration =  ConstDeclaration | VarDeclarations .


6 Type

A data type determines the set of values which variables of that type may assume, and the operators that are applicable to such variables. Topsy++ supports only three fundamental types, representing the set of Boolean values {false, true}, the set of ASCII characters, and the set of integer values {min .. max}, where min and max are implementation defined (typically the values {-32768 .. 32767}).


7 Constant declarations

A constant declaration permanently associates an identifier with a constant value.

  ConstDeclaration = "const" identifier "="
                     ( number | character | "true" | "false" ) ";" . 

The type of the identifier is determined from the context of the declaration. true and false are the Boolean constants. Numbers and character literals are deemed to define constants of integer and character type respectively.

Examples:

    const max = 100;
    const YES = true; 
    const CapitalA = 'A'; 


8 Variable declarations

Variables are those data items whose values may be changed by execution of the program. Variable declarations serve to introduce variables and to associate them with identifiers that must be unique within the given scope. They also serve to associate a fixed data type with each variable so introduced.

  VarDeclarations =  ( "int" | "bool" | "char" ) OneVar { "," OneVar } ";" .
  OneVar =  identifier [ ArraySize | "=" Expression ] . 
  ArraySize =  "[" number "]" . 

Each variable identifier denotes either a simple scalar variable of a specified type, or an array structure of scalar elements that all have that specified type. Each array structure has a fixed size, denoted by the number specified in its declaration.

Scalar variables and arrays whose identifiers appear in the same list are all of the same type. This type is either integer (denoted by the key word int), character (denoted by the key word char), or Boolean (denoted by the key work bool).

When the statement sequence that is defined by the program Block is activated, all variables are deemed to have initially undefined values, except for those scalar variables that have been assigned the values of expressions within the variable declaration sequence where they were declared.

Examples:

    int i, j = 8, List[10];
    bool Started, Suitable; 
    char Letter, reply 

Variables used in future examples are assumed to have been declared as indicated above.


9 Expressions

Expressions are constructs denoting rules of computation whereby constants and current values of variables are combined to derive other values by the application of operators. Expressions consist of operands and operators. Parentheses may be used to express specific associations of operators and operands.

  Expression = SimpleExpr [ RelOp SimpleExpr ] . 
  SimpleExpr = ( "+" Term | "-" Term | Term ) { AddOp Term } . 
  Term       = Factor { MulOp Factor } . 
  Factor     =   Designator | number | "true" | "false" | character 
               | "!" Factor | "(" Expression ")" . 
  AddOp      = "+" | "-" | "||" . 
  MulOp      = "*" | "/" | "%" | "&&" . 
  RelOp      = "==" | "!=" | "<" | "<=" | ">" | ">=" . 


9.1 Operands

With the exception of numbers and character literals, operands are denoted by designators. A Designator consists of an identifier referring to the constant or variable to be designated. This identifier must be followed by a selector if the identifier denotes an array.

Designator = identifier [ "[" Expression "]" ] .

If A designates an array, then A[E] denotes that element of A whose index is the current value of the expression E. E must be of integer or character type, and the value of E must lie within the range of possible values for the index of A. This range is from 0 ... ArraySize - 1, as specified when A was declared.

If the designated object is a variable or an array element, then, when used as an operand, the designator refers to the variable's current value, and the type of the operand is the type of that variable or array element. An operand that is a literal constant denotes a value that is the value of that constant, and the type of the operand is the type of that literal constant.


9.2 Operators

The syntax of expressions distinguishes between four classes of operators with different precedences (binding strengths). The operator ! has the highest precedence, followed by multiplication operators, addition operators, and then relational operators. Operators of the same precedence associate from left to right. Thus, for example, x - y - z * w stands for (x - y) - (z * w).

The available operators are listed in the following tables.


9.2.1 Logical operators

    symbol  result

    ||      logical disjunction
    &&      logical conjunction
    !       negation

These operators apply only when both operands are of the Boolean type, and yield a Boolean result.

    p || q   stands for  "if p then true, else q"
    p && q   stands for  "if p then q, else false"
    ! p      stands for  "not p"


9.2.2 Arithmetic operators

    symbol  result

    +       sum
    -       difference
    *       product
    /       quotient
    %       modulus

These operators apply only to operands of integer or character type, and yield an integer type. When used as operators with a single operand, - denotes sign inversion and + denotes the identity operation.

The operators / and % are related by the following formulae defined for any dividend x and positive divisor y:

    x  =  (x / y) * y  +  (x % y)
    0 <=  (x % y) < y.


9.2.3 Relational operators

    symbol  relation

    ==      equal
    !=      unequal
    <       less
    <=      less or equal
    >       greater
    >=      greater or equal

These operators yield a result of Boolean type. The ordering operators <, <=, > and >= apply to operands of the integer and character types. The relational operators == and != also apply when both operands are of the Boolean type. The operands of a relational operator may be evaluated in any convenient order.

Examples of expressions:

    1996                    (Integer)
    i / 3                   (Integer) 
    !Started || Suitable    (Boolean) 
    (i+j) * (i-j)           (Integer) 
    (0 <= i) && (i < max)   (Boolean) 


10 Statements

Statements denote actions. There are elementary and structured statements.

Elementary statements are not composed of any parts that are themselves statements. They are the assignment, the input statement, the output statement, the return statement, and the empty statement.

Structured statements are composed of parts that are themselves statements. They are used to express sequencing and conditional, selective, and repetitive execution.

A statement may assume the form of a Block, and may thus incorporate declarations of identifiers, whose scope is local to that block.

  Statement =   Block           | Assignment      | IfStatement 
              | WhileStatement  | ForStatement    | RepeatStatement 
              | ReadStatement   | WriteStatement  | ReturnStatement 
              | EmptyStatement  . 


10.1 Assignments

An Assignment denotes the replacement of the value of a variable with a new value, and results in a change to the state of a program.

  Assignment =  Variable ( "=" Expression | "++" | "--" ) ";" .
  Variable   =  Designator . 

If the Expression is present, the assignment serves to replace the current value of the designated Variable by the value specified by the Expression. The assignment operator is written as "=" and pronounced as "becomes". The types of the Expression and of the Variable must be assignment compatible, and the Variable must designate a scalar variable, or an array element.

Assignment compatibility is defined by the following

      Type of Variable  Type of Expression

      integer           integer or character
      character         character or integer
      Boolean           Boolean

Statements of the form Variable++; and Variable--; are semantically equivalent to the statements Variable = Variable + 1; and Variable = Variable - 1; respectively. In this case the Variable must be of the integer or character type.

Examples:

    i = 0;
    Suitable = i == j; 
    j = i + 1; 
    List[i]++; 


10.2 Blocks - statement sequences

A Block statement may be used to group several statements and declarations into one indivisible unit. Block statements are frequently used as parts of structured statements.

  Block = "{" { Declaration | Statement } "}" .

Statement sequences within a Block statement denote the sequence of actions specified by the component statements, executed in the order specified by the sequence.


10.3 If statements

An IfStatement specifies the conditional execution of a guarded statement or statements.

  IfStatement =  "if" "(" Condition ")" Statement [ "else" Statement ] .
  Condition =  Expression . 

The Expression (guard) must yield a Boolean result. If the guard evaluates to true the guarded Statement is executed. If the guard evaluates to false, the Statement following the symbol else is executed, if there is one.

Note: A statement of the form

    if (condition1) if (condition2) statement1; else statement2;

is deemed to have the semantic meaning

    if (condition1) {if (condition2) statement1; else statement2;} else;

and not

    if (condition1) {if (condition2) statement1; else;} else statement2;

Examples:

   if (i > max) { cout << "limit exceeded\n"; return; }
   if (j % 3 = 0) cout << "divisible by 3"; 
   else cout << "not divisible by 3"; 


10.4 While statements

A WhileStatement is one form of specifying the repetition of a statement.

  WhileStatement =  "while" "(" Condition ")" Statement .
  Condition =  Expression . 

The Expression must yield a Boolean result. If this expression yields true, the Statement is executed. The expression evaluation and the statement execution are repeated as long as the Condition yields true.

Example:

   while (j > 0) { j := j / 2; i := i + 1; }


10.5 Repeat Statements

  RepeatStatement =  "do" { Statement } "until" "(" Condition ")" ";" .
  Condition =  Expression . 

The Expression must yield a Boolean result. A RepeatStatement specifies the repeated execution of the statement sequence until the Condition is satisfied. The statement sequence is executed at least once.

Example:

   repeat j := j / 2; i := i + 1; until (j <= 0);


10.6 For statements

A ForStatement is a notational shorthand for expressing a form of WhileStatement that occurs frequently in practice.

  ForStatement =  "for" "(" identifier "=" Expression ";" Condition
                  [ ";" Epilogue ] ")" Statement . 
  Epilogue =  identifier ( "=" Expression | "++" | "--" ) . 

A ForStatement is semantically equivalent to

           identifier = Expression;
           while (Condition) { Statement; Epilogue } 

Frequently epilogues are of the forms identifier++ and identifier--, which are semantically equivalent to the assignments identifier = identifier + 1; and identifier = identifier - 1; respectively. The Epilogue may be omitted. The identifier in a ForStatement and in its Epilogue must denote the same scalar variable, which must be of the integer or character type. The Statement must not alter the value of this variable in any way.


10.7 Input statements

  ReadStatement =  "cin" ">>" Variable { ">>" Variable } ";" .

A ReadStatement specifies a list of variables that are to be assigned new values from an input source external to the program. Each Variable must designate a scalar variable, or an element of an array.

Example:

   cin >> i >> j >> List[i] >> reply;


10.8 Output statements

  WriteStatement =  "cout" "<<" WriteElement { "<<" WriteElement } ";" .
  WriteElement =  string | Expression . 

A WriteStatement specifies a list of strings and expressions whose values are to be computed and then transferred to an output sink external to the program.

Example:

   cout << "The result is " << i + j << "\n";


10.9 Return statements

A ReturnStatement causes immediate termination of the program execution, returning control to the invoking environment.

  ReturnStatement = "return" ";" .


10.10 Empty statements

An EmptyStatement causes no change to the program state.

  EmptyStatement =  ";" .


11 Complete example

void main (void) { // A simple Topsy++ program for doing a Bubble Sort
  const Max = 200; 
  int  List[200], I = 0, N, Number, Temp; 
  bool Sorted; 
  cout << "Hello there - welcome to \"Pat\'s Fiendish Sorter\" - it\'s great!"; 
  do 
    cin >> List[I]; I++; 
  until ((List[I - 1] % 8 == 0) /* A number divisible by 8 stops the list */ 
         || (I == Max));        /* The list is about to overflow */ 
  N = I - 1; 
  Number = N; 
  Sorted = false; 
  while ((N > 1) && !Sorted) { 
    Sorted = true; /* be optimistic */ 
    for (I = 0; I <= N - 2; I++) 
      if (List[I] > List[I + 1]) { 
        Sorted = false; 
        Temp = List[I]; List[I] = List[I + 1]; List[I + 1] = Temp; 
      } 
    N = N - 1; 
  } 
  if (Number > 0) { 
    cout << "Sorted List of " << Number << " numbers\n"; 
    // list in forward order 
    for (I = 1; I <= Number; I++) cout << List[I - 1] << " "; 
    cout << "\n"; 
    // list in reverse order 
    for (I = Number; I >= 1; I--) cout << List[I - 1] << "\t"; 
    cout << "\n"; 
  } 
  else cout << "No data given\n"; }