Computer Science 3 - 2001

Programming Language Translation


Practical for Week 20, beginning 3 September 2001 - Solutions

Complete solutions (sources) can be found on the course WWW pages in the file PRAC20A.ZIP. On the WWW pages will also be found a listing of the complete grammar for the pretty-printer application.


Task 2 Do you really think you know the language?

   PROGRAM Query;
     CONST
       Header = 'Title';  { Can I declare a string constant? }

(a) No. Syntax for ConstDeclaration does not admit string as an alternative.

     INT
       X, Y,              { Some easy ones }
       L1[10], L2[10],    { Are these the same size? }

(b) Yes. Report, page 3, section 8, text makes it clear what the size is, even though the syntax does not.

       L3[20], I, Query,  { Can I reuse the program name as a variable? }

(b) The Report, section 5, demands that "identifiers must be unique within the given scope". It goes on to say that the scope extends from the point of declaration in the Block in which the declaration is made to the end of that block. Strictly the program name has been declared before the Block, and so strictly one can reuse the program name. But in practice you would probably find that this was forbidden for safety reasons. Consider, for example, the implications of languages permitting declarations like

    Silly FUNCTION Silly (PARAMETER Silly)
      LOCAL VAR Silly;

In fact, as several students discovered, the version of the compiler you had been given for CS301 made no checks on redeclaration of identifiers at all. This was deliberate - to drive home the point that answering questions about a language by asking them of a compiler is not really a very "safe" thing to do. Compilers can have bugs too! Strictly the compiler should implement what the defining document requires - and of course sometimes the defining document is not explicit either, or has contradictions or omissions. Defining languages is much harder than it might at first appear, as many language designers have found to their cost.

       Header,            { Can I reuse a constant name as a variable name? }
       L3[15];            { What happens if I use a variable name again? }

(b) No. Report, section 5, demands that "identifiers must be unique within the given scope".

     CONST                { Can I declare constants after variables? }

(a) Yes. Syntax is quite explicit that declarations can occur in any order.

       Max = 1000;
       Min = -89;         { Can I define negative constants? }

(a) No. ConstDeclarations only allow numbers, and the syntax for these tokens admits only digits, not signs.

       MaxPlus3 = Max+3;  { Can I define constants in terms of others? }
       XX = X;            { Can I give one variable X two names like this? }
       UNTRUE = NOT TRUE; { Can I define Boolean constants }

(a) No. ConstDeclarations only allow numbers, not Expressions.

     INT                  { Can I have another variable section? }

(a) Yes. Syntax is quite explicit that declarations occur in any order.

       BigList[Max];      { Can I use named constants to set array sizes? }

(a) No. Syntax is quite explicit - array size is defined in terms of a number.

     BEGIN
       Write(Header);     { Can I write constants? }

(a and b) Yes. Syntax admits to Expressions as WriteElements. Syntax admits to Designators as Factors. Section 9.1 in the report specifically allows Designators to refer to constants. Technically, of course, this statement would not mean anything, as we could not have declared a constant to be a string. But the general question that is being asked can be answered in this way.

       L1[10] := 34;      { Does L1[10] exist? }

(b) Yes. Report, page 3, section 9.1 makes it clear in the text (but not in the syntax) that the permitted values of the subscripts to L1 can lie in the range 0 ... 10.

       L1 := L2;          { Can I copy complete arrays? }

(b) No. You cannot tell this from the syntax, and it is a good example of a kink in the language design. Report, page 3, section 9.1 makes it explicit that designators that refer to arrays must have a subscript selector.

You might like to ponder why this restriction exists. Are you allowed "array assignment" statements like that in C++? In Pascal? In any other languages you know? What would it "mean" to assign one array to another if the source array was longer than the destination? Shorter than the destination? When the source array had elements that had never had values stored in them?

       Write(L3);         { Can I write complete arrays? }

(b) No. The reason is the same - Report, page 3, section 9.1 makes it explicit that designators that refer to arrays must have a subscript selector.

So the implication is that you must always process arrays element by element. That is partly why languages have ForStatements - they provide a very handy way of doing this.

       ;; X := Query;;;   { What about spurious semicolons? }

(a) No problem. The syntax admits to the empty statement (syntactically between the "spurious" semicolons) as a valid form of statement.

       IF (X) THEN        { Can I have a single variable as a "condition"? }

(a) Syntactically yes. Syntax only requires a Condition which syntactically is a very general Expression. But the Report (Section 10.4) requires that a Condition be a Boolean expression, so overall the answer is no, as X was defined of integer type.

         BEGIN END;        { Can I write BEGIN END as a valid construction? }

(a) Yes. Syntax allows a CompoundStatement (which is a valid alternative for Statement) to be completely empty, without even a semicolon inside! (Have you ever seen this construction being used in C++? Where?)

       WHILE (X = 7) DO;  { Is this a valid loop? }

(a) Yes. Although the syntax requires a statement after the DO, this statement can be any statement, including the empty statement.

Of course this has to be one of the most dangerous flaws in a language like C++. 99% of programmers reading code like

           for (i = 0; i <= 10; i++);
             { doSomething();
               doSomethingElse();
             }

will not "see" the stray semicolon, and might spend hours wondering why the obvious loop body was only done once, not 10 times. The same abomination exists in Pascal, but not in Modula-2 where they finally got it right, by demanding that every IF, FOR and WHILE statement had to have an explicit closing "END".

       X := X * -5;       { Can I do this? }

(a) Yes, the syntax allows it, as a careful study of the grammar in Section 9 will reveal. It would not be allowed in Clang, Pascal, or Modula-2 which have a different expression grammar.

       BEGIN END;         { Can I do this as a stand alone statement? }

(a) Yes. Syntax allows a CompoundStatement as a valid alternative for Statement; it may also be completely empty.

       X := L1[1];        { Presumably this is okay? }

(a and b) Yes. Syntactically it is correct in a context-free way; semantically L1 has to be followed by an index expression.

       L1 := X;           { Can I do this? Does it mean anything? }

(a and b) No. Syntactically it is correct in a context-free way; semantically L1 has to be followed by an index expression.

       X := TRUE + 5;     { Can I do this? Does it mean anything? }

(b) Syntactically yes, but semantically no. Careful reading of section 9.2.1 shows that the addition operator requires operands of integer type.

       X := 5 AND NOT 6;  { Can I do this? Does it mean anything? }

(b) Syntactically yes, but semantically no. Careful reading of section 9.2.3 shows that the Boolean operators require operands of Boolean type.

       X := X - - + Y;    { Can I do this? Does it mean anything? }

(a) Yes, the syntax allows it, as a careful study of the grammar in Section 9 will reveal. It would not be allowed in Clang, Pascal, or Modula-2 which have a different expression grammar.

     END.


Task 3 Better ways of declaring arrays

Syntactically this is trivial;

  OneVar     = Identifier [ UpperBound ] .
  UpperBound = "[" Expression "]" .

Semantically we should have to check that the Expression resolved to a constant predictable value at compile time.


Task 4 Statement sequences

  CompoundStatement = "BEGIN" StatementSequence "END" .
  StatementSequence = Statement { ";" Statement } .

  Statement
  =  [   Assignment     | IfStatement    | WhileStatement
       | CaseStatement  | ReadStatement  | WriteStatement ] .

  WhileStatement    = "WHILE" Condition "DO" StatementSequence "END" .


Task 5 Better control over your output

  WriteStatement = "WRITE" [ "(" WriteElement { "," WriteElement } ")" ] | "NEWLINE" .
  WriteElement   = String | Expression [ ":" Expression ] .

Semantically we should have to check that the second Expression was of integer type


Task 6 More power to the arms of the IF statement

  IfStatement       = "IF" Condition "THEN" StatementSequence
                      [ "ELSE" StatementSequence ] "END" .


Task 7 Why does nobody use CS301-1? (More assignment operators)

  Assignment = Variable ( AssignOp Expression | "++" | "--" ) SYNC .
  AssignOp = ":=" | "+=" | "-=" | "*=" | "/=" | "%=" | "=" (. SemError(93); .) .


Task 8 Case Statements

  CaseStatement     = "CASE" Expression "OF"
                         OneCase { "|" OneCase }
                         [ "DEFAULT" ":" StatementSequence ]
                      "END" .

  OneCase           = [ Range { "," Range } ":" StatementSequence ] .

  Range             = Expression [ ".." Expression ] .

Semantically we would require a check that the defining Expressions in the Range productions resolved to constant known values at compile time.


Task 10 A pretty-printer for the new, improved CS301-2

This is fairly straightforward, and a full solution can be found on the web pages.

Several people were clearly perturbed by the manner in which the pretty-printer as supplied handled "empty" statements as exemplified by (poor) code like

       BEGIN
         A := B; ; ; ;
         C := D;
       END

as the effect was to pretty-print this to give

       BEGIN
         A := B;
         ;
         ;
         C := D;

       END

For some time I have thought that there was not much that one could do about this. But then Terry's Second Law struck me again ("There is always a better way - find it"). Indeed, one can improve things. The original pretty-printer included code like

  CompoundStatement
  =  "BEGIN"                     (. Append("BEGIN"); IndentNewLine(); .)
        Statement
        { WEAK ";"               (. Append(";"); NewLine(); .)
          Statement }
     "END"                       (. ExdentNewLine(); Append("END"); .) .

  Statement
  =  SYNC [   Assignment        | ReturnStatement
            | IfStatement       | WhileStatement   | CaseStatement
            | ReadStatement     | WriteStatement ] .

If this is replaced with code like

  CompoundStatement
  =  "BEGIN"                     (. Append("BEGIN"); Indent(); .)
        Statement
        { WEAK ";"               (. Append(";"); .)
          Statement }
     "END"                       (. ExdentNewLine(); Append("END"); .) .

  Statement
  =  SYNC (                      (. NewLine(); .)
          (   Assignment        | ReturnStatement
            | IfStatement       | WhileStatement   | CaseStatement
            | ReadStatement     | WriteStatement
          )
            |                    /* empty statement */
          ) .

the effect is that only the output of non-empty statements is preceded by a call to NewLine(). (Corresponding alterations have to be made in other places in the actions for WhileStatement, OneCase and IfStatement).

The solution kit includes two attributed grammars - one closely based on what you were given to work from, and the other incorporating this later refinement.

Getting the system to include, rather than ignore comments, is a little tricky. Here is one possibility: One declares the comment as a pragma - that is, as a token that can occur anywhere but is almost ignorable. This means removing the COMMENTS clause in the Cocol specification, and replacing it with a PRAGMAS clause.

When the comment token is recognized, its text is extracted and passed to the pretty-printer routines - note the use of LookAheadString, rather than LexString:

  char CommentText[5000];

  IGNORE CASE
  IGNORE CHR(9) .. CHR(13)

  CHARACTERS
    incomment  = ANY - "}" - CHR(0) .

  PRAGMAS /* this is not totally satisfactory for comments */
    comment    = "{" { incomment } "}" .
                                   (. LookAheadString(CommentText, sizeof(CommentText) - 1);
                                      AppendComment(CommentText); .)

The changes needed to the pretty-printer routines are as follows:

  #include "misc.h"

  static int Indentation = 0;
  static bool HaveComment = 0;
  static char Comment[10000] = "";

  void Append (char String[])
  {
    printf("%s", String);
    if (HaveComment) {
      NewLine(); printf("%s", Comment);
    }
    HaveComment = FALSE;
    Comment[0] = '\0';
  }

  void AppendComment (char str[])
  {
    strcat(Comment, str); HaveComment = TRUE;
  }

If you try this out you will find it is not totally satisfactory. Comments tend to be reattached in slightly odd places, and to have peculiar internal spacing. However, a more sophisticated system could be constructed if the pretty-printing routines were more "intelligent" still!



  COMPILER CS301  $XCN /* Generate main module, C++ */
  /* CS301 level 2 grammar - no procedures, functions, parameters
     Pretty Printer Grammar - no code generation or constraint analysis
     Version where empty statements do not produce extraneous blank lines
     P.D. Terry, Rhodes University, 2001 */

  #include "prettypr.hpp"
  char CommentText[5000];

  IGNORE CASE
  IGNORE CHR(9) .. CHR(13)

  CHARACTERS
    cr         = CHR(13) .
    lf         = CHR(10) .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    digit      = "0123456789" .
    instring   = ANY - "'" - cr - lf - CHR(0) .
    incomment  = ANY - "}" - CHR(0) .

  TOKENS
    identifier = letter { letter | digit } .
    number     = digit { digit } .
    string     = "'" (instring | "''") { instring | "''" } "'" .

  PRAGMAS /* this is not totally satisfactory for comments */
    comment    = "{" { incomment } "}" .
                                   (. LookAheadString(CommentText, sizeof(CommentText) - 1);
                                      AppendComment(CommentText); .)

  PRODUCTIONS
    CS301
    =  "PROGRAM"                   (. Append("PROGRAM "); .)
       Identifier
       WEAK ";"                    (. Append(";"); IndentNewLine(); .)
       Block
       "."                         (. Append("."); NewLine(); .) .

    Block
    =  SYNC { ( ConstDeclarations | VarDeclarations ) SYNC }
       CompoundStatement .

    ConstDeclarations
    =  "CONST"                     (. Append("CONST"); IndentNewLine(); .)
       OneConst
       {                           (. NewLine(); .)
         OneConst }                (. ExdentNewLine(); .) .

    OneConst
    =  Identifier WEAK "="         (. Append(" = "); .)
       Expression ";"              (. Append(";"); .) .

    VarDeclarations
    =  (   "INT"                   (. Append("INT"); .)
         | "BOOL"                  (. Append("BOOL"); .)
       )                           (. IndentNewLine(); .)
       OneVar
       { WEAK ","                  (. Append(", "); .)
         OneVar
       }
       ";"                         (. Append(";"); ExdentNewLine(); .) .

    OneVar
    =  Identifier [ UpperBound ] .

    UpperBound
    =  "["                         (. Append("["); .)
       Expression
       "]"                         (. Append("]"); .) .

    CompoundStatement
    =  "BEGIN"                     (. Append("BEGIN"); Indent(); .)
          Statement
          { WEAK ";"               (. Append(";"); .)
            Statement }
       "END"                       (. ExdentNewLine(); Append("END"); .) .

    StatementSequence
    = Statement { WEAK ";"         (. Append(";"); .)
      Statement } .

    Statement
    =  SYNC (                      (. NewLine(); .)
            (   Assignment        | ReturnStatement
              | IfStatement       | WhileStatement   | CaseStatement
              | ReadStatement     | WriteStatement
            )
              |                    /* empty statement */
            ) .

    Assignment
    =  Variable
       ( AssignOp Expression
         | "++"                    (. Append("++"); .)
         | "--"                    (. Append("--"); .)
       ) SYNC .

    AssignOp
    =   ":="                       (. Append(" := "); .)
      | "+="                       (. Append(" += "); .)
      | "-="                       (. Append(" -= "); .)
      | "*="                       (. Append(" *= "); .)
      | "/="                       (. Append(" /= "); .)
      | "%="                       (. Append(" %= "); .)
      | "="                        (. SemError(93); Append(" := "); .)
    .

    ReturnStatement
    =  "RETURN"                    (. Append("RETURN"); .) .

    Variable
    =  Designator .

    Designator
    =  Identifier
       [  "["                      (. Append("["); .)
          Expression
          "]"                      (. Append("]"); .)
       ] .

    IfStatement
    =  "IF"                        (. Append("IF "); .)
         Condition
       "THEN"                      (. IndentNewLine(); Append("THEN"); Indent(); .)
         StatementSequence
       [ "ELSE"                    (. ExdentNewLine(); Append("ELSE "); Indent(); .)
         StatementSequence ]
       "END"                       (. Exdent(); ExdentNewLine(); Append("END"); .)
       .

    WhileStatement
    =  "WHILE"                     (. Append("WHILE "); .)
          Condition
       "DO"                        (. Append(" DO "); Indent(); .)
          StatementSequence
       "END"                       (. ExdentNewLine(); Append("END"); .)
       .

    CaseStatement
    =
      "CASE"                       (. Append("CASE "); .)
      Expression
      "OF"                         (. Append(" OF "); IndentNewLine(); Append("  "); .)
      OneCase { "|"                (. NewLine(); Append("| "); .)
        OneCase
      }
      [ "DEFAULT"                  (. NewLine(); Append("DEFAULT"); .)
        ":"                        (. Append(" :"); Indent(); Indent(); .)
        StatementSequence          (. Exdent(); Exdent(); .)
      ]
      "END"                        (. ExdentNewLine(); Append("END"); .) .

    OneCase
    = [ Range { ","                (. Append(", "); .)
        Range }
        ":"                        (. Append(" :"); Indent(); Indent(); .)
        StatementSequence          (. Exdent(); Exdent(); .)
      ] .

    Range
    =  Expression
       [ ".."                      (. Append(" .. "); .)
       Expression ] .

    Condition
    =  Expression .

    ReadStatement
    =  "READ" "("                  (. Append("READ("); .)
        Variable
        { ","                      (. Append(", "); .)
          Variable }
       ")"                         (. Append(")"); .) .

    WriteStatement
    =   "WRITE"                    (. Append("WRITE"); .)
        [ "("                      (. Append("("); .)
          WriteElement
          { WEAK ","               (. Append(", "); .)
          WriteElement }
          ")"                      (. Append(")"); .)
        ]
      | "NEWLINE"                  (. Append("NEWLINE"); .)
     .

    WriteElement
    =   String
      | Expression
        [ ":"                      (. Append(" : "); .)
          Expression ] .


    Expression
    =  AndExp { "OR"               (. Append(" OR "); .)
       AndExp } .

    AndExp
    =  RelExp { "AND"              (. Append(" AND "); .)
       RelExp } .

    RelExp
    =  AddExp [ RelOp AddExp ] .

    AddExp
    =  MultExp { AddOp MultExp } .

    MultExp
    =  UnaryExp { MulOp UnaryExp } .

    UnaryExp
    =  Factor | UnaryOp UnaryExp .

    Factor
    =   Designator
      | Number
      | "TRUE"                     (. Append("TRUE"); .)
      | "FALSE"                    (. Append("FALSE"); .)
      | "("                        (. Append("("); .)
        Expression ")"             (. Append(")"); .) .

    UnaryOp
    =  "+"                         (. Append(" +"); .)
     | "-"                         (. Append(" -"); .)
     | "NOT"                       (. Append(" NOT "); .) .

    AddOp
    =  "+"                         (. Append(" + "); .)
     | "-"                         (. Append(" - "); .) .

    MulOp
    =  "*"                         (. Append(" * "); .)
     | "/"                         (. Append(" / "); .) .

    RelOp
    =  "="                         (. Append(" = "); .)
     | "<>"                        (. Append(" <> "); .)
     | "<"                         (. Append(" < "); .)
     | "<="                        (. Append(" <= "); .)
     | ">"                         (. Append(" > "); .)
     | ">="                        (. Append(" >= "); .) .

    Identifier
    =                              (. char IdentName[20]; .)
       identifier                  (. LexString(IdentName, sizeof(IdentName) - 1);
                                      Append(IdentName); .) .

    Number
    =                              (. char Num[20]; .)
       number                      (. LexString(Num, sizeof(Num) - 1);
                                      Append(Num); .) .

    String
    =                              (. char str[200]; .)
       string                      (. LexString(str, sizeof(str) - 1);
                                      Append(str); .) .

  END CS301.


Home  © P.D. Terry