RHODES UNIVERSITY


Computer Science 301 - 2011 - Programming Language Translation

Well, here you are. Here is the free information you have all been waiting for, with some extra bits of advice:


Preliminary to the examination

As you should know by now, having discussed this in lectures, a pretty-printer is a form of translator that takes source code and translates this into object code that is expressed in the same language as the source. This probably does not sound very useful! However, the main aim is to format the "object code" neatly and consistently, according to some simple conventions, making it far easier for humans to understand.

For example, a system that will read a set of EBNF productions, rather badly laid out as

    Goal={One}.
    One
    = Two "plus" Four ".".
                 Four =Five {"is" Five|(Six Seven)}.
    Five = Six [Six       ].
    Six=  Two| Three    |"(*" Four "*)"|'(' Four ')'.

and produce the much neater output

    Goal
      = { One } .

    One
      = Two "plus" Four "." .

    Four
      = Five { "is" Five | ( Six Seven ) } .

    Five
      = Six  [ Six ] .

    Six
      = Two | Three | "(*" Four "*)" | '(' Four ')' .

is easily developed by using the following Cocol grammar

  import library.*;

  COMPILER EBNF $CN
  /* Pretty-print a set of EBNF productions
     P.D. Terry, Rhodes University, 2011 */

    static int indentation = 0;

    public static void append(String s) {
    // Append string s to results file
      IO.write(s);
    }

    public static void newLine() {
    // Write line mark to results file but leave indentation as before
      int i = 1;
      IO.writeLine();
      while (i <= indentation) { IO.write(' '); i++; }
    }

    public static void indentNewLine() {
    // Write line mark to results file and prepare to indent further lines
    // by two spaces more than before
      indentation += 2;
      newLine();
    }

    public static void exdentNewLine() {
    // Write line mark to results file and prepare to indent further lines
    // by two spaces less
      indentation -= 2;
      newLine();
    }

    public static void indent() {
    // Increment indentation level by 2
      indentation += 2;
    }

    public static void exdent() {
    // Decrement indentation level by 2
      indentation -= 2;
    }

  CHARACTERS
    letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    lowline  = "_" .
    control  = CHR(0) .. CHR(31) .
    digit    = "0123456789" .
    noquote1 = ANY - "'" - control .
    noquote2 = ANY - '"' - control .

  TOKENS
    nonterminal = letter { letter | lowline | digit } .
    terminal    = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' .

  COMMENTS FROM "(*" TO "*)"  NESTED

  IGNORE control

  PRODUCTIONS
    EBNF
    = { Production                        (. newLine(); .)
      } EOF .

    Production                            (. String name; .)
    = NT<out name>                        (. append(name); .)
      "="                                 (. indentNewLine(); append("= "); .)
      Expression  "."                     (. append(" ."); exdentNewLine(); .)
    .

    Expression
    = Term { "|"                          (. append(" | "); .)
      Term } .

    Term
    = Factor {                            (. append(" "); .)
      Factor } .

    Factor                                (. String name; .)
    =   NT<out name>                      (. append(name); .)
      | terminal                          (. append(token.val); .)
      | "["                               (. append("[ "); .)
        Expression "]"                    (. append(" ]"); .)
      | "("                               (. append("( "); .)
        Expression ")"                    (. append(" )"); .)
      | "{"                               (. append("{ "); .)
        Expression "}"                    (. append(" }"); .)
    .

    NT<out String name>
    =  nonterminal                        (. name = token.val; .)
    .

  END EBNF.

The "code generator" is embodied in the methods append, indent, exdent and so on, which for illustration have simply been incorporated at the start of the grammar. A more sophisticated system might direct the output to a named file in the manner familiar to you from earlier practicals, and have a CodeGen class.

As the first step in preparing for the examination you are invited to experiment with the above system, code for which is included in the examination kit, and then to go on to develop a pretty-printer for programs expressed in the version of Parva described in the following grammar. A version of this grammar - spread out to make the addition of actions easy - can also be found in the examination kit, along with the necessary frame files. The kit also includes a complete executable Parva compiler that matches this grammar.

    COMPILER Parva $CN
    /* Parva level 1.5 grammar for examination - Grammar only
       Java operator precedences
       Supplied Parva Compiler matches this grammar (and has a few extensions)
       P.D. Terry, Rhodes University, 2011 */

     CHARACTERS
      lf           = CHR(10) .
      backslash    = CHR(92) .
      control      = CHR(0) .. CHR(31) .
      letter       = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      digit        = "0123456789" .
      nonZeroDigit = "123456789" .
      stringCh     = ANY - '"' - control - backslash .
      charCh       = ANY - "'" - control - backslash .
      printable    = ANY - control .

    TOKENS
      identifier   = letter { letter | digit | "_" { "_" } ( letter | digit ) } .
      number       = "0" | nonZeroDigit { digit } .
      stringLit    = '"' { stringCh  | backslash printable } '"' .
      charLit      = "'" ( charCh    | backslash printable ) "'" .

    PRAGMAS
      CodeOn       = "$C+" .
      CodeOff      = "$C-" .
      DebugOn      = "$D+" .
      DebugOff     = "$D-" .
      ShortCircOn  = "$S+" .
      ShortCircOff = "$S-" .
      WarnOn       = "$W+" .
      WarnOff      = "$W-" .

    COMMENTS FROM "//" TO lf
    COMMENTS FROM "/*" TO "*/"

    IGNORE CHR(9) .. CHR(13)

    PRODUCTIONS
      Parva               = "void" Ident "(" ")" Block .
      Block               = "{" { Statement } "}" .
      Statement           = (   Block             | ConstDeclarations  | VarDeclarations | AssignmentStatement
                              | IfStatement       | WhileStatement     | ForStatement    | DoWhileStatement
                              | BreakStatement    | HaltStatement      | ReadStatement   | WriteStatement
                              | ReadLineStatement | WriteLineStatement | ";"
                           ) .
      ConstDeclarations   = "const" OneConst { "," OneConst } ";" .
      OneConst            = Ident "=" Constant .
      Constant            = IntConst | CharConst | "true" | "false" | "null" .
      VarDeclarations     = Type OneVar { "," OneVar } ";" .
      Type                = BasicType [ "[]" ] .
      BasicType           = "int" | "bool" | "char" .
      OneVar              = Ident [ "=" Expression ] .
      AssignmentStatement = Assignment ";" .
      Assignment          =   Designator ( AssignOp Expression | "++" | "--" )
                            | "++" Designator | "--" Designator .
      Designator          = Ident [ "[" Expression "]" ] .
      IfStatement         = "if" "(" Condition ")" Statement
                             { "elsif" "(" Condition ")" Statement }
                             [ "else" Statement ] .
      WhileStatement      = "while" "(" Condition ")" Statement .
      DoWhileStatement    = "do" Statement "while" "(" Condition ")" ";" .
      ForStatement        = "for" ForControl Statement .
      ForControl          =   "(" [ [ BasicType ] Ident "=" Expression ] ";" [ Condition ] ";" [ Assignment ] ")"
                            | Ident "=" Expression ( "to" | "downto" ) Expression .
      BreakStatement      = "break" ";" .
      HaltStatement       = "halt" ";" .
      ReadStatement       = "read" "(" ReadElement { "," ReadElement } ")" ";" .
      ReadLineStatement   = "readLine" "(" [ ReadElement { "," ReadElement } ] ")" ";" .
      ReadElement         = StringConst | Designator .
      WriteStatement      = "write" "(" WriteElement { "," WriteElement } ")" ";" .
      WriteLineStatement  = "writeLine" "(" [ WriteElement { "," WriteElement } ] ")" ";" .
      WriteElement        = StringConst | Expression .
      Condition           = Expression .
      Expression          = AndExp { "||" AndExp } .
      AndExp              = EqlExp { "&&" EqlExp } .
      EqlExp              = RelExp { EqlOp RelExp } .
      RelExp              = AddExp [ RelOp AddExp ] .
      AddExp              = MulExp { AddOp MulExp } .
      MulExp              = Factor { MulOp Factor } .
      Factor              = Primary | "+" Factor | "-" Factor | "!" Factor .
      Primary             =   Designator | Constant | "new" BasicType "[" Expression "]"
                            | "("
                              (   "char" ")" Factor
                                 | "int" ")"  Factor
                                 | Expression ")"
                              ) .
      MulOp               = "*"  | "/" | "%" .
      AddOp               = "+"  | "-" .
      EqlOp               = "==" | "!=" .
      RelOp               = "<"  | "<=" | ">" | ">=" .
      AssignOp            = "=" .
      Ident               = identifier .
      StringConst         = stringLit .
      CharConst           = charLit .
      IntConst            = number .
    END Parva.

You should find that this is quite easy to do, leading to a system that might be able to convert a "messy" little Parva program like

     void main(){int b,c,d;
       b=        10;
       while (b>0) { write("b = ", b); b--;
       if (b == 4) writeLine(" hit four!")
       ;       else writeLine(" on we go");}
       }

into the rather more readable code

     void main() {
       int b, c, d;
       b = 10;
       while (b > 0) {
         write("b = ", b);
         b--;
         if (b == 4)
           writeLine(" hit four!");
         else
           writeLine(" on we go");
       }
     }

Hints:

(a) Do not change the grammar to include any of the other extensions you made in Prac 25.

(b) One test for correctness is to take the output from the pretty-printer and pretty-print that again (ie take the output file from one run and use it as the input file for another run). You should, of course, just get the same result again.

(c) Another test is to compile the "raw" form of one of the demonstration programs, and also compile the "pretty" form of the same program. The .cod files produced by each compilation should be the same.


Home  © P.D. Terry