Computer Science 301 - 2000


Tutorial for week 26 - attribute grammars and multisegment languages

1. Brinch Hansen introduced an extended form of the WHILE loop into his language Edison:

       WhileStatement  =  "WHILE" Condition "DO" Statement
                          { "ELSE" Condition "DO" Statement }
                          "END" .

The Conditions are evaluated one at a time in the order written until one is found to be true, when the corresponding Statement is executed, after which the process of evaluating conditions is repeated. If no Condition is true, the loop terminates.

How could the code generation for this be implemented? Can you think of any algorithms where this statement would be useful? Any situations where it might get you into trouble?

  WhileStatement
  =                             (. CGEN_labels startloop, testlabel, dummylabel; .)
     "WHILE"                    (. CGen->storelabel(startloop); .)
     Condition "DO"             (. CGen->jumponfalse(testlabel, CGen->undefined); .)
     Statement                  (. CGen->jump(dummylabel, startloop); .)
     {                          (. CGen->backpatch(testlabel); .)
     "ELSE Condition "DO"       (. CGen->jumponfalse(testlabel, CGen->undefined); .)
     Statement                  (. CGen->jump(dummylabel, startloop); .)
     }                          (. CGen->backpatch(testlabel); .)
     "END" .

The choice of ELSE as the internal keyword is a little awkward - consider what would happen if the internal statement was an IF statement.

2. Consider an expression grammar like that below. How would you attribute these productions so that a parser, after recognising an Expression, would be able to determine whether that Expression was a Boolean expression (that is, would be acceptable as a Condition in IF and WHILE and REPEAT statements), or was of some other type? Assume that the Designator production can be attributed so that it returns the result of searching the symbol table to determine the type of the associated identifier.

      Expression = Simple [ Relop Simple ] .
      Simple     = Term { Addop Term } .
      Term       = Factor { MulOp Factor } .
      Factor     = Designator | number | "true" | "false" | "(" Expression ")" .
      Relop      = "=" | "!=" | "<" | "<=" | ">" | ">=" .
      AddOp      = "+" | "-" | "||" .
      MulOp      = "*" | "/" | "&&" .

We need to parameterize all the non-terminals, and to introduce some local variables and Boolean logic. Once you see what to do it is easy enough!

  Expression<bool &isBool>
  =                               (. bool alsoBool; .)
  Simple<isBool>
  [ RelOp                         /* RelOp should normally force it to be Boolean */
    Simple<alsoBool>              (. isbool = !(isbool || alsoBool);
                                  /* but we cannot have a RelOp between two Booleans */
  ] .

  Simple<bool &isBool>
  =                               (. bool alsoBool, boolOp; .)
    Term<isBool>
    { AddOp<boolOp>               /* we can only have a boolOp between Boolean terms */
      Term<alsoBool>              (. isBool = isBool && boolOp && alsoBool; .)
    } .

  Term<bool &isBool>
  =                               (. bool alsoBool, boolOp; .)
    Factor<isBool>
    { MulOp<boolOp>               /* we can only have a boolOp between Boolean Factors */
      Factor<alsoBool>            (. isBool = isBool && boolOp && alsoBool; .)
    } .

  Factor<bool &isBool>
  =   Designator<isBool>
    | number                      (. isBool = false; .)
    | "true"                      (. isBool = true; .)
    | "false"                     (. isBool = true; .)
    | "(" Expression<isBool> ")" .

  AddOp<bool &boolOp>
  =   "+"                         (. boolOp = false; .)
    | "-"                         (. boolOp = false; .)
    | "||"                        (. boolOp = true; .) .

  MulOp<bool &boolOp>
  =   "*"                         (. boolOp = false; .)
    | "/"                         (. boolOp = false; .)
    | "&&"                        (. boolOp = true; .) .

3. How would you change the Topsy grammar if you wanted to have a C-like system in which multiple functions were permitted, but where these could not be "nested" as in Clang. Blocks may still have local declarations of variables and constants, and the "global" concept must also be possible. Limit yourself to the situation where only void, parameterless functions are allowed. (You do not have to write out all the productions again, only those that would need to change.)

One solution is as follows:

  PRODUCTIONS
    Topsy = { Declaration | FunctionDefinition } .
    FunctionDefinition = "void" Identifier "(" "void" ")" ( Block | ";" ) .
    Block = "{" { Declaration | Statement } "}" .
    Declaration = ConstDeclaration | VarDeclarations .
    Statement
    =    Block           | AssignmentOrCall
       | IfStatement     | WhileStatement
       | ForStatement    | RepeatStatement
       | ReadStatement   | WriteStatement
       | ReturnStatement | EmptyStatement .
    AssignmentOrCall = Designator ( "=" Expression | "++" | "--" | "()" ) ";" .

We would need some kind of semantic restriction that at least one function would be named "main", but it would be easier to impose this (linker-related) constraint in that way than to try to do it with clever syntax, as the latter tends to introduce LL(1) errors or undesirable other constraints.


  COMPILER Topsy
  (* Topsy++ level 1 grammar - no procedures, functions, parameters
     P.D. Terry, Rhodes University, 2000 *)

  CHARACTERS
    cr         = CHR(13) .
    lf         = CHR(10) .
    back       = CHR(92) .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    digit      = "0123456789" .
    graphic    = ANY - '"' - "'" - back - cr - lf - CHR(0) .

  IGNORE CHR(9) .. CHR(13)
  COMMENTS FROM "//" TO lf
  COMMENTS FROM "/*" TO "*/"

  TOKENS
    identifier = letter { letter | digit | "_" } .
    number     = digit { digit } .
    string     = '"' { graphic | back graphic | "\'" | back back | '\"' } '"' .
    character  = "'" ( graphic | back graphic | "\'" | back back | '\"' ) "'" .

  PRODUCTIONS
    Topsy = "void" "main" "(" "void" ")" Block .
    Block = "{" { Declaration | Statement } "}" .
    Declaration = ConstDeclaration | VarDeclarations .
    Statement
    =    Block           | Assignment
       | IfStatement     | WhileStatement
       | ForStatement    | RepeatStatement
       | ReadStatement   | WriteStatement
       | ReturnStatement | EmptyStatement .
    ConstDeclaration
    =  "const" identifier "=" ( number | character | "true" | "false" ) ";" .
    VarDeclarations = ( "int" | "bool" | "char" ) OneVar { "," OneVar } ";" .
    OneVar = identifier [ ArraySize | "=" Expression ] .
    ArraySize = "[" number "]" .
    Assignment = Variable ( "=" Expression | "++" | "--" ) ";" .
    Variable = Designator .
    Designator = identifier [ "[" Expression "]" ] .
    IfStatement = "if" "(" Condition ")" Statement [ "else" Statement ] .
    WhileStatement = "while" "(" Condition ")" Statement .
    RepeatStatement = "do" { Statement } "until" "(" Condition ")" ";" .
    Condition = Expression .
    ForStatement
    =  "for" "(" identifier "=" Expression ";" Condition [ ";" Epilogue ] ")"
       Statement .
    Epilogue = identifier ( "=" Expression | "++" | "--" ) .
    ReadStatement = "cin" ">>" Variable { ">>" Variable } ";" .
    WriteStatement = "cout" "<<" WriteElement { "<<" WriteElement } ";" .
    WriteElement = string | Expression .
    ReturnStatement = "return" ";" .
    EmptyStatement = ";" .
    Expression = SimpleExpr [ RelOp SimpleExpr ] .
    SimpleExpr = ( "+" Term | "-" Term | Term ) { AddOp Term } .
    Term = Factor { MulOp Factor } .
    Factor =   Designator | number | character | "true" | "false"
             | "!" Factor | "(" Expression ")" .
    AddOp = "+" | "-" | "||" .
    MulOp = "*" | "/" | "%" | "&&" .
    RelOp = "==" | "!=" | "<" | "<=" | ">" | ">=" .
  END Topsy.