Computer Science 3 - 2000 - Test on Prac 24

1. One more time - what is your name?

2. In the prac just completed you used the following set of productions to build a Boolean calculator

     PRODUCTIONS
       Bool       = { Print | Assignment } "QUIT" .
       Print      = "PRINT" OneExpr { WEAK "," OneExpr } SYNC ";" .
       OneExpr    = Expression .
       Assignment = identifier "=" Expression ";" .
       Expression = Term { Or Term } .
       Term       = Factor { [ And ] Factor } .
       Factor     = Not Factor | Primary { "'" } .
       Primary    = identifier | True | False | "(" Expression ")" .
       And        = "AND" | "&" | "." .
       Or         = "OR" | "|" | "+" .
       Not        = "NOT" | "!" .
       True       = "TRUE" | "1" .
       False      = "FALSE" | "0" .

Someone asked a demonstrator whether the grammar could have been simplified to omit the semicolons:

Print = "PRINT" OneExpr { WEAK "," OneExpr } .
Assignment = identifier "=" Expression .

What should the demonstrator have replied by way of a good explanation?

Many people realised that this was not a good idea, but did not give convincing explanations. The semicolons are fundamentally necessary for maintaining an LL(1) grammar only because the AND operator is "optional". If the production

Term = Factor { [ And ] Factor } .

were to be changed to

Term = Factor { And Factor } .

then the system would still be LL(1) compliant (try it if you don't believe me). The semicolons are also useful as synchronisation points, but this is secondary.

3. Below you will find the familiar grammar for Clang. Suppose we wanted to allow Clang to have "goto" statements like those in Basic:

           BEGIN
             Total := 0;
             10: Read(A);
             IF A = 0 THEN GOTO 20;
             Total := Total + A;
             GOTO 10;
             20: Write(Total);
           END.

What syntactic changes would you suggest making to the grammar to allow this? (simply add to the grammar below).

The easiest change is probably:

      Statement     = [ Label ]
                      SYNC [   CompoundStatement | Assignment | GoToStatement
                             | IfStatement       | WhileStatement
                             | ReadStatement     | WriteStatement ] .
      Label         = number ":" .
      GoToStatement = "GOTO" number .

Discuss briefly the sorts of static semantic checks that a compiler would have to make if you wanted to allow such statements. You need not show in detail how the checks would have to be made.

Some people missed the point here - the questions was aimed at static semantic checking, not code generation. A compiler would need to check that each label mentioned in a GoToStatement was actually defined, and that the labels at the start of labelled statements were unique. We could not allow a program like

      BEGIN
        GOTO 20;
        10: READ(A);
        10: WRITE(B);
      END.

However, those are only the easy checks to make. There is a far more awkward scenario, which only one student seemed to have appreciated, and even then not completely. We should not allow code like

       BEGIN
         GOTO 10;
         WHILE A < B DO
           10: WRITE(X);
       END

because to jump into the body of a WHILE loop would be a very silly and dangerous thing to do. Similarly, code like

       BEGIN
         GOTO 10;
         IF A < B THEN
           10: WRITE(X);
       END

is dangerous. How can we prevent this? The answer offered by the student who seemed to have grasped this point was to have syntax like

  CompoundStatement = "BEGIN" [ Label ] Statement
                         { WEAK ";" [ Label ] Statement }
                      "END" .
  Statement         = SYNC [   CompoundStatement | Assignment | GoToStatement
                             | IfStatement       | WhileStatement
                             | ReadStatement     | WriteStatement ] .
  Assignment        = Variable ":=" Expression SYNC .
  IfStatement       = "IF" Condition "THEN" Statement .
  WhileStatement    = "WHILE" Condition "DO" Statement .
  Label             = number ":" .
  GoToStatement     = "GOTO" number .

which at first sight looks as though it does the trick rather nicely. But, alas, no. Consider the trivial way to break the rule:

       BEGIN
         GOTO 10;
         WHILE A < B DO
           BEGIN 10: WRITE(X) END;
       END

If you really want to prevent people from breaking into the middle of loop bodies or into the middle of THEN clauses you will need a far more complex strategy. You would have to maintain lists of the labels used inside compound statements and maintain scope information about these lists rather like the scope information we need for variables in block structured languages such as we have just been discussing in lectures.

The GoTo statement is not only abhorred by the anti-spaghetti-code establishment. It really is really rather awkward to check (let alone) generate code for it!

4. As many of you know, my brother Peter occasionally brightens up your TV screens with good advice as to how to keep your toilets clean using Domestos and your bathroom walls beautiful with tiles from CTM. (I suspect he may teach you more useful skills than I do?) What you may not know is that he worked for State Theatre in Pretoria, which recently folded, so he is dependent more than ever before on what the SABC pay him. He has asked me to write a system to allow him to analyse the offerings of the SABC to determine the number of times he appears on screen. I have got as far as the following grammar describing the situation:

     COMPILER SABCTV $XCN

     IGNORE CHR(1) .. CHR(31)

     PRODUCTIONS
       SABCTV      = { Programme } "Closedown" .
       Programme   = "Announcement" { QualityShow }
                     "Reminder" /* you know it's the right thing to do */ .
       QualityShow = "Programme" { Advert } .
       Advert      = "CTMTiles" | "Domestos" | "Reminder" | "VodaCom" | "Nando's" | "LGDigital" .
     END SABCTV.

Have I produced an LL(1) grammar? If not, where not?

The production for QualityShow contains the nullable section { Advert }. The first set of this is

{ CTMTilesSym, DomestosSym, ReminderSym, VodaComSym, NandosSym, LGDigitalSym }

but { ReminderSym } is the follow set of QualityShow. So the grammar is not LL(1).

How do I add the actions to be able to count his appearances? (Assume for this part of the test that any LL(1) violations are not important.)

Two solutions are shown. Note that Peter does not appear in all possible adverts!

     int appears = 0;  // globally declared

     SABCTV = { Programme }
     "Closedown"                           (. printf("Appeared %d times", appears); .)
     .

     Programme = "Announcement" { QualityShow } "Reminder" .

     QualityShow = "Programme" { Advert } .

     Advert
     =  ( ( "CTMTiles" |  "Domestos" )     (. appears++; .) )
        | "Reminder" | "VodaCom" | "Nando's" | "LGDigital" .

or

     SABCTV
     =                                   (. int appears = 0; .)
       { Programme<appears>
       } "Closedown"                     (. printf("Appeared %d times", appears); .)
       .

     Programme<int &n> = "Announcement" { QualityShow<n> } "Reminder" .

     QualityShow<int &n> = "Programme" { Advert<n> } .

     Advert<int &N>
     =  ( ( "CTMTiles" |  "Domestos" )   (. N++; .) )
        | "Reminder" | "VodaCom" | "Nando's" | "LGDigital" .


Computer Science 3 - 2000 - Test on Prac 24

1. One more time - what is your name?

2. Below you will find the familiar grammar for Clang. Someone asked a demonstrator whether one could change the productions describing constant and variable declarations to look more consistent. The change suggested was

     ConstDeclarations = "CONST" OneConst { ";" OneConst } ";" .
     OneConst          = identifier "=" number .
     VarDeclarations   = "VAR" OneVar { "," OneVar } ";" .
     OneVar            = identifier [ UpperBound ] .
     UpperBound        = "[" number "]" .

How should the demonstrator have responded, giving a reasonable explanation?

The grammar is now non-LL(1).

As proposed, ConstDeclaration includes the nullable section { ";" OneConst } where the FIRST set is the SemicolonSym which is also the FOLLOW set for this section. So don't do this. (Incidentally, answers received to this question told me more about demonstrator behaviour than I wanted to know!)

3. Suppose we wanted to allow Clang to have "loop - end" and "exit" statements like those in Modula-2:

           BEGIN
             Total := 0; I := 1;
             LOOP
               Read(A[I]);
               IF A[I] = 0 THEN EXIT;
               Total := Total + A[I];
               I := I + 1;
               IF I > Max THEN EXIT;
             END;
             Write(Total);
           END.

What syntactic changes would you suggest making to the grammar to allow this? (simply add to the grammar below).

The syntactic changes are trivially easy:

  Statement     = SYNC [   CompoundStatement | Assignment
                         | IfStatement       | WhileStatement
                         | LoopStatement     | ExitStatement
                         | ReadStatement     | WriteStatement ] .
  LoopStatement = "LOOP" Statement { WEAK ";" Statement } "END" .
  ExitStatement = "EXIT" .

However, most answers submitted were incorrect. The EXIT is not necessarily attached directly to an IF statement. Code like this might be required:

      IF A > 9 THEN
        BEGIN (* clean up *)
          C := D; Write("Some message'); EXIT
        END

Discuss briefly any non-syntactic checks that a compiler would have to make if you wanted to allow such statements. You need not show in detail how the checks would have to be made.

The problem is that the ExitStatement may only appear "inside" a LoopStatement statement sequence, and not at other arbitrary places in the code. So this sort of error has to be detected:

      BEGIN
        EXIT;   (* illegal *)
        LOOP
          A := B;
          EXIT  (* legal *)
        END;
        EXIT    (* illegal *)
      END

This turns out to be something that cannot easily be done with syntax alone, if at all. As an alternative, the compiler can keep track of whether a loop statement sequence is being parsed, and object to EXIT if it appears when this is not the case. Since loop statements can be nested this has to be done with some sort of counter, not a Boolean flag.

There were several students who offered the suggestion that it would be necessary to check that an EXIT always appeared at least once inside a loop. That looks like a good idea (especially for teaching beginners, who should be encouraged to avoid infinite loops), but most languages would need to allow constructs like

        LOOP
          AttemptPrac;
          IF Disaster THEN BEGIN CleanUp; HALT END;
          DoNextQuestion
        END

and even in our present language we can write infinite loops if we persist

        WHILE 1 = 1 DO Write('Ho')

4. As many of you know, my brother Peter occasionally brightens up your TV screens with good advice as to how to keep your toilets clean using Domestos and your bathroom walls beautiful with tiles from CTM. (I suspect he may teach you more useful skills than I do?) The SABC fear that he may be suffering from over- exposure and have asked me to write a system to allow them to check that he never appear in two advertisements in a row (that is, that the Domestos and CTM adverts must be separated by at least one other advert. I have got as far as the following grammar describing the situation:

      COMPILER SABCTV $XCN

      IGNORE CHR(1) .. CHR(31)

      PRODUCTIONS
        SABCTV      = { Programme } "Closedown" .
        Programme   = "Announcement" { QualityShow }
                      "Reminder" /* you know it's the right thing to do */ .
        QualityShow = "Programme" { Advert } .
        Advert      = "CTMTiles" | "Domestos" | "Reminder" | "VodaCom" | "Nando's" | "LGDigital" .
      END SABCTV.

Would Coco/R object to the fact that I have used the word Programme in two ways? Explain.

Not at all. Programme is the name of a non-terminal, and "Programme" is the spelling of a terminal token.

How do I add the actions to be able to check my brother's appearances? Assume for this part of the test that any LL(1) violations are not important and that the use of the word Programme in two ways is unimportant.

Several people missed the point that they had been asked to check the constraint that Peter does not appear twice in a row, and not to count how often he appeared in toto. It is not necessary to manipulate strings or to construct a symbol table either - the internal tokens used by Coco/R will do the trick.

Two possible solutions are suggested below. As a follow up question - does this grammar check his appearances within one burst of adverts only, or across a whole evening? If you wanted it the other way around, what would you have to change?

     bool lastWasPeter = false;   /* global */

     SABCTV      = { Programme } "Closedown" .
     Programme   = "Announcement" { QualityShow } "Reminder" .
     QualityShow = "Programme" { Advert } .
     Advert
     = (
         ( "CTMTiles" | "Domestos" )   (. if (lastWasPeter) SemError(200);
                                          lastWasPeter = true; .)
       )
       |
       (
         (   "Reminder" | "VodaCom"
           | "Nando's" | "LGDigital" ) (. lastWasPeter = false; .)
       ) .

or

     SABCTV
     =                                  (. bool last = false, okay = true .)
     { Programme<okay, last> }          (. if (!okay) printf("Peter overexposed"); .)
     "Closedown" .

     Programme<bool &okay, bool &last>
     = "Announcement" { QualityShow<okay, last> } "Reminder" .

     QualityShow<bool &okay, bool &last>
      = "Programme" { Advert<okay, last> } .

     Advert<bool &okay, bool &lastWasPeter>
     = (
         ( "CTMTiles" | "Domestos" )   (. if (lastWasPeter) okay = false;
                                          lastWasPeter = true; .)
       )
       |
       (
         (   "Reminder" | "VodaCom"
           | "Nando's" | "LGDigital" ) (. lastWasPeter = false; .)
       ) .


COMPILER Clang $XCN
/* Simple grammar for Clang level 1
   P.D. Terry, Rhodes University, 2000 */

IGNORE CASE
IGNORE CHR(9) .. CHR(13)

CHARACTERS
  cr         = CHR(13) .
  lf         = CHR(10) .
  letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
  digit      = "0123456789" .
  instring   = ANY - "'" - cr - lf .

COMMENTS FROM "(*" TO "*)"

TOKENS
  identifier = letter { letter | digit } .
  number     = digit { digit } .
  string     = "'" (instring | "''") { instring | "''" } "'" .

PRODUCTIONS
  Clang             = "PROGRAM" identifier WEAK ";" Block "." .
  Block             = { ConstDeclarations | VarDeclarations }
                      CompoundStatement .
  ConstDeclarations = "CONST" OneConst { OneConst } .
  OneConst          = identifier WEAK "=" number ";" .
  VarDeclarations   = "VAR" OneVar { WEAK "," OneVar } ";" .
  OneVar            = identifier [ UpperBound ] .
  UpperBound        = "[" number "]" .
  CompoundStatement = "BEGIN" Statement { WEAK ";" Statement } "END" .
  Statement         = SYNC [   CompoundStatement | Assignment
                             | IfStatement       | WhileStatement
                             | ReadStatement     | WriteStatement ] .
  Assignment        = Variable ":=" Expression SYNC .
  Variable          = Designator .
  Designator        = identifier [ "[" Expression "]" ] .
  IfStatement       = "IF" Condition "THEN" Statement .
  WhileStatement    = "WHILE" Condition "DO" Statement .
  ReadStatement     = "READ" "(" Variable { WEAK "," Variable } ")" .
  WriteStatement    = "WRITE" "(" WriteElement { WEAK "," WriteElement } ")" .
  WriteElement      = string | Expression .
  Condition         = Expression RelOp Expression .
  Expression        = ( "+" Term | "-" Term | Term ) { AddOp Term } .
  Term              = Factor { MulOp Factor } .
  Factor            = Designator | number | "(" Expression ")" .
  AddOp             = "+" | "-" .
  MulOp             = "*" | "/" .
  RelOp             = "=" | "<>" | "<" | "<=" | ">" | ">=" .
END Clang.