Test for week 7

Computer Science 301 - 2016 - Test on Week 7 - solutions

TEXTBOOKS MAY NOT BE USED.

Preferably answer this test directly on the computer. Max 20

The questions below are to be found in a file TEST.TXT or TEST.DOC (MS word) Preferably answer the test by typing your answers in the obvious blank spaces. If you really don't feel up to this, write them by hand on the test sheet and hand this in instead. Either print the file or e-mail it to me when you are finished.

You will notice that the production for ANumber has actions that catch potential exception that could be raised by Convert.ToInt32(). (Line 249). But the number terminal can consist of nothing but a string of digits. Is it possible for this exception to be thrown (if so how) or is this just an example of over cautious programming?

The exception would be thrown if token.val defined a number that was outside the capacity of the C# 32 bit int32 type.

How would you modify the definition of the number terminal (line 92) so that the only number that could start with 0 would be the single digit number 0 (that is, forbid numbers like 000067)?

  CHARACTERS
    digit19    = "123456789" .
    digit09    = "0123456789" .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .

  TOKENS
    number     = digit19 { digit09 } | '0' .
    identifier = letter { letter | digit09 } .

Would this affect your answer to the previous question? Explain.

It would make no difference. "number" could still represent a number that exceeds the capacity of the int32 type.

Why do all the productions for the components of an Expression mark their formal parameters with the key word out?

Effectively because they are all passing information gathered within the production "up" a parse tree - synthesized rather than inherited attributes of the Node objects.

In line 235 a new Node object is created. What purpose might this serve - given that all the subsequent options for a Factor each construct a node in any case? Just another sloppy bit of code left lying about?

The question was phrased a little badly (deliberately). Nor all the options in Factor define a new Node - see the code below. Not sure now whether anyone spotted this.

      Factor<out Node node, bool eval>     (. string name;
                                              int n;
                                              node = new Node(); .)
 **   =   AVariable<out name>              (. Variable v = Table.Find(name);
 **                                           if (v == null) SemError("undeclared variable");
 **                                           else node = new Node(v.type, v.value); .)
        | ANumber<out n>                   (. node = new Node(Types.intType, n); .)
        | "true"                           (. node = new Node(Types.boolType, 1); .)
        | "false"                          (. node = new Node(Types.boolType, 0); .)
        | "sqr" "("
             Expression<out node, eval>    (. if (node.type != Types.intType)
                                                SemError("integer value expected");
                                              if (eval) node.value = node.value * node.value;
                                              node.type = Types.intType; .)
          ")"
        | "(" Expression<out node, eval> ")" .

But if the code been written as follows then, although every option would seem to have defined a new node

      Factor<out Node node, bool eval>     (. string name;
                                              int n; .)
      =   AVariable<out name>              (. Variable v = Table.Find(name);
                                              if (v == null) SemError("undeclared variable");
                                              node = new Node(v.type, v.value); .)
        | ANumber<out n>                   (. node = new Node(Types.intType, n); .)
        | "true"                           (. node = new Node(Types.boolType, 1); .)
        | "false"                          (. node = new Node(Types.boolType, 0); .)
        | "sqr" "("
             Expression<out node, eval>    (. if (node.type != Types.intType)
                                                SemError("integer value expected");
                                              if (eval) node.value = node.value * node.value;
                                              node.type = Types.intType; .)
          ")"
        | "(" Expression<out node, eval> ")" .

this would have not satisfied the C# compiler. Coco would have generated a switch statement, and the default would have been to "do nothing", thus not guaranteeing that every path through the method would have assigned a value to "out Node node".

There is a subtle error in the production for UnaryExp. See if you can spot it and correct it.

The ! (NOT) operator is applied to a Boolean value and returns a Boolean value. This was a genuine silly slip on my side when preparing the solution to Prac 6 and shows you just how easily this sort of error can slip in!

      | "!" UnaryExp<out node, eval>     (. if (node.type != Types.boolType)
                                               SemError("boolean operand required");
 **                                          node.type = Types.boolType;
                                             if (eval) node.value = ToInt(!ToBool(node.value)); .) .

If you study the productions for the components of an expression you will see that they are based on

  EqlExp = RelExp { EqlOp RelExp }.

  RelExp = AddExp [ RelOp AddExp ] .

  AddExp = MultExp { AddOp MultExp } .

  MultExp = UnaryExp { MulOp UnaryExp

One of these uses [ ] bracketing, the others use { } bracketing. Are they correct? Explain, or correct any errors that you see.

This more interesting than one might at first think. Since an expression like

a < b < c

must be invalid, no matter what the types of a, b and c are, a syntactic restriction like the one in RelExp seems a good way to go. On the other hand, although an expression like

a == b == c

would be illegal if a, b and c were all integers, it might have some meaning if they were all Booleans. Hence the relaxation in the definition of EqlExp to allow more than two RelExps to form components of an EqlExp.
Now you might like to think further. Should a == b == c be parsed to match semantics like (a == b) == c, or should it be parsed to match semantics like a == (b == c), or is there any real difference?
Maybe it might be better to restrict the grammar to

EqlExp = RelExp [ EqlOp RelExp ]. // square RelExp = AddExp [ RelOp AddExp ] . // square AddExp = MultExp { AddOp MultExp } . // braces MultExp = UnaryExp { MulOp UnaryExp // braces

Then users would be obliged to incorporate parentheses if they were trying to write code like

a == b == c == d

and it would ultimately make life easier. The same sort of reasoning can be used to suggest that the dangling else problem can be killed by requiring that an IfStatement has a specific terminator like "end", or that the word "while" should not be used in two different looping constructs. Programming language design is full of traps!

Is there any deep significance in the numbers used in the constant declarations in lines 81 - 85? Could one have used any arbitrary numbers? Why introduce these constant definitions in any case?

No deep significance. But one could not use "any" arbitrary numbers - they would have to be "distinct" (different), which several students may not have thought worth noting (or had not noticed!) They are introduced so that the rest of the code, hopefully, becomes more readable. Code full of magic numbers is very hard to follow or debug, code written with meaningful identifier names is much easier to handle.

Suppose one wished to add a function into the calculator to enable you to evaluate expressions like

a + sqr(b + c)

or

sqr(sqr(34))

where sqr means "square" and not "square root". Where and how would the grammar need modification? (Give the actual code, and the line numbers where it would need to be inserted.)

You can see the answer in the revised code for Factor above. Some students were tempted to write this branch

        | "sqr" "("
 **        AndExp<out node, eval>        (. if (node.type != Types.intType)
                                                SemError("integer value expected");
                                              if (eval) node.value = node.value * node.value;
                                              node.type = Types.intType; .)
          ")"
        | "(" Expression<out node, eval> ")" .

presumably - to give them full marks for really thinking hard - so that one could be prevented syntactically for nonsense like

sqr( 34 > 23 )

However, the best laid plans of mice and Rhodents can easily be beaten by nonsense like this

sqr( (34 > 23) )

because the "inner" parenthesized expression (34 > 23) will be parsed as a complete Expression. You would have to rely on semantics to pick up the folly in this case.

Someone has suggested that the code in lines 101 to 111 should be replaced with

      Assignment                      (. Node rootNode;
                                         string name; .)
      = AVariable<out name>           (. Variable v = Table.Find(name);
                                         if (v == null)  // declaration
                                           v = new Variable(name); .)
      "=" Expression<out rootNode>    (. v.type  = rootNode.type;
                                         v.value = rootNode.value; .)
      SYNC ";" .

What are your thoughts on the matter? Is this better/worse/the same as the original code? Give reasons.

Well, most people spotted that this would never add a variable to the variable table, so would be quite useless.

Confession. I had meant to set this question with code reading as follows, but in my haste to get the test set for you, got it wrong

      Assignment                      (. Node rootNode;
                                         string name; .)
      = AVariable<out name>           (. Variable v = Table.Find(name);
                                         if (v == null)  {   // declaration
                                           v = new Variable(name);
 **                                      Table.Add(v); .)
      "=" Expression<out rootNode>    (. v.type  = rootNode.type;
                                         v.value = rootNode.value; .)
      SYNC ";" .

Does this make a difference to the answer? If so why (or why not)?

This week's big bonus. In the discussion on Monday we came to the conclusion that defining a calculator in this way would not afford an opportunity to incorporate "short circuit" semantics into the evaluation of Boolean expressions, because the system had to parse an expression completely, and thus would necessarily evaluate all its components. You were left with a challenge to see whether you could find a way around it.

I am sure that, like me, you went away and thought about it some more?

Of course it is quite possible. See if you can sketch out how it can be done, and how we can come up with a system whose semantics match the definition of short circuit semantics

A OR B = if A then TRUE else B

A AND B = if A then B else FALSE ( ie if !A then FALSE else B )

To my disappointment, very few students had any clue about this one.

The trick is to challenge the part of the above that reads "the system had to parse an expression completely, and thus would necessarily evaluate all its components". You certainly have to PARSE all the components, but you can suppress the EVALUATION of some of them. All you need to do is to pass a parameter "down" the tree indicating whether the calculation of the node value needs to be performed. Here are a few extracts from the modified grammar to show this - you can see the whole lot if you download the solution kit in TEST&A.ZIP.

 **   Expression<out Node node, bool eval> (. Node node2;
 **                                           bool eval2 = true; .)
 **   = AndExp<out node, eval>
 **     {                                  (. if (ToBool(node.value)) eval2 = false; .)
 **       "||" AndExp<out node2, eval2>    (. if (node.type != Types.boolType || node2.type != Types.boolType)
 **                                             SemError("boolean operands required");
 **                                           node.type  = Types.boolType;
 **                                           if (eval2) node.value = node2.value; .)
        } .

 **   AndExp<out Node node, bool eval>     (. Node node2;
 **                                           bool eval2 = true; .)
 **   = EqlExp<out node, eval>
 **     {                                  (. if (! ToBool(node.value)) eval2 = false; .)
 **       "&&" EqlExp<out node2, eval>     (. if (node.type != Types.boolType || node2.type != Types.boolType)
 **                                             SemError("boolean operands required");
 **                                           node.type = Types.boolType;
 **                                           if (eval2) node.value = node2.value; .)
        } .

 **   EqlExp<out Node node, bool eval>     (. Node node2;
                                              int op; .)
 **   = RelExp<out node, eval>
        { EqlOp<out op>
 **       RelExp<out node2, eval>          (. if (node.type != node2.type)
                                                SemError("incomparable operands ");
                                              node.type = Types.boolType;
 **                                          if (eval) switch(op) {          // bypass evaluation?
                                                case opeql:
                                                  node.value = ToInt(node.value == node2.value); break;
                                                case opneq:
                                                  node.value = ToInt(node.value != node2.value); break;
                                                default:
                                                  break;
                                              } .)
        } .