Test for week

Computer Science 301 - 2014 - Test 6 - Attributed Grammars - Solutions

You may use your textbook, or refer to material on the website if you like, but not your solutions to prac 6, Keep your solutions as simple as possible! Answer on the question sheet (4 sides)

1. I have really lost it this week. I cannot even remember the question! But what is the answer?

Someone suggested 42 - Thank you, Douglas Adams. Fortunately my amnesia has gone away - Pat (Terry) 63T0844 of course.

2. Deja vu all over again. Remember those awful trains for which you struggled to find an LL(1) grammar that imposed safety constraints (no fuel truck just behind a loco, or immediately in front of a coach)? Well, here is a much simpler grammar that does not impose safety syntactically - but with appropriate semantic actions the checking is easily performed.

    COMPILER Train $CN /* Grammar to describe a simple railway train */

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Train     = LocoPart [ [ GoodsPart ] HumanPart ] SYNC "." .
      LocoPart  = "loco" { "loco" } .
      GoodsPart = Truck { Truck } .
      HumanPart = "brake" | "coach" { "coach" } .
      Truck     = ( "closed" | "open" | "fuel" ) .
    END Train.

There were several intriguing submissions. A fairly popular hack was to make use of the (little appreciated) lookahead token, which some of you had clearly noted when we touched on it in class (see page 137):

    COMPILER Trains $CN /* Grammar to describe a simple railway train - C# version */

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Trains    = { Train } .
      Train     = LocoPart [ [ GoodsPart ] HumanPart ] "." .
      LocoPart  = "loco" { "loco" }             (. if (la.val.Equals("fuel"))
                                                    SemError("Fuel may not follow Loco directly"); .) .
      GoodsPart = Truck { Truck } .
      HumanPart = "brake" | "coach" { "coach" } .
      Truck     = ( "closed" | "open" | "fuel"  (. if (la.val.Equals("coach"))
                                                     SemError("Coach may not follow Fuel directly"); .) ) .
    END Trains.

However, if you try this you will find that the error messages may appear to come too early. Rather than use la.val one can make use of the more familiar token.val if the checks are placed in the grammar ahead of the critical tokens (this demonstrates a rather useful technique which is worth remembering).

    PRODUCTIONS
      Trains    = { Train } .
      Train     = LocoPart [ [ GoodsPart ] HumanPart ] "." .
      LocoPart  = "loco" { "loco" } .
      GoodsPart = Truck { Truck } .
      HumanPart =   "brake"
                  |                             (. if (token.val.Equals("fuel"))
                                                   SemError("Coach may not follow Fuel directly"); .)
                    "coach" { "coach" } .
      Truck     =   "closed" | "open"
                  |                             (. if (token.val.Equals("loco"))
                                                   SemError("Fuel may not follow Loco directly"); .)
                    "fuel" .
    END Trains.

What I thought you might try was more like this: if you are a fuel truck or a coach, take a backward glance rather than a forward glance at the rest of the train as used above. We can do this by using a string variable last to remember the previous vehicle in the train:

    COMPILER Trains $CN /* Grammar to describe a simple railway train - C# version */

      static string last = "";

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Trains    = { Train } .
      Train     = LocoPart [ [ GoodsPart ] HumanPart ] "." .
      LocoPart  = "loco" { "loco" }            (. last = token.val; .)  .
      GoodsPart = Truck { Truck } .
      HumanPart
      =   "brake"
        | "coach"                              (. if (last.Equals("fuel"))
                                                    SemError("Coach may not follow Fuel directly"); .)
          { "coach" } .
      Truck
      = (   "closed" | "open"
          | "fuel"                             (. if (last.Equals("loco"))
                                                    SemError("Fuel may not follow Loco directly"); .)
        )                                      (. last = token.val; .) .
    END Trains.

Yet another way is to resist the temptation to probe token.val for particular strings, but to use two bool variables to keep track of the errors, if any. This seems to parse the train in a more abstract way:

    COMPILER Trains $CN /* Grammar to describe a simple railway train - C# version */

      static bool lastWasLoco, lastWasFuel;

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Trains = { Train } .
      Train = LocoPart [ [ GoodsPart ] HumanPart ] "." .
      LocoPart = "loco" { "loco" }             (. lastWasLoco = true; .) .
      GoodsPart = Truck { Truck } .
      HumanPart =   "brake"
                  | "coach"                    (. if (lastWasFuel)
                                                    SemError("Coach may not follow Fuel directly"); .)
                    { "coach" } .
      Truck =  (  "closed" | "open" )          (. lastWasFuel = false; lastWasLoco = false; .)
                | "fuel"                       (. if (lastWasLoco)
                                                    SemError("Fuel may not follow Loco directly");
                                                  lastWasFuel = true; lastWasLoco = false; .) .
    END Trains.

What several people suggested was to collapse the train into one long string, from which all the spaces had been elided - and then to look for the substrings locofuel or fuelcoach. I guess this solves the problem, but now (a) the error messages can only come right at the end - and because Coco generated parsers suppress multiple error messages at one point to try to help with the spurious error message problem, you might not get the full sorry story. and (b) you are not really doing a parser-like attack on the problem!

    COMPILER Trains $CN /* Grammar to describe a simple railway train - C# version */

      static StringBuilder trainSB;

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Trains
      = {                                        (. trainSB = new StringBuilder(); .)
          Train                                  (. string str = trainSB.ToString();
                                                    if (str.IndexOf("locofuel") >= 0)
                                                      SemError("Fuel may not follow Loco directly");
                                                    if (str.IndexOf("fuelcoach") >= 0)
                                                      SemError("Coach may not follow Fuel directly"); .)
        } .
      Train     = LocoPart [ [ GoodsPart ] HumanPart ] "." .
      LocoPart  = "loco" { "loco" }              (. trainSB.Append(token.val); .) .
      GoodsPart = Truck { Truck } .
      HumanPart = "brake" | "coach" { "coach" }  (. trainSB.Append(token.val); .) .
      Truck     = ( "closed" | "open" | "fuel" ) (. trainSB.Append(token.val); .) .
    END Trains.

You could write an equivalent program in a very few lines simply by reading all the characters and copying the non-spaces into a StringBuilder and searching that - try it if you don't believe me!

As a general rule good compiler writers will aim to get error messages related to points very close to where the error has been committed. Nothing is worse than those infuriating messages at the end of a long file saying "class is not terminated properly" or, simply "syntax error" or "some variables undeclared".

3. And deja vu once more! By now you are well acquainted with my family, having seen a grammar to describe families. Slightly simplified (not bothering about possessions or funny names with hyphens or quotes) it looked like this.

    COMPILER Family $CN
    /* Describe a family */

    CHARACTERS
      uLetter    = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .
      lLetter    = "abcdefghijklmnopqrstuvwxyz" .

    TOKENS
      name       = uLetter { lLetter } .

    IGNORE CHR(0) .. CHR(31) .


    PRODUCTIONS
      Family      = { Generation } Surname { Generation } "." .
      Surname     = "Surname" ":" name { name } .
      Generation  = ( "Parents" | "Grandparents" | "Children" | "Grandchildren" ) ":" NameList .
      NameList    = OnePerson { "," OnePerson } .
      OnePerson   = name { name } [ "(" "deceased" ")" ] { Description } [ Spouse ] .
      Spouse      = "=" name { name } .
      Description = "[" Relative "of" OnePerson "]" .
      Relative    =   "son"  | "daughter" | "mother"  | "father" | "sister"
                    | "brother" | "wife" | "husband"  | "partner" | "mistress" .
    END Family.

As you probably know, families tend to perpetuate first names (politically correctly nowadays called given names) - my father was William David, my grandfather was William, and so was his father. My second name is David and my son is Kenneth David.

How would you add actions to this grammar to (a) record the given names of the members and the number of times that given name had appeared anywhere in the description, printing this information at the end of parsing it (b) check that each Generation has appeared at most once.

Once again - keep it simple. There is no need to pass parameters here either - static fields will work well enough.

This is rather easier than it may at first appear, and I was delighted to see that many students had made a good attack on the problem. Here is one solution, which makes use of a very simple table of GivenName objects and a simple add or update method. For a change all these solutions have been expressed in C#, but very similar code would follow if you translated them to Java. Notice that we do not apply the actions to the components of the Surname section - the problem speciically asked for first/given names. One can also analyse the number of times a particular generation appears by setting up a list of them, as some students tried, but that is really rather overkill for this problem.

A variation on this exercise would be to ensure that all of the generations appeared exactly once. How would you do that?

    using Library;
    using System.Collections.Generic;

    COMPILER Family $CN
    /* Describe a family - C# version */

    class GivenName {
    // Objects of this class record a first name and the number of times it
    // has been given to a family member

      public string name;
      public int occurrence;

      public override string ToString () { // neat way of doing it - thanks Jessica!
        return occurrence + " occurrences of " + name;
      } // ToString

      public GivenName(string name) {
        this.name = name;
        this.occurrence = 1;
      } // constructor

    } // GivenName class

    static List<GivenName> names = new List<GivenName>();  // global for simplicity here!

    public static void AddOrUpdateName(string name) {
    // Adds name to list of unique names if not already there; otherwise note one more
    // occurrence of a name previously recorded
      int look = 0;
      while (look < names.Count && !names[look].name.Equals(name)) look++;
      if (look < names.Count) names[look].occurrence++;
      else names.Add(new GivenName(name));
    } // AddOrUpdateName


    static bool
      parents       = false,
      grandparents  = false,
      children      = false,
      grandchildren = false;

    CHARACTERS
      uLetter    = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .
      lLetter    = "abcdefghijklmnopqrstuvwxyz" .


    TOKENS
      name       = uLetter { lLetter } .

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Family
      = { Generation }
        Surname
        { Generation } "."           (. foreach (GivenName name in names)
                                          IO.WriteLine(name); .) .


      Surname
      = "Surname" ":" name { name } .

      Generation
      = (  "Parents"                 (. if (parents)
                                          SemError("only one parents generation allowed");
                                        parents = true; .)
         | "Grandparents"            (. if (grandparents)
                                          SemError("only one grandparents generation allowed");
                                        grandparents = true; .)
         | "Children"                (. if (children)
                                          SemError("only one children generation allowed");
                                        children = true; .)
         | "Grandchildren"           (. if (grandchildren)
                                          SemError("only one grandchildren generation allowed");
                                        grandchildren = true; .)
        ) ":" NameList .

      NameList
      = OnePerson { "," OnePerson } .

      OnePerson
      =   name                       (. AddOrUpdateName(token.val); .)
        { name                       (. AddOrUpdateName(token.val); .)
        }
        [ "(" "deceased" ")" ]
        { Description }
        [ Spouse ] .

      Spouse
      = "="
          name                       (. AddOrUpdateName(token.val); .)
        { name                       (. AddOrUpdateName(token.val); .)
        }
      .

      Description
      = "[" Relative "of" OnePerson "]" .

      Relative
      =    "son"  | "daughter" | "mother" | "father" | "sister"
        | "brother" | "wife" | "husband"  | "partner" | "mistress" .

    END Family.

4. Can you think of a simple way of telling a newcomer to Coco/R under what conditions it will be necessary to add parameters to methods, and when global (static) fields in the parser would suffice?

What I was looking for here was the insight that suggests that non-recursive grammars (or, sometimes, recursive ones where the methods require no specific actions) can often be handled with a static parser class and static fields for transmitting data between sub-parsers. These are probably the exception in fact - the sorts of grammars used for defining programming languages are highly recursive, and the information required is most safely passed up and down the implicit tree of method calls using parameters.

You probably realize that several of the grammars I have used in some of these pracs are really rather silly, and describe solutions to problems that really don't warrant constructing a full LL(1) parserr at all. However, I hope that they collectively give you some idea of the techniques which are then better exploited in grammars for Parva (or any other language).

The sources for the grammars in these solutions have been added to the solution source kits for this week's practical.