Computer Science 301 - 2005

Extra tutorials for Swot Week (2) - Attributing grammars

(1) The grammar below attempts to describe your hard-working lifestyle over the last few weeks.

    COMPILER Prac $CN /* Describe progress of a compiler practical */

    CHARACTERS
      letter  = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    TOKENS
      name = letter { letter } .
    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Prac         = Handouts { Task } "Submit" .
      Handouts     = "PracSheet" { "HintSheet" } .
      Task         = "Attempt" { ConsultTutor  "Attempt" }  .
      ConsultTutor = name .
    END Prac.

As you have complained, you sometimes have trouble finding tutors. As you may not know, we are going to have problem paying them, because they have recently told me that they want to be paid a fee for each occasion they are consulted. So what I need now is a list of tutors with a count of such occasions for each one. What must I add to the grammar to produce me such a list?

Typical output might be as follows (it does not have to be in alphabetic order):

       Kevin      12
       Ingrid      5
       Nyasha      8
       Pat        45

The first solution simply makes use of a static list structure, and avoids parameter passing:

    import java.util.*;
    import Library.*;

    COMPILER Prac $CN
    /* Determine how often each tutor has been consulted during a practical
       using a static list
       P.D. Terry, Rhodes University, 2005 */

    static class Tutor {
      public String name;
      public int count;

      public Tutor(String name, int count) {
        this.name = name; this.count = count;
      }

      public void writeLine() {
        IO.write(name, -14);
        IO.writeLine(count);
      }

    } // Tutor

    static ArrayList list = new ArrayList();              // global for simplicity here!

    static int position(String name) {
    // Finds position of entry with search key name in list, or -1 if not found
      int i = list.size() - 1;                            // index of last entry
      while (i >= 0 &&                                    // short-circuit protection
             !name.equals(((Tutor)list.get(i)).name))     // must cast before extracting field
        i--;                                              // will reach -1 if no match
      return i;
    }

    CHARACTERS
      letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    TOKENS
      name   = letter { letter } .
    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Prac
      = Handouts { Task }
        "Submit"           (. for (int i = 0; i < list.size(); i++)
                                ((Tutor)list.get(i)).writeLine(); .) .

      Handouts
      = "PracSheet" { "HintSheet" } .

      Task
      = "Attempt" { ConsultTutor "Attempt" } .

      ConsultTutor
      = name               (. int pos = position(token.val);
                              if (pos >= 0) ((Tutor)(list.get(pos))).count++;
                              else list.add(new Tutor(token.val, 1)); .) .

    END Prac.

The second solution declares the list structure within the method for the goal symbol, necessitating that this list is then passed to all sub parsers. This is really over kill for a problem of this sort. Since an "object" is being passed we really pass (by value) a reference to the list, which means that the sub parsers have access to the elements of the list and can manipulate them (without altering the reference itself):

    COMPILER Prac1 $CN
    /* Determine how often each tutor has been consulted during a practical
       using paramterized methods
       P.D. Terry, Rhodes University, 2005 */

    static class Tutor {
      ...
    } // Tutor

    static int position(ArrayList list, String name) {
    // Finds position of entry with search key name in list, or -1 if not found
      int i = list.size() - 1;                            // index of last entry
      while (i >= 0 &&                                    // short-circuit protection
             !name.equals(((Tutor)list.get(i)).name))     // must cast before extracting field
        i--;                                              // will reach -1 if no match
      return i;
    }

    CHARACTERS
      letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    TOKENS
      name   = letter { letter } .
    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Prac1                        (. ArrayList list = new ArrayList(); .)
      = Handouts { Task<list> }
        "Submit"                   (. for (int i = 0; i < list.size(); i++)
                                        ((Tutor)list.get(i)).writeLine(); .) .

      Handouts
      = "PracSheet" { "HintSheet" } .

      Task<ArrayList list>
      = "Attempt" { ConsultTutor<list> "Attempt" } .

      ConsultTutor<ArrayList list>
      = name                      (. int pos = position(list, token.val);
                                     if (pos >= 0) ((Tutor)(list.get(pos))).count++;
                                     else list.add(new Tutor(token.val, 1)); .) .

    END Prac1.

(2) The Cocol grammar below describes a sequence of Regular Expressions (written one to a line) using the conventions discussed during the course.

    COMPILER RE  $CN  /* Regular expression grammar */

    CHARACTERS
      control  = CHR(0) .. CHR(31) .
      noquote1 = ANY - control - "'" .
      noquote2 = ANY - control - '"' .
      meta     = "()*|[]-?+" .
      lf       = CHR(10) .
      simple   = ANY - control - "'" - '"' - meta .
    TOKENS
      atomic  = simple .
      escaped = "'" noquote1 "'" | '"' noquote2 '"' .
      EOL     = lf .

    IGNORE  control - lf

    PRODUCTIONS
      RE         = { Expression EOL } EOF .
      Expression = Term { "|" Term } .
      Term       = Factor { Factor } .
      Factor     = Element [ "*" | "?" | "+" ] .
      Element    = Atom | Range | "(" Expression ")" .
      Range      = "[" OneRange { OneRange } "]" .
      OneRange   = Atom [ "-" Atom ] .
      Atom       = atomic | escaped .
    END RE.

How would you add appropriate actions and attributes so that you could generate a program that would parse a sequence of regular expressions and report on the alphabets used in each one. Input like

            a | b c d | ( x y z )*
            [a-g A-G] [x - z]?
            a? "'" z+

the output should be something like

            Alphabet = a b c d x y z
            Alphabet = A B C D E F G a b c d e f g x y z
            Alphabet = ' a z

The first solution simply makes use of a static array of boolean flags, and avoids parameter passing:

    import Library.*;

    COMPILER RE $CN
    /* Regular expression grammar - using a static array to determine alphabet
       P.D. Terry, Rhodes University, 2005 */

    static boolean[] alphabet = new boolean[256];

    CHARACTERS
      control  = CHR(0) .. CHR(31) .
      noquote1 = ANY - control - "'" .
      noquote2 = ANY - control - '"' .
      meta     = "()*|[]-?+" .
      lf       = CHR(10) .
      simple   = ANY - control - "'" - '"' - meta .

    TOKENS
      atomic  = simple .
      escaped = "'" noquote1 "'" | '"' noquote2 '"' .
      EOL     = lf .

    IGNORE  control - lf

    PRODUCTIONS
      RE                                (. int i; .)
      = {                               (. for (i = 1; i < 256; i++)
                                             alphabet[i] = false; .)
          Expression EOL                (. IO.write("Alphabet = ");
                                           for (i = 1; i < 256; i++)
                                             if (alphabet[i]) IO.write( (char) i + " ");
                                           IO.writeLine(); .)
        } EOF .

      Expression
      = Term { "|" Term } .

      Term
      = Factor { Factor } .

      Factor
      = Element [ "*" | "?" | "+" ] .

      Element                           (. char ch; .)
      =   Atom<out ch>                  (. alphabet[ch] = true; .)
        | Range
        | "(" Expression ")" .

      Range
      = "[" OneRange { OneRange } "]" .

      OneRange                          (. char ch, ch1, ch2; .)
      = Atom<out ch1>                   (. alphabet[ch1] = true; .)
        [ "-" Atom<out ch2>             (. if (ch2 < ch1)
                                             SemError("invalid range");
                                           for (ch = ch1; ch <= ch2; ch++)
                                             alphabet[ch] = true; .)
        ] .

      Atom<out char ch>                 (. ch = 0; .)
      = (   atomic                      (. ch = token.val.charAt(0); .)
          | escaped                     (. ch = token.val.charAt(1); .)
        ) .

    END RE.

The second solution declares the data structure within the method for the goal symbol, necessitating that this list is then passed to all sub parsers. This is really over kill for a problem of this sort. Since an "object" is being passed we really pass (by value) a reference to the array, which means that the sub parsers have access to the elements of the list and can manipulate them (without altering the reference itself):

    import Library.*;

    COMPILER RE1 $CN
    /* Regular expression grammar - using parameters to determine alphabet
       P.D. Terry, Rhodes University, 2005 */

       ...

    PRODUCTIONS
      RE1                               (. boolean[] alphabet = new boolean[256];
                                           int i; .)
      = {                               (. for (i = 1; i < 256; i++)
                                             alphabet[i] = false; .)
          Expression<alphabet> EOL      (. IO.write("Alphabet = ");
                                           for (i = 1; i < 256; i++)
                                             if (alphabet[i]) IO.write( (char) i + " ");
                                           IO.writeLine(); .)
        } EOF .

      Expression<boolean[] alphabet>
      = Term<alphabet> { "|" Term<alphabet> } .

      Term<boolean[] alphabet>
      = Factor<alphabet> { Factor<alphabet> } .

      Factor<boolean[] alphabet>
      = Element<alphabet> [ "*" | "?" | "+" ] .

      Element<boolean[] alphabet>       (. char ch; .)
      =   Atom<out ch>                  (. alphabet[ch] = true; .)
        | Range<alphabet>
        | "(" Expression<alphabet> ")" .

      Range<boolean[] alphabet>
      = "[" OneRange<alphabet> { OneRange<alphabet> } "]" .

      OneRange<boolean[] alphabet>      (. char ch, ch1, ch2; .)
      = Atom<out ch1>                   (. alphabet[ch1] = true; .)
        [ "-" Atom<out ch2>             (. if (ch2 < ch1)
                                             SemError("invalid range");
                                           for (ch = ch1; ch <= ch2; ch++)
                                             alphabet[ch] = true; .)
        ] .

      Atom<out char ch>                 (. ch = 0; .)
      = (   atomic                      (. ch = token.val.charAt(0); .)
          | escaped                     (. ch = token.val.charAt(1); .)
        ) .

    END RE1.


Home  © P.D. Terry