Computer Science 3 - 2007

Programming Language Translation


Practical for Week 24, beginning 8 October 2007

Hand in your solutions to this practical before lunch time on your next practical day, correctly packaged in a transparent folder with your cover and individual assessment sheets. Please do NOT come to a practical and spend the first hour printing or completing solutions from the previous week's exercises. Since the practical will have been done on a group basis, please hand in one copy of the cover sheet for each member of the group. These will be returned to you in due course, signed by the marker. Please make it clear whose folder you have used for the electronic submission, for example g03A1234. Please resist the temptation to carve up the practical, with each group member only doing one task. The group experience is best when you work on tasks together.


Objectives:

Ja, well, no, fine. All you folks who thought this was a bit of a jorl so far, or have been thinking that IS projects are an excuse for back-sliding - think again. In this practical you are to

You will need this prac sheet and your text book. As usual, copies of the prac sheet are also available at http://www.cs.ru.ac.za/CSc301/Translators/trans.htm.


Outcomes:

When you have completed this practical you should understand


To hand in:

This week you are required to hand in, besides the cover sheets (one per group member):

I do NOT require listings of any Java code produced by Coco/R.

Keep the prac sheet and your solutions until the end of the semester. Check carefully that your mark has been entered into the Departmental Records.

You are referred to the rules for practical submission which are clearly stated on page 18 of our Departmental Handbook. However, for this course pracs must be posted in the "hand-in" box outside the laboratory and not given to demonstrators.

A rule not stated there, but which should be obvious, is that you are not allowed to hand in another group's or student's work as your own. Attempts to do this will result in (at best) a mark of zero and (at worst) severe disciplinary action and the loss of your DP. You are allowed - even encouraged - to work and study with other students, but if you do this you are asked to acknowledge that you have done so. You are expected to be familiar with the University Policy on Plagiarism, which you can consult at:

        http://www.scifac.ru.ac.za/plag.htm 


Task 1 - creating a working directory and unpacking the prac kit

There are several files that you need, zipped up this week in the file PRAC24.ZIP (Java version) or PRAC24C.ZIP (C# version)


Task 2 - A simple calculator

In the kit you will find CALC.ATG. This is an attributed grammar for a simple four function calculator that can store values in any of 26 memory locations, inspiringly named A through Z.

  import Library.*;
  import java.util.*;

  COMPILER Calc  $CN
  /* Simple four function calculator with 26 memory cells
     P.D. Terry, Rhodes University, 2007 */

    static double[] mem = new double[26];

  CHARACTERS
    digit      = "0123456789" .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .

  TOKENS
    Number     = digit { digit } [ "." { digit } ] .
    Variable   = letter .

  IGNORE CHR(0) .. CHR(31)

  PRODUCTIONS
    Calc                               (. int index = 0; double value = 0.0;
                                          for (int i = 0; i < 26; i++) mem[i] = 0.0; .)
    = { Variable                       (. index = token.val.charAt(0) - 'A'; .)
        "=" Expression<out value>      (. mem[index] = value;
                                          IO.writeLine(value); .)
      } EOF .

    Expression<out double expVal>      (. double expVal1 = 0.0; .)
    = Term<out expVal>
      {   "+" Term<out expVal1>        (. expVal += expVal1; .)
        | "-" Term<out expVal1>        (. expVal -= expVal1; .)
      } .

    Term<out double termVal>           (. double termVal1 = 0.0; .)
    = Factor<out termVal>
      {   "*" Factor<out termVal1>     (. termVal *= termVal1; .)
        | "/" Factor<out termVal1>     (. termVal /= termVal1; .)
      } .

    Factor<out double factVal>         (. factVal = 0.0; .)
    =   Number                         (. try {
                                            factVal = Double.parseDouble(token.val);
                                          } catch (NumberFormatException e) {
                                            factVal = 0; SemError("number out of range");
                                          } .)
      | Variable                       (. int index = token.val.charAt(0) - 'A';
                                          factVal = mem[index]; .)
      | "(" Expression<out factVal> ")"
      .

  END Calc.

Start off by studying this grammar carefully, and then making and executing the program.

Use Coco/R to generate and then compile source for a complete calculator. You do this most simply by

cmake CALC

A command like

crun Calc calc.txt

will run the program Calc and try to parse the file calc.txt, sending error messages to the screen. Giving the command in the form

crun Calc calc.bad -L

will send an error listing to the file listing.txt, which might be more convenient. Try this out.


Task 3 - A better calculator

Make the following extensions to the system:

Test your system out thoroughly - give it both correct and incorrect data.


Task 4 - Internet, anyone?

(This task is based on a problem set in an examination some years back. In the examination, candidates were given the basic grammar and asked to suggest the actions that had to be added.)

As you should be aware, IP addresses as used in Internet communication are typically expressed in "dotted quad" form, as exemplified by 146.231.128.6. The IP address actually represents a 32 bit integer; the four numbers in the quadruple corresponding to successive 8 bit components of this integer. For humans, machine addressing is usually more memorable when expressed in "DNS" format, as exemplified by terrapin.ru.ac.za. Some systems maintain tables of matching addresses, for example

         146.231.122.131   bungee.ru.ac.za       #comments appear like this
         146.231.128.6     terrapin.ru.ac.za
         146.231.56.10     thistle-sp.ru.ac.za
         147.28.0.62       psg.com

When we moved our CS and IS departments to new premises in Hamilton on the famous 9/11 in 2001, a decision was made to rename and uniquely renumber all the many machines in our possession. Our system administrators tried to draw up a table like the one above, which was then merged with the existing table in the IT division. Unfortunately, a few mistakes were made, which caused havoc until they were ironed out. For example, there were lines reading

         146.231.122.1123  pdt1.ict.ru.ac.za     #invalid IP address
         146.231.122.156   pdt2.ict.ru.ac.za
         146.231.122.156   pdt3.ict.ru.ac.za     #non-unique IP address

The ATG files below show how Coco/R might be used to develop a system that would enable a file in this format to be checked quickly, and the errors identified. One shows how parameters may be passed between parsing methods, and the other how static variables in the parser class may be accessed from several methods:

    import Library.*;
    import java.util.*;

    COMPILER Check3 $CN
    // Put your names and a description here
    // This version uses parameters

    IGNORECASE

    CHARACTERS
      digit   = "0123456789" .
      letter  = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      eol     = CHR(10) .

    TOKENS
      number = digit { digit } .
      name   = letter { letter | digit | "." | "-" } .

    COMMENTS FROM "#" TO eol

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Check3                        (. ArrayList<Integer> IPList = new ArrayList<Integer> (); .)
      =
      { Entry<IPList> }
      EOF                           (. if (Successful()) IO.writeLine("all correct"); .)
      .

      Entry<. ArrayList<Integer> IPL .>
      = IPNumber<IPL> name .

      IPNumber<. ArrayList<Integer> IPL .>
      =                             (. int n, m; .)
      Number<out n>
      "." Number<out m>             (. n = 256 * n + m; .)
      "." Number<out m>             (. n = 256 * n + m; .)
      "." Number<out m>             (. n = 256 * n + m;
                                       if (IPL.contains(n)) SemError("duplicate IP number");
                                       else IPL.add(n); .)
      .

      Number<out int n>
      =  number                     (. try {
                                         n = Integer.parseInt(token.val);
                                       } catch (NumberFormatException e) {
                                         n = 256;
                                       }
                                       if (n > 255) SemError("number too large"); .)
      .

    END Check3.

----------------------------------------------------------------------------

    import Library.*;
    import java.util.*;

    COMPILER Check4 $CN
    // Put your names and a description here
    // This version uses a static Arraylist, but no parameters

    static ArrayList <Integer> IPList = new ArrayList <Integer> ();

    IGNORECASE  // as before
    CHARACTERS  // as before
    TOKENS      // as before
    COMMENTS    // as before
    IGNORE      // as before
    PRODUCTIONS
      Check4
      =
      { Entry }
      EOF                           (. if (Successful()) IO.writeLine("all correct"); .)
      .

      Entry
      = IPNumber name .

      IPNumber
      =                             (. int n, m; .)
      Number<out n>
      "." Number<out m>             (. n = 256 * n + m; .)
      "." Number<out m>             (. n = 256 * n + m; .)
      "." Number<out m>             (. n = 256 * n + m;
                                       if (IPList.contains(n)) SemError("duplicate IP number");
                                       else IPList.add(n); .)
      .


      Number<out int n>
      ... as before

    END Check4.

Start off by studying these grammars carefully, and then making and executing the program.

As before, you should be able to use Coco/R to generate and then compile source for an application like this, as exemplified by

cmake Check3 or cmake Check4

A command like

crun Check3 ipdata.txt

will run the program Check3 and try to parse the file ipdata.txt, sending error messages to the screen. Giving the command in the form

crun Check4 ipdata.txt -L

will send an error listing to the file listing.txt, which might be more convenient. Try this out.


Task 5 Extending the application

As you may know, multiple names may be given to a host (associated with one IP number) on the Internet. On the basis that we might wish to do this for the computers in our building, but still only assign each name once, extend the application so that it might react sensibly to data of a form exemplified below.

          146.231.122.131   bungee.ru.ac.za       #comments appear like this
                            humfac.ru.ac.za       #alternative name
                            www.humfac.ru.ac.za   #and yet another name
          146.231.122.75    bungee.ru.ac.za       #invalid - name already used
          146.231.128.6     terrapin.ru.ac.za
          146.231.56.10     thistle-sp.ru.ac.za
          147.28.0.62       psg.com
          147.28.0.67       psg.com               #invalid - name already used
          146.231.122.1123  pdt1.ict.ru.ac.za     #invalid IP address
          146.231.122.156   pdt2.ict.ru.ac.za
          146.231.122.156   pdt3.ict.ru.ac.za     # non-unique IP address

Hint: This is not hard to do. Use the ArrayList class to create two different lists.


Task 5 Eliminating metabrackets

During the translators course we have made frequent use of the Cocol/EBNF notation for expressing production rules, and in particular the use of the {} and [] meta-brackets introduced by Wirth in his 1977 paper. An example of a system of productions using this notation is as follows:

    Home      = Family { Pets } [ Vehicle ] "house" .
    Pets      = "dog" [ "cat" ] | "cat" .
    Vehicle   = ( "scooter" | "bicycle" ) "fourbyfour" .
    Family    = Parents { Children } .
    Parents   = "Dad" "Mom" | "Mom" "Dad" .
    Parents   = "Dad" | "Mom" .
    Child     = "Margaret" | "Janet" | "Anne" | "Ntombizodwa" | "Ntombizanele" .

In analysing such productions and/or testing the grammar it has often been suggested that:

(a) they be rewritten without using the Wirth brackets, but using right recursive productions and an explicit e (in Cocol this is just omitted), as for example

    Home      = Family AllPets Transport "house" .
    AllPets   = Pets AllPets |  .
    Transport = Vehicle |  .
    Pets      = "dog" PussyCat | "cat" .
    PussyCat  = "cat" | .
    Vehicle   = ( "scooter" | "bicycle" ) "fourbyfour" .
    Family    = Parents Offspring .
    Offspring = Offspring Children | .
    Parents   = "Dad" "Mom" | "Mom" "Dad" .
    Parents   = "Dad" | "Mom" .
    Child     = "Margaret" | "Janet" | "Anne" | "Ntombizodwa" | "Ntombizanele" .

(b) a check be made to see that all the non-terminals have been defined (this does not occur in the above case - Children is undefined);

(c) a check be made to see that each non-terminal is defined only once (this does not occur in the above case - there are two rules for Parents);

(d) a check be made to see that each non-terminal (other than the goal) must appear on the right side of at least one production (this does not occur in the above case; Child defines a non-teminal which does not appear in any other production).

As a useful service to others who might take this course in future years (and hopeful that, although you might encourage others to do so, you are not forced to do so yourself!), develop a system using Coco/R that will carry out the above transformations and checks.

Here are some suggestions for deriving a complete system:

(a) In the kit you will find the skeleton of a grammar file EBNF.ATG with suitable definitions of character sets and tokens similar to those you have seen before:

    COMPILER EBNF $CN
    /* Parse a set of EBNF productions
       P.D. Terry, Rhodes University, 2007 */

    CHARACTERS
      letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      lowline  = "_" .
      digit    = "0123456789" .
      noquote1 = ANY - "'" .
      noquote2 = ANY - '"' .

    TOKENS
      nonterminal = letter { letter | lowline | digit } .
      terminal    = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' .

    COMMENTS FROM "(*" TO "*)"  NESTED

    IGNORE  CHR(9) .. CHR(13)

    PRODUCTIONS
       EBNF       = { Production } EOF .
       Production = nonterminal "=" Expression "." .
       Expression = Term { "|" Term } .
       Term       = [ Factor { Factor } ] .
       Factor     =   nonterminal
                    | terminal
                    | "[" Expression "]"
                    | "(" Expression ")"
                    | "{" Expression "}" .
    END EBNF.

(b) It should be obvious that you will need to set up a "symbol table". In the kit is supplied the skeleton of table handler in the file EBNF\Table.java.

(c) To help you on your way, the first part of the attribution of the grammar might take the form:

    COMPILER EBNF $CN
    /* Do remember your name!
       And a comment about what this is suppoded to do */


    CHARACTERS
      ...
    TOKENS
      ...
    COMMENTS FROM "(*" TO "*)"  NESTED

    IGNORE  CHR(9) .. CHR(13)

    PRODUCTIONS
      EBNF
      = { Production }
        EOF                                 (. if (Successful()) {
                                                 Table.listProductions();
                                                 Table.testProductions();
                                               }
                                            .)
        .

      Production
      = SYNC nonterminal                    (. String name = token.val;
                                               int i = Table.add(name, Table.LHS);
                                               Table.addText(name, i);
                                               Table.addText("= ", i);
                                            .)
        WEAK "=" Expression<name, i>
        SYNC "."                            (. Table.addText(".\n", i); .)

    /* obviously a lot more needed here */

    END EBNF.

(d) Inventing additional non-terminal names (without clashing with others that might already exist) might be done with the aim of deriving, for example

        Home = Family HomeSeq1 HomeOpt2 "house" .

(e) In the prac kit will be found some other example data and production sets to assist in your development of the system, and an executable derived from a model solution.

(f) After converting a set of productions from EBNF to BNF, try converting the BNF productions once again to check that your system works consistently.

    class Entry {
      public String name;
      public ArrayList<String> text;
                                        // You may need other fields
      public Entry(String name) {       // constructor
        this.name = name;
        this.text = new ArrayList<String>();
      }
    } // Entry

    class Table {
                                        // You may need other static variables and methods
      public static final int
        LHS = 1,
        RHS = 2,
        BOTH = 3;

      static ArrayList<Entry> list = new ArrayList<Entry>();

      public static int add(String name, int where) {
      // Search for, and possibly add a nonterminal "name" to the table, according as to "where"
      // in a production it was found, and return the position as the
      // index where the name was either added previously or added by this call

      public static void addText(String str, int i) {
      // Add str to the list of strings for the rule in the table at position i

      public static void listProductions() {
      // List all productions to an output file

      public static void testProductions() {
      // Check that all non terminals have appeared once on the left side of
      // each production, and at least once on the right side of each production

    } // Table



Home  © P.D. Terry