Nhamo Honours Project:: Home-page

updating frequently - a lot being done!!!!

-----------------------------------------------------------------------------------------------------------------------------------

TODAY: working on writeup (draft) first chapter!!! NEXT PROJECT MILESTONE :-(

-----------------------------------------------------------------------------------------------------------------------------------

19 September 2005

submitted the Short paper...

-----------------------------------------------------------------------------------------------------------------------------------

14 September 2005

Adopted Richard Roiger's techniques for evaluating the two algorithms... quite a long procedure, but results were finally obtained.

surprisingly, the O-Cluster algorithm performs much better than the K-Means algorithm.

continued writing the short paper, now including the results obtained.

the other results included in the short paper describe the clusters obtained for the data from the Centre for AIDS Development, Research and Evaluation Institute for Social and Economic Research.

-----------------------------------------------------------------------------------------------------------------------------------

12 September 2005

at present 5 models from each algorithm have been built, where one from each will be selected as the best using a number of calculations yet to be done.

had a long 45min meeting with my supervisor.

was also analysing variables that will be needed in the APPLY step [took some time hey!!].

-----------------------------------------------------------------------------------------------------------------------------------

10 September 2005

1) build 5 models for each algorithm with each model having different settings on dataset1.

2) load datasets 2 and 3 for the purpose of applying the models.

-----------------------------------------------------------------------------------------------------------------------------------

9 September 2005

now that the database has started, what is the next step?

cut the dataset that i got from the Centre for AIDS Development, Research and Evaluation Institute for Social and Economic Research down into partitions of at most 300 rows each, keeping all 131 attributes. load the partitioned datasets, then build models with the K-Means algorithm. if it works then we are back in business.

after 3 hours:

the database has successfully started and i have partitioned the dataset into sets of 300 rows each. have loaded the 1st set and built a test model with the K-Means algorithm, and it works fine.
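a rough sketch of the partitioning step, in Python rather than anything i actually ran against Oracle. the function names and the stand-in data are made up for illustration; only the 300-row limit, the 131 attributes and the 899 rows come from this log.

```python
import csv
import io


def partition_rows(rows, max_rows=300):
    """Yield consecutive chunks of at most max_rows rows."""
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]


def write_partitions(header, rows, max_rows=300):
    """Return one CSV text per partition; every partition keeps the full header."""
    parts = []
    for chunk in partition_rows(rows, max_rows):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)   # all attributes are kept in every partition
        writer.writerows(chunk)
        parts.append(buf.getvalue())
    return parts


# Stand-in data (hypothetical values): 899 rows with 131 attributes each.
header = [f"attr{i}" for i in range(1, 132)]
rows = [[r * 1000 + c for c in range(131)] for r in range(899)]

parts = write_partitions(header, rows)
sizes = [len(p.splitlines()) - 1 for p in parts]  # data rows per partition
print(sizes)  # [300, 300, 299]
```

each partition could then be loaded into its own table, so that a model is only ever built on a few hundred rows at a time.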

-----------------------------------------------------------------------------------------------------------------------------------

8 September 2005

for the past week i have been having irritating problems with Oracle10g. the main problem is that the oracle database would not start at all. this means that all work has been suspended, because the oracle data miner is a separately installed component that depends entirely on the database for storage.

solution:

start up the database, but the question is how?

well, the provided way is to use oracle10g's enterprise manager, which pops up a screen prompting one to start up the database as it is shut down. NB, this does not work because it constantly rejects my credentials, giving a login exception error, which, i should say, is not handled very well.

Now the good news is, i found another way, using oracle's sql*plus (command prompt for oracle). all i had to do was enter:

sql>STARTUP

but this gave errors with the spfile, which was somehow corrupted, so i needed to create a new spfile and then increase the sga_target value. i did this in the following way using these commands:

sql> create pfile from spfile;

So you can get a pfile to edit
You startup with it if you rename the old spfile

At the command prompt, you'd type:
sqlplus /nolog
connect / as sysdba
create pfile from spfile;

(And that last command needs no instance running to work).

This will produce you a regular old text-file initXXX.ora in
ORACLE_HOME\databases. Edit that to put your SGA settings back to something
sensible, and then:

sqlplus /nolog
connect / as sysdba
create spfile from pfile;

This over-writes your dodgy spfile with a revised, clean one that matches
the contents of your edited init.ora. Then you can just say 'startup'.
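the editing step can be done by hand in any text editor. purely as an illustration, here is a small Python sketch of the same edit; the function name, the sample pfile contents and the 512M value are all assumptions made up for this example, not the settings i actually used.

```python
import re


def set_sga_target(pfile_text, new_value):
    """Return pfile_text with the sga_target parameter set to new_value.

    Handles lines such as '*.sga_target=...' or 'sga_target = ...'.
    If no sga_target line exists, one is appended at the end.
    """
    pattern = re.compile(
        r"^([ \t]*(?:\*\.|\w+\.)?sga_target[ \t]*=[ \t]*)(\S+)",
        re.IGNORECASE | re.MULTILINE,
    )
    # A function replacement avoids backreference clashes with digits in new_value.
    new_text, count = pattern.subn(lambda m: m.group(1) + new_value, pfile_text)
    if count == 0:
        new_text = pfile_text.rstrip("\n") + f"\n*.sga_target={new_value}\n"
    return new_text


# Hypothetical pfile contents, with a value far too small to start an instance.
sample = "*.db_name='orcl'\n*.sga_target=1\n*.processes=150\n"
updated = set_sga_target(sample, "512M")
print(updated)
```

the edited text would then be saved back over the init.ora file before running the create spfile from pfile command.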

-----------------------------------------------------------------------------------------------------------------------------------

31 August 2005

well things were looking good, with some datasets from the Centre for AIDS Development, Research and Evaluation Institute for Social and Economic Research.

have loaded the dataset into the database for testing purposes. since i only got everything working a few days ago, i felt it would be better to test whether both the database and the data miner actually work.

what is the outcome?

1) created a table for the dataset, which contains 131 variables and 899 rows of data (a major task, as it is time consuming). loaded the dataset using the traditional and very trusted sql*plus loader.

2) building the models:

the O-cluster models build very well with various algorithm settings. i have built 5 test models with this particular algorithm and noted NO problems.

the K-Means algorithm produced several errors, mostly related to the memory size of the machine (server Athena) from which i am currently running the database and data miner. but surprisingly, if i load the database with a very small dataset the algorithm works perfectly. so as a way of trying to make the algorithm work with this particular large dataset, i had to play around with some of oracle's settings relating to memory usage. unfortunately this made my problem even worse:

the database would not START.

I'M FRUSTRATED :-(

-----------------------------------------------------------------------------------------------------------------------------------


14 March 2005

submitted project proposal today. the project is still in the initial stages, where i still need to really understand the background literature and get a practical feel for data mining; hence i am still doing tutorials, not only for data mining but also for the oracle 9i database, as i am not really familiar with the software.

tutorial:

Berger, C., 09/2004, Oracle Data Mining, Know More, Do More, Spend Less - An Oracle White Paper, URL:
<http://www.oracle.com/technology/products/bi/odm/pdf/bwp_db_odm_10gr1_0904.pdf>, Accessed: 14 April 2005

---------------------------------------------------------------------------------------------------------------------------------------------

1 March 2005.

Met John Ebden (supervisor): discussed in more detail what the project is all about and what I am expected to do.

current week's objectives:

• Carry on with background reading, mainly to gain a solid understanding of the field and to identify the main issues of interest.

Achieved this week:

• Read the following papers

read and did the data mining tutorial by Emily Davis

Initial Readings

Michael J.A. Berry and Gordon S. Linoff, Mastering Data Mining: The Art and Science of Customer Relationship Management, USA, Wiley Computer Publishing, 2000.

Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, San Francisco, California, Morgan Kaufmann, 2001.

David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, Cambridge, Massachusetts, MIT Press, 2001.

PROGRESS REPORTS

Documents

ROAD TO WRITE-UP