WebClass 0.1

An Intermediary for the Classification of HTML Pages

developed by

LACAM Dipartimento di Informatica
Università degli Studi di Bari
via Orabona, 4
70126 Bari

Java Technology Center
IBM SEMEA Sud
via Tridente, 42/14
70125 Bari

Description

WebClass is a prototype workbench, written in Java, for experimenting with the application of statistical and case-based reasoning methods to automatic Web page classification.

System Requirements

Platform: any Java 1.1 (or higher) enabled platform
Java development tool: JDK 1.1
Processor: Intel 486 (or higher)

Installation Procedure & Testing

The distribution package webclass.zip contains the following files:

README.html      : this file
webclass.jar     : the WebClass jar file
html.jar         : archive of the html sub-directory, which contains sample Web pages
experiment1.jar  : archive of the experiment1 sub-directory, which contains the WebClass configuration files for the first experiment
experiment2.jar  : archive of the experiment2 sub-directory, which contains the WebClass configuration files for the second experiment

To install, perform the following steps:

o      Unzip webclass.zip into a directory of your choice

o      Change to that directory

o      Extract webclass.jar with the command jar -xfM webclass.jar

o      Extract html.jar with the command jar -xfM html.jar

o      Extract experiment1.jar with the command jar -xfM experiment1.jar

o      Extract experiment2.jar with the command jar -xfM experiment2.jar
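
If you prefer to script the installation, the extraction steps above can also be carried out from Java. The following is a minimal sketch, not part of the WebClass distribution: the class name ExtractJar is hypothetical, it relies only on the standard java.util.jar.JarFile and java.nio.file APIs, and it needs a modern JDK (Java 7 or later) rather than JDK 1.1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical installer helper: extracts every entry of a jar archive into a target directory. */
public class ExtractJar {

    public static void extract(Path jar, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries whose path would escape the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Unsafe entry name: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    // Create parent directories even if the archive has no explicit entry for them.
                    Files.createDirectories(out.getParent());
                    try (InputStream in = jarFile.getInputStream(entry)) {
                        Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Same effect as running "jar -xf" on each archive from the installation directory.
        String[] archives = {"webclass.jar", "html.jar", "experiment1.jar", "experiment2.jar"};
        for (String name : archives) {
            extract(Paths.get(name), Paths.get("."));
        }
    }
}
```

Run it from the directory where you unzipped webclass.zip, so that the four archives are found and their content is extracted next to them.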


 

The User Manual and other documentation are not yet available. However, you can test the system by running the experiments described below. The system is not yet stable: it was written by our university students, and its main aim is to demonstrate (hopefully good) ideas. Please be patient if it runs slowly or if system errors occur.

The First Experiment
The sub-directory experiment1 contains all the WebClass configuration files needed to run a simple experiment with the Web pages stored in the sub-directory html. To set up the experiment, copy the entire content of the experiment1 sub-directory into the main directory and type the command java WebClass; the system will then be ready to run the experiment.
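
Copying the configuration files can also be scripted. The minimal sketch below is an illustration rather than a WebClass tool: the class name PrepareExperiment is hypothetical, it needs Java 8 or later, and it assumes it is launched from the main WebClass directory. It copies the whole content of an experiment sub-directory (experiment1 by default) into the current directory, overwriting files with the same name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical helper: copies the content of an experiment sub-directory into a target directory. */
public class PrepareExperiment {

    public static void copyContents(Path sourceDir, Path targetDir) throws IOException {
        // Files.walk visits a directory before its children, so parents are created first.
        try (Stream<Path> paths = Files.walk(sourceDir)) {
            for (Path src : (Iterable<Path>) paths::iterator) {
                Path dest = targetDir.resolve(sourceDir.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PrepareExperiment experiment1   (then start the workbench with: java WebClass)
        String experiment = args.length > 0 ? args[0] : "experiment1";
        copyContents(Paths.get(experiment), Paths.get("."));
    }
}
```

Because files are overwritten but never deleted, files copied for a previous experiment that have no counterpart in the new sub-directory remain in the main directory.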

Note:
In this experiment the system starts with a Knowledge Base of 4 classes, namely Astronomy, Car, Moto and Jazz, with 5 training pages and 5 testing pages for each class. You can see all the training and testing pages by selecting the menu item "Categories" from the "Browse" pop-up menu in the WebClass main window. The Categories Management Window then appears, showing all the categories currently active in the workbench (you can also add new ones for other experiments) and allowing you to browse the training and testing pages of each category (here, too, you can add other pages for other experiments).

You can explore the system features using the knowledge base of experiment 1. For example, you can classify a Web page by selecting the menu item "Page" from the "Classify" pop-up menu. A micro-browser appears that lets you load a Web page from the html directory (buttons: openfile and reload) and classify it by pressing one of the classification buttons: Classify by Centroids, Classify by NN or Classify by k-NN.
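
This README does not describe how the three classifiers work internally. As a rough illustration only, the sketch below shows the textbook form of each technique over bag-of-words pages: nearest centroid (compare the page with the average vector of each class), nearest neighbour (take the class of the single most similar training page) and k-nearest neighbours (majority vote among the k most similar training pages). The class PageClassifiers, the tokenization rule (lower-cased runs of letters), the raw term-frequency weights and the use of cosine similarity are all assumptions, not WebClass's documented feature extraction or similarity measure. It needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: nearest-centroid, NN and k-NN classification over term-frequency vectors. */
public class PageClassifiers {

    /** Lower-cases the text, splits on non-letters and counts each term. */
    static Map<String, Double> vector(String text) {
        Map<String, Double> v = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                v.merge(t, 1.0, Double::sum);
            }
        }
        return v;
    }

    /** Cosine similarity of two sparse vectors; 0 if either is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double x : b.values()) {
            nb += x * x;
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** A training page with its class label. */
    static final class Example {
        final String label;
        final Map<String, Double> vec;

        Example(String label, String text) {
            this.label = label;
            this.vec = vector(text);
        }
    }

    /** Nearest centroid: average the vectors of each class and pick the most similar average. */
    static String byCentroids(List<Example> training, String page) {
        Map<String, Map<String, Double>> sums = new HashMap<>();
        Map<String, Integer> counts = new HashMap<>();
        for (Example ex : training) {
            Map<String, Double> sum = sums.computeIfAbsent(ex.label, k -> new HashMap<>());
            ex.vec.forEach((term, w) -> sum.merge(term, w, Double::sum));
            counts.merge(ex.label, 1, Integer::sum);
        }
        Map<String, Double> q = vector(page);
        String best = null;
        double bestSim = -1;
        for (Map.Entry<String, Map<String, Double>> e : sums.entrySet()) {
            int n = counts.get(e.getKey());
            Map<String, Double> centroid = new HashMap<>();
            e.getValue().forEach((term, w) -> centroid.put(term, w / n));
            double sim = cosine(q, centroid);
            if (sim > bestSim) {
                bestSim = sim;
                best = e.getKey();
            }
        }
        return best;
    }

    /** k-NN: majority vote among the k most similar training pages (k = 1 is plain NN). */
    static String byKNearest(List<Example> training, String page, int k) {
        Map<String, Double> q = vector(page);
        List<Example> ranked = new ArrayList<>(training);
        ranked.sort(Comparator.comparingDouble((Example ex) -> cosine(q, ex.vec)).reversed());
        Map<String, Integer> votes = new HashMap<>();
        for (Example ex : ranked.subList(0, Math.min(k, ranked.size()))) {
            votes.merge(ex.label, 1, Integer::sum);
        }
        // Ties between equally voted classes are broken arbitrarily.
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        List<Example> training = Arrays.asList(
                new Example("Jazz", "saxophone trumpet swing jazz quartet"),
                new Example("Jazz", "jazz club improvisation saxophone"),
                new Example("Astronomy", "telescope galaxy star orbit"),
                new Example("Astronomy", "planet orbit telescope comet"));
        String page = "a saxophone solo in a jazz club";
        System.out.println(byCentroids(training, page));   // Jazz
        System.out.println(byKNearest(training, page, 1)); // Jazz (NN)
        System.out.println(byKNearest(training, page, 3)); // Jazz (k-NN, k = 3)
    }
}
```

With k = 1, byKNearest reduces to plain NN. Once the centroids are computed (this sketch recomputes them on every call for brevity), the centroid method needs one comparison per class, while NN and k-NN compare the page with every training page.
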
You can also load the Web page to be classified from the Internet or an intranet by typing its HTTP address. In that case, remember to configure the proxy server or the SOCKS server properly by selecting the menu item "Network Configuration" from the "Preferences" pop-up menu.
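
How the "Network Configuration" dialog stores and applies its settings is not documented here. For reference, a standard JVM reads its proxy settings from the documented networking system properties http.proxyHost and http.proxyPort (HTTP proxy) or socksProxyHost and socksProxyPort (SOCKS server). The minimal sketch below sets them before downloading a page; the class ProxySettings, the host proxy.example.com and the port 8080 are placeholders, and it needs Java 9 or later for InputStream.readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical helper: routes JVM URL connections through an HTTP proxy or a SOCKS server. */
public class ProxySettings {

    /** Sets the standard JVM networking properties in the given property set. */
    public static void configure(Properties props, String host, int port, boolean socks) {
        if (socks) {
            props.setProperty("socksProxyHost", host);
            props.setProperty("socksProxyPort", Integer.toString(port));
        } else {
            // The http.* properties apply to http: URLs only; https: URLs use https.proxyHost/https.proxyPort.
            props.setProperty("http.proxyHost", host);
            props.setProperty("http.proxyPort", Integer.toString(port));
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder proxy: replace with the host and port used on your network.
        configure(System.getProperties(), "proxy.example.com", 8080, false);
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            // Assumes the page is UTF-8 encoded; a real client would honour the declared charset.
            String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            System.out.println(html.length() + " characters read");
        }
    }
}
```

The same properties can also be passed on the command line, for example java -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 WebClass, although it is not known whether WebClass itself honours them.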

The Second Experiment
The sub-directory experiment2 contains all the WebClass configuration files needed to run a simple experiment with the Web pages stored in the sub-directory html. To set up the experiment, copy the entire content of the experiment2 sub-directory into the main directory and type the command java WebClass; the system will then be ready to run the experiment.
In this second experiment the training set consists of 128 pages and the test set consists of 64 pages.
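
Keeping the training and test sets separate lets you measure performance on pages the classifier has not seen during training. A common measure is accuracy, the fraction of test pages assigned to their true class; with a 64-page test set, each misclassified page lowers accuracy by 1/64, about 1.6 percentage points. The sketch below computes accuracy for any classifier passed in as a function; the class TestSetAccuracy is hypothetical, it needs Java 8 or later, and the toy data in main is invented for illustration, not a WebClass result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: accuracy of a classifier on a labelled test set. */
public class TestSetAccuracy {

    /** Fraction of test pages whose predicted class equals their true class. */
    public static <P> double accuracy(List<P> pages, List<String> trueLabels, Function<P, String> classifier) {
        if (pages.isEmpty() || pages.size() != trueLabels.size()) {
            throw new IllegalArgumentException("need exactly one true label per test page");
        }
        int correct = 0;
        for (int i = 0; i < pages.size(); i++) {
            if (classifier.apply(pages.get(i)).equals(trueLabels.get(i))) {
                correct++;
            }
        }
        return (double) correct / pages.size();
    }

    public static void main(String[] args) {
        // Invented toy data: a "classifier" that answers Jazz if the page mentions jazz, else Car.
        List<String> pages = Arrays.asList("jazz trio", "new car model", "jazz club", "motorbike race");
        List<String> labels = Arrays.asList("Jazz", "Car", "Jazz", "Moto");
        System.out.println(accuracy(pages, labels, p -> p.contains("jazz") ? "Jazz" : "Car")); // 0.75
    }
}
```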
 

Future work

We have started to embed the best results produced by WebClass into IBM WBI (Web Intermediaries) plug-ins, with the aim of building "intelligent proxy servers".

Publications

·       F. Esposito, D. Malerba, L. Di Pace & P. Leo (2000). A Machine Learning Approach to Web Mining. In E. Lamma & P. Mello (Eds.), AI*IA 99: Advances in Artificial Intelligence, Lecture Notes in Artificial Intelligence 1321, 439-442, Springer, Berlin, Germany.

·       F. Esposito, D. Malerba, L. Di Pace & P. Leo (2000). WebClass: An Intermediary for the Classification of HTML Pages. Demo paper for AI*IA '99, Bologna, Italy.

·       G. Convertino, L. Di Pace, P. Leo, A. Maffione, D. Malerba & G. Vespucci (1998). Tecniche di Web Mining per supportare l'attività di navigazione in rete [Web mining techniques to support Web browsing]. Proceedings of AICA '98, 53-74, Naples, Italy.

FAQs

None available yet. Please send all requests and comments to Pietro Leo, IBM Java Technology Center, Bari (Italy).

Last modified 10/01/2000