Department of Software Technology
Vienna University of Technology
Exploration of Text Collections with Hierarchical Feature Maps
Document classification is one of the central issues in
information retrieval research.
The aim is to uncover similarities between text documents.
In other words, classification techniques are used to gain insight into the
structure of the various data items contained in the text archive.
In this paper we show the results from using a hierarchy of self-organizing
maps to perform the text classification task.
Each of the individual self-organizing maps is trained independently
and gets specialized to a subset of the input data.
As a consequence, this particular artificial neural network model
makes it possible to establish a genuine document taxonomy.
The benefit of this approach is a straightforward representation of
document similarities combined with dramatically reduced training time.
In particular, the hierarchical representation of document collections
is appealing because it is the organizational principle that librarians
already use, which makes it familiar to the user.
The massive reduction in the time needed to train the artificial neural
network, together with its highly accurate clustering results, makes it
a compelling alternative to conventional approaches.
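
The layered arrangement described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the function names (train_som, best_unit, train_hierarchy, locate), the 2 x 2 map size, the learning schedule, and the toy term-weight vectors are all assumptions made for illustration. A top-level map is trained on all documents, and each of its units then receives its own independently trained map that sees only the documents assigned to that unit. The sketch keeps the full vectors on every layer, which may differ from the setup used in the paper.

```python
import math
import random


def best_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rows x cols self-organizing map on a list of vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):      # shuffled presentation order
            winner = best_unit(weights, x)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - winner[0]) ** 2 + (unit[1] - winner[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def train_hierarchy(data, rows=2, cols=2, depth=2, seed=0):
    """Train one map, then one independent child map per occupied unit."""
    som = train_som(data, rows, cols, seed=seed)
    node = {"som": som, "children": {}}
    if depth > 1:
        subsets = {}
        for x in data:
            subsets.setdefault(best_unit(som, x), []).append(x)
        for unit, subset in subsets.items():
            if len(subset) > 1:  # a single document needs no further split
                node["children"][unit] = train_hierarchy(
                    subset, rows, cols, depth - 1, seed + 1)
    return node


def locate(node, x):
    """Follow a document down the hierarchy; returns one grid position per layer."""
    path = []
    while node is not None:
        unit = best_unit(node["som"], x)
        path.append(unit)
        node = node["children"].get(unit)
    return path


# Hypothetical term-weight vectors for six documents (made-up values).
docs = [
    [0.9, 0.8, 0.0, 0.1], [0.8, 0.9, 0.1, 0.0], [0.7, 0.9, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.8], [0.1, 0.0, 0.8, 0.9], [0.0, 0.0, 0.9, 0.7],
]
hierarchy = train_hierarchy(docs, rows=2, cols=2, depth=2)
path = locate(hierarchy, docs[0])
print(path)
```

The path returned by locate plays the role of a position in the taxonomy: the first entry names the top-level unit and any further entry names the unit within that unit's own map. Because each child map is trained only on its subset, every individual map stays small, which is the source of the shorter training time the abstract refers to.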
Comments: rauber@ifs.tuwien.ac.at