Department of Software Technology
Vienna University of Technology
Data Mining in Large Free Text Document Archives
Document classification has been one of the central issues in
information retrieval research over the last decades.
The challenge of classification is to uncover the similarities between groups
of data in order to improve the retrieval effectiveness of the overall system.
From an exploratory data analysis point of view, the same process of
classification may be used to gain insight into the structure of the various
data items, and may thus be referred to as data mining in text archives.
In this paper we show the results from applying a neural network model,
the hierarchical feature map, to such a data mining task.
The neural network is carefully designed to impose a hierarchical structure
on the underlying document collection, which yields a straightforward
representation of data similarities.
Apart from the benefit for text data mining, we are able to demonstrate that
the hierarchical feature map leads to a tremendous speed-up of the training
process compared with more traditional neural network architectures that are
already known to be effective in text classification tasks.
It is this time-consuming training process that is commonly regarded as a
major obstacle to real-world, large-scale neural network applications.
Hence, hierarchical feature maps point the way towards the effective use
of neural network technology in realistic applications and thus represent a
powerful alternative to traditional methods for text classification.
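The abstract does not spell out the architecture, so what follows is only a minimal sketch. It assumes the common formulation of a hierarchical feature map as a tree of small, independently trained self-organizing maps (SOMs), where each map on a lower layer is trained only on the documents mapped onto one unit of its parent map. The names `SOM`, `train_hfm`, `path` and `min_docs`, the use of 1-D maps, the linearly decaying learning-rate and neighbourhood schedules, and the toy document vectors are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SOM:
    """Minimal 1-D self-organizing map (hypothetical helper; the maps in a
    hierarchical feature map are usually 2-D grids)."""

    def __init__(self, n_units, dim, rng):
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]

    def winner(self, x):
        # Index of the unit whose weight vector is closest to x.
        return min(range(len(self.weights)), key=lambda i: sq_dist(self.weights[i], x))

    def train(self, data, epochs=20, lr0=0.5, radius0=1.0):
        for epoch in range(epochs):
            frac = epoch / epochs
            lr = lr0 * (1.0 - frac)                     # assumed linear decay
            radius = max(radius0 * (1.0 - frac), 0.01)  # shrinking neighbourhood
            for x in data:
                c = self.winner(x)
                for i, w in enumerate(self.weights):
                    # Gaussian neighbourhood over the 1-D grid of unit indices.
                    h = math.exp(-((i - c) ** 2) / (2.0 * radius ** 2))
                    for d in range(len(w)):
                        w[d] += lr * h * (x[d] - w[d])


def train_hfm(data, units_per_map, depth, rng, min_docs=2):
    """Hypothetical sketch: train a tree of small SOMs in which each child map
    sees only the documents won by its parent unit."""
    som = SOM(units_per_map, len(data[0]), rng)
    som.train(data)
    node = {"som": som, "children": {}}
    if depth > 1:
        buckets = {}
        for x in data:
            buckets.setdefault(som.winner(x), []).append(x)
        for unit, subset in buckets.items():
            if len(subset) >= min_docs:  # assumed cut-off: skip sparse units
                node["children"][unit] = train_hfm(
                    subset, units_per_map, depth - 1, rng, min_docs)
    return node


def path(node, x):
    """Winning unit on each layer, from the top map down to a leaf map."""
    units = []
    while node is not None:
        unit = node["som"].winner(x)
        units.append(unit)
        node = node["children"].get(unit)
    return units


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up 3-term document vectors, for illustration only.
    docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0],
            [0.0, 0.9, 0.1], [0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]
    tree = train_hfm(docs, units_per_map=3, depth=2, rng=rng)
    for d in docs:
        print(d, path(tree, d))
```

Under these assumptions, the path printed for a document plays the role of its position in the hierarchy. The sketch also suggests where a training speed-up can come from: every map stays small, and the training set of each child map shrinks to the documents of a single parent unit. A single large map, by contrast, must compare every document against all of its units in every epoch.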
Comments: rauber@ifs.tuwien.ac.at