Department of Software Technology
Vienna University of Technology
Text Data Mining
Classification is one of the central issues in any system dealing with
text data.
The need for effective approaches has increased dramatically with the
advent of massive digital libraries containing free-form documents.
What is needed are powerful methods for exploring such libraries, with
the discovery of similarities between groups of text documents as the
overall goal.
In other words, we need methods that give insight into the inherent
structure of the items contained in a text archive.
In this paper we demonstrate the applicability of unsupervised neural
networks for the task of text document clustering.
Specifically, we describe the results from using self-organizing maps
for the exploration of document archives.
We further argue that more attention should be paid to the fact that
text archives lend themselves naturally to a hierarchical structure.
We exploit this by representing the contents of a text archive with a
hierarchically organized network built from self-organizing maps,
which makes it possible to establish a genuine document taxonomy.
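
To make the idea concrete, the following is a minimal sketch of document clustering with a self-organizing map, followed by a simplified two-layer hierarchy. It is an illustration under stated assumptions, not the architecture or parameters used in the paper: the function names (best_unit, train_som, train_hierarchy), the linear decay schedules, the 2 x 2 map size, and the toy term-frequency vectors are all hypothetical choices made for this example.

```python
import math
import random


def best_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map; returns {(row, col): weights}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map unit, initialized with small random values.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            win = best_unit(weights, x)
            # Move the winner and its grid neighbors toward the input vector.
            for unit, w in weights.items():
                d2 = (unit[0] - win[0]) ** 2 + (unit[1] - win[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def train_hierarchy(vectors, rows, cols, **kw):
    """Two-layer sketch: a top map, plus one child map per occupied top unit."""
    top = train_som(vectors, rows, cols, **kw)
    groups = {}
    for x in vectors:
        groups.setdefault(best_unit(top, x), []).append(x)
    # Each child map is trained only on the documents mapped to its parent unit.
    children = {u: train_som(docs, rows, cols, **kw) for u, docs in groups.items()}
    return top, children


# Toy term-frequency vectors over a hypothetical 4-term vocabulary.
docs = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
top, children = train_hierarchy(docs, 2, 2)
for d in docs:
    u = best_unit(top, d)
    print(d, "-> top unit", u, "-> child unit", best_unit(children[u], d))
```

In this sketch the top-layer map gives a coarse grouping of the documents, and each child map refines the group assigned to one top-layer unit, so the path from top unit to child unit can be read as a position in a simple two-level taxonomy. Documents with similar term profiles tend to land on the same or neighboring units, although the exact assignment depends on the seed and the training schedule.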
Comments: rauber@ifs.tuwien.ac.at