Numerous approaches to analyzing the structure of text corpora have been developed. They either try to impose a hierarchy on a given text collection or provide some other form of clustering, using supervised or unsupervised analysis methods [2]. However, most systems primarily provide a method for convenient and 'intelligent' document retrieval based on query systems of differing degrees of sophistication, and have so far placed too little emphasis on visualization. As a consequence, interactive exploration is usually not supported. One well-known technique for visualizing high-dimensional data spaces is Sammon's Mapping (SM) [5], which aims to represent the distances between data points in the high-dimensional input space as closely as possible in a 2-dimensional plot. Recent approaches use neural networks to structure large text corpora and to provide an interface for intuitive browsing of these collections. A prominent neural network architecture based on unsupervised learning is the self-organizing map (SOM), which has repeatedly been used to analyze and visualize text archives, the best-known example probably being the WEBSOM project [1].
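To make the distance-preservation goal of Sammon's Mapping concrete, the following minimal sketch computes the usual Sammon stress, i.e. the normalized, distance-weighted squared mismatch between pairwise distances in the input space and in the 2-dimensional plot. The function names `euclidean` and `sammon_stress` are illustrative and not taken from [5], and the sketch only evaluates the error of a given projection; it does not perform the iterative minimization.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_stress(high_dim, low_dim):
    """Sammon stress of a projection (0.0 means all distances are preserved).

    Assumption: all input points are pairwise distinct, so no input-space
    distance is zero.
    """
    pairs = list(combinations(range(len(high_dim)), 2))
    d_in = [euclidean(high_dim[i], high_dim[j]) for i, j in pairs]
    d_out = [euclidean(low_dim[i], low_dim[j]) for i, j in pairs]
    scale = sum(d_in)  # normalization by the total input-space distance
    # Dividing by the input distance makes small distances count more.
    return sum((a - b) ** 2 / a for a, b in zip(d_in, d_out)) / scale


# Three points that already lie in a plane can be projected without error.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
flat = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
stress_flat = sammon_stress(points, flat)
```

A projection that preserves every pairwise distance yields a stress of zero, and larger values indicate stronger distortion.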
The standard map display used to represent the results of SOM training is limited in that cluster boundaries are difficult to detect. To overcome this problem, we apply a new visualization technique based on an extended learning rule for the SOM, which results in an intuitive representation of clusters as groups of nodes in a 2-dimensional output space. The basic idea of this Adaptive Coordinate (AC) approach [4] is to have the nodes of the SOM arrange themselves in a 2-dimensional output space during the training process so as to approximate their geometric relationship in the high-dimensional vector space as faithfully as possible. The resulting visualization of the trained SOM is conceptually similar to the SM, but it stems from self-organization during the learning process.
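The idea of nodes rearranging themselves during training can be illustrated with a small sketch. It is not the learning rule defined in [4]; it is one plausible reading of the description above, under the assumption that each node's output position moves toward the winner's position by the same relative amount by which its weight vector approached the input. All names (`train_step`, `grid`, `coords`, `alpha`, `sigma`) and the Gaussian neighbourhood are hypothetical choices made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_step(weights, grid, coords, x, alpha=0.5, sigma=1.0):
    """One SOM update for input x that also moves the output coordinates.

    weights: weight vectors, one list per node (updated in place)
    grid:    fixed map positions, used only by the neighbourhood function
    coords:  adaptive 2-D output positions, one tuple per node (updated in place)
    Returns the index of the winning node.
    """
    before = [distance(w, x) for w in weights]
    winner = min(range(len(weights)), key=before.__getitem__)
    winner_pos = coords[winner]  # captured before any coordinate moves

    for i, w in enumerate(weights):
        # Standard SOM step: pull the weight vector toward the input,
        # scaled by a Gaussian neighbourhood on the fixed grid.
        h = math.exp(-distance(grid[i], grid[winner]) ** 2 / (2 * sigma ** 2))
        weights[i] = [wk + alpha * h * (xk - wk) for wk, xk in zip(w, x)]
        if before[i] == 0.0:
            continue  # the node already matched the input exactly
        # Assumed coordinate step: move toward the winner's output position
        # by the relative decrease in distance to the input.
        gain = (before[i] - distance(weights[i], x)) / before[i]
        coords[i] = tuple(c + gain * (wc - c)
                          for c, wc in zip(coords[i], winner_pos))
    return winner


# Three nodes on a 1 x 3 map; the middle node matches the input exactly.
weights = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
grid = [(0, 0), (0, 1), (0, 2)]
coords = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
winner = train_step(weights, grid, coords, [1.0, 1.0])
```

In this toy run both outer nodes drift toward the winning node in the output space, which is the kind of movement that would make clusters appear as groups of nearby nodes after many such steps.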
In this paper we demonstrate the application of the SOM, enhanced with the AC visualization technique, to the problem of visualizing the structure of free-form text corpora. We further compare the resulting AC visualization with both the standard SOM visualization and the corresponding SM to analyze its capabilities in the field of text archive exploration.