Department of Software Technology
Vienna University of Technology
The SOMLib Digital Library - Architecture
Overview
The SOMLib digital library system consists of a series of modules.
The following sections provide an overview of the architecture of the SOMLib system and present the individual components in somewhat more detail.
Furthermore, we provide intermediate results demonstrating the functionality of each of the modules.
More detailed information on the various components can be found in the papers listed in each section.
Architecture
- Text Representation:
The various texts to be included in the library need to be transformed into a numerical representation. This is achieved by full-text indexing all documents and transforming them into a suitable, weighted representation.
This section provides an overview of this transformation procedure and presents some examples for the text archives used in experiments with the SOMLib system.
- Self-Organizing Map:
The self-organizing map (SOM or Kohonen-map) forms the core of the SOMLib digital library system.
It is a popular self-organizing neural network providing a topology-preserving mapping from high-dimensional feature spaces onto a usually two-dimensional output space.
Within the SOMLib system it is used to organize the documents by topic, similar to their organization in conventional, manually sorted libraries.
This section provides an introduction to the architecture and training procedure of the self-organizing map and related architectures, such as the growing hierarchical self-organizing map (GHSOM).
It furthermore presents some interactively explorable examples of maps of document archives.
- Integration of Distributed Libraries:
Digital libraries usually do not exist as single collections in one central location.
Rather, they are distributed across several locations or, as in the case of annual journal or conference proceedings collections, are issued as successive single collections at regular intervals.
Instead of having to move all material that is to be included in a digital library into a single location for local processing, we want to integrate these distributed collections at a higher level.
This section describes the integration of various SOMLib document repositories into higher-level collections based on the library representation of each collection rather than on the very documents in those collections.
Higher-level SOMLib maps are created based on (parts of) the individual maps, allowing the creation of personal libraries, topic-specific collections or general, larger document repositories.
- Labeling the Library:
While the self-organizing map organizes documents by topic, it does not make the content of the documents in the various areas of the map explicit.
To give the user an instant overview of the contents of a library, we want to automatically label the various sections of the SOMLib library map with keywords describing the contents of the documents in the respective areas.
Contrary to most current approaches, which rely on manual interpretation of the map or on some a priori knowledge of the data to assign labels to the map, this section presents the LabelSOM approach to automatically label a self-organizing map based on the features learned during the training process.
- Library Visualization:
In spite of sophisticated information retrieval techniques, most digital libraries present themselves with query-based interfaces, returning long lists of search results.
The documents returned by a query are described by additional metadata, giving details such as title, date of creation, document size, author etc.
However, this representation is neither intuitive nor easy to use, preventing the user from getting an overview of the library or the search results.
With the libViewer we present a metaphor-graphics-based interface to a digital library system.
The various metadata attributes are mapped onto graphical representations, allowing a user to intuitively understand the presented documents.
This section presents some details on the libViewer interface as well as a Java-based online prototype.
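As a rough illustration of how the first two components fit together, the following minimal Python sketch turns a handful of texts into weighted term vectors and maps them onto a small self-organizing map. It is not the SOMLib implementation: the function names (tfidf_vectors, train_som, best_matching_unit), the whitespace tokenization, the choice of tf-idf as the weighting scheme, and the Gaussian neighborhood with linearly decaying learning rate are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Turn raw texts into length-normalized tf-idf vectors (assumed weighting)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenization
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = [tf[t] * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(w * w for w in vec)) or 1.0
        vectors.append([w / norm for w in vec])
    return vocab, vectors


def best_matching_unit(weights, vec):
    """Return the grid position whose weight vector is closest to vec."""
    return min(
        weights,
        key=lambda pos: sum((w - x) ** 2 for w, x in zip(weights[pos], vec)),
    )


def train_som(vectors, rows=2, cols=2, epochs=200, seed=0):
    """Train a rows x cols map; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs              # decays linearly towards 0
        lr = 0.5 * frac                          # learning rate
        sigma = max(rows, cols) / 2 * frac + 0.5  # neighborhood width
        for vec in rng.sample(vectors, len(vectors)):
            br, bc = best_matching_unit(weights, vec)
            for (r, c), w in weights.items():
                dist_sq = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist_sq / (2 * sigma ** 2))
                # Move the unit towards the input, scaled by grid distance.
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])
    return weights


if __name__ == "__main__":
    docs = [
        "neural network training",
        "neural network learning",
        "library catalog search",
        "library catalog index",
    ]
    vocab, vectors = tfidf_vectors(docs)
    som = train_som(vectors)
    for doc, vec in zip(docs, vectors):
        print(best_matching_unit(som, vec), doc)
```

Each document ends up assigned to the map unit whose weight vector is closest to its term vector; with a topology-preserving mapping, documents on similar topics tend to land on the same or neighboring units.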
Publications
- The SOMLib Digital Library System.
A. Rauber and D. Merkl
Proceedings of the 3rd European Conference on Research and Advanced Technology for Digital Libraries (ECDL'99),
Paris, France, September 22-24, 1999,
Lecture Notes in Computer Science (LNCS 1696), Springer, 1999.
A somewhat more detailed description of the various components of the SOMLib Digital Library System with examples providing a good overview of the system.
Formats: HTML, gnu-zipped PostScript, gnu-zipped PDF
- SOMLib: A Digital Library System Based on Neural Networks.
A. Rauber and D. Merkl
Proceedings of the 4th ACM Conference on Digital Libraries (DL'99),
Berkeley, CA, August 11-14, 1999.
A short description of the SOMLib digital library project, presenting an overview of the various components in the SOMLib Digital Library System and their relationship.
Formats: HTML, gnu-zipped PostScript, gnu-zipped PDF
Comments: rauber@ifs.tuwien.ac.at