1. Data
The TIME Magazine article collection consists of 420 articles published in TIME Magazine in the 1960s, totalling 1,550 KB of text.
2. Text Representation
Parsing these files results in a pruned template vector of about 6000 words, depending on the sophistication of the word stemming and the degree of pruning of the full template vector.
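The following is a minimal sketch of how a pruned template vector might be built, assuming pruning is done by document frequency. The function names, the thresholds `min_df` and `max_df_ratio`, and the toy documents are illustrative assumptions, not the parser actually used; word stemming is omitted entirely.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())

def build_template_vector(documents, min_df=2, max_df_ratio=0.9):
    """Return the sorted list of words kept after document-frequency pruning.

    min_df and max_df_ratio are illustrative thresholds, not values from the text.
    """
    df = Counter()
    for text in documents:
        df.update(set(tokenize(text)))  # count each word once per document
    n_docs = len(documents)
    return sorted(
        word for word, count in df.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    )

def to_vector(text, template):
    """Represent one document as raw term counts over the template vector."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in template]

# Toy documents (hypothetical, for illustration only).
docs = [
    "Saigon troops clash in South Vietnam",
    "South Vietnam government faces protests in Saigon",
    "Kashmir talks between India and Pakistan",
    "India and Pakistan resume talks",
]
template = build_template_vector(docs)
print(template)
print(to_vector(docs[0], template))
```

Stricter thresholds shrink the template vector, which is one way the "degree of pruning" mentioned above affects its final size.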
3. Trained Self-Organizing Maps
A 10 x 15 SOM is trained to cluster the news articles by topic on the map. The clustering can be verified by reading the news articles located on identical or neighboring units in the map provided below.
For example, all articles mapped onto the units in the lower left corner of
the map deal with problems in South Vietnam, with some units representing articles on the Vietnam War and other units covering the
government crackdown on Buddhist monks. As another example, consider articles T024, T096, T242, T461, which are mapped onto
a single unit in the first row of the map, and which all deal with the relationship between India and Pakistan and the Kashmir
conflict.
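As an illustration of how such a map can be trained, here is a minimal sketch of the standard online SOM training rule in plain Python. It assumes that "10 x 15" means 10 columns and 15 rows, so that units are addressed as (column, row). The learning-rate and neighborhood schedules, the iteration count and the toy input vectors are assumptions chosen for brevity, not the settings used for the map described above.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the (column, row) of the unit whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for unit, w in weights.items():
        dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if dist < best_dist:
            best, best_dist = unit, dist
    return best

def train_som(data, cols=10, rows=15, iterations=1000, lr0=0.5, seed=0):
    """Train a rectangular SOM with a Gaussian neighborhood function."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialized weight vector per map unit.
    weights = {(c, r): [rng.random() for _ in range(dim)]
               for c in range(cols) for r in range(rows)}
    sigma0 = max(cols, rows) / 2.0
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        x = rng.choice(data)
        bc, br = best_matching_unit(weights, x)
        for (c, r), w in weights.items():
            grid_dist_sq = (c - bc) ** 2 + (r - br) ** 2
            h = math.exp(-grid_dist_sq / (2.0 * sigma ** 2))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])   # pull the unit towards the input
    return weights

# Toy 3-dimensional "document vectors" (hypothetical).
data = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
som = train_som(data)
for x in data:
    print(x, "->", best_matching_unit(som, x))
```

After training, documents whose best-matching units coincide or are adjacent are the ones a reader would inspect to check the topical clustering.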
4. Integration of Distributed Self-Organizing Maps
For a second set of experiments, the TIME Magazine article collection was split into 6 independent subsets of articles to simulate the successive release of various editions.
Each subset was then parsed separately and used to train a single map.
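The split itself can be sketched as follows, assuming the articles are divided into consecutive, nearly equal-sized blocks in document order. The text does not state how the subsets were actually formed, and the article identifiers generated here are hypothetical placeholders.

```python
def split_into_editions(article_ids, n_subsets=6):
    """Split an ordered list of article ids into consecutive, near-equal subsets."""
    size, remainder = divmod(len(article_ids), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        # The first `remainder` subsets receive one extra article.
        end = start + size + (1 if i < remainder else 0)
        subsets.append(article_ids[start:end])
        start = end
    return subsets

# Hypothetical ids; the real collection does not use consecutive numbers.
article_ids = [f"T{n:03d}" for n in range(1, 421)]
editions = split_into_editions(article_ids)
print([len(e) for e in editions])
```

Each resulting subset would then be parsed on its own, yielding its own template vector, before a separate map is trained on it.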
We again find a similar topical organization of the various documents on the individual maps. Consider the first of the 6 SOMs trained with the subsets of the TIME Magazine articles.
For example, on unit (0/0) we find article T042, entitled "The View from Lenin Hills", dealing with a discussion between
Nikita Khrushchev and Soviet artists at the Lenin Hill Reception Palace, next to article T018, "Who's in Charge Here?", about the
failure of Khrushchev's virgin land plan for agriculture, on unit (1/0), and T032, "Party Time", on unit (0/1), about the New Year's Eve
party at the Kremlin. In the opposite corner of the map, on unit (5/9), we find documents dealing with the problems of the
reintegration of Kolwezi into the Congo, discussed at a meeting between officials in article T065, "Tea and Harmony", next to three
articles on unit (4/9) (T021, T048, T058, entitled "The India-Rubber Man", "Round 3", and "Tshombe's Twilight"), providing more
detailed information on the background of the Congo troubles. Other groups of documents found on this map deal, for example, with
the war in Vietnam, the relation between India, Pakistan and China etc. We leave it up to the reader to explore the other topical
sections found in this and the remaining library maps.
While additional attributes such as newspaper-section information could be hand-crafted, we refrained from doing so, as such assignments would be more or less arbitrary and hard to justify apart from the topical classification performed by the SOM. Furthermore, this set of attributes suffices to create a rather nice-looking and intuitively interpretable representation of the TIME Magazine article collection. Based on this limited number of attributes, a mapping was designed to provide a graphical representation of the articles.
The textual metaphors title, author and description are assigned as usual, allowing the user to readily compare the libViewer representation with the previous representation of the TIME Magazine article collection SOMs by comparing the document numbers. The size of the articles is again mapped onto the spine width to make articles of different length easily distinguishable. The same default mapping is performed for the well-thumbed attribute. The artificially created date attribute is mapped onto both the position within each shelf and the dust level, with older articles being pushed to the back of the shelf and having more dust settled on them. The region attribute extracted from the articles is used to set a country flag logo on the spine or, for regions, a textual logo describing the region, such as mid. East for the Middle East or SE Asia for South East Asia.
The assignment of articles to the corresponding shelves derives from the SOMLib classification as presented in the preceding chapters, with the labels determined by the LabelSOM method being depicted as shelf labels. To allow convenient comparison, we furthermore added the shelf location to the set of labels, again starting with shelf id (0/0) in the upper left corner, making the shelf numbering identical to the numbering used for identifying the units in the SOM so far. Last, but not least, we colored the documents according to their cluster membership. Based on the cluster identification presented with the labeling, we assigned all articles that are part of the same cluster an identical color, where the actual assignment of colors to clusters was chosen arbitrarily.
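The mapping described above can be summarized in a small sketch. This is not libViewer code: the `Article` class, the `to_book` function, the dictionary keys and the `flag:` prefix are hypothetical names introduced only to show which attribute drives which metaphor, and the size and date values in the example are made up.

```python
from dataclasses import dataclass

# Textual logos for regions, as quoted in the text. Country flags are
# represented here by a hypothetical "flag:<country>" string.
REGION_LOGOS = {"Middle East": "mid. East", "South East Asia": "SE Asia"}

@dataclass
class Article:
    doc_id: str     # document number, e.g. "T342"
    size_kb: float  # article length
    date: int       # artificially created date, larger means newer
    region: str     # country or region extracted from the article
    shelf: tuple    # (column, row) of the SOM unit, (0, 0) = upper left

def to_book(article, oldest, newest):
    """Map one article onto the metaphors of its book representation."""
    span = max(newest - oldest, 1)
    age = (newest - article.date) / span  # 0.0 = newest, 1.0 = oldest
    logo = REGION_LOGOS.get(article.region, "flag:" + article.region)
    return {
        "title": article.doc_id,
        "spine_width": article.size_kb,  # longer article, wider spine
        "depth": age,                    # older books sit further back
        "dust": age,                     # older books carry more dust
        "logo": logo,
        "shelf": article.shelf,
    }

# Example with made-up size and date values.
book = to_book(Article("T342", 6.0, 1, "Great Britain", (9, 11)), oldest=1, newest=2)
print(book)
```

Keeping the mapping in one function of this kind makes it easy to swap a single metaphor, for instance driving the dust level from a different attribute, without touching the rest.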
As can easily be seen from this discussion, different mappings are possible and might prove even better suited for a given text collection or for a specific type of usage. We will thus use the mapping defined above for the initial presentation of the TIME Magazine article collection, followed by an analysis of how changing specific mappings influences both the resulting visualization and the information gained from the differing representations.
The figures depict the lower part of the TIME Magazine article collection SOM from the preceding chapters using the libViewer representation metaphors. Using this representation, we obtain a good overview of the various topical sections in the library, as clusters of documents on identical topics are assigned the same color. For example, we find the documents on the war in Vietnam, located in the lower left corner of the library, to be colored yellow, and we can immediately see the amount of library space they occupy, as well as the fact that they are located next to a cluster of green documents on Africa. The fact that these documents cover the war in Vietnam or African matters, respectively, can be told from the labels (south, viet, saigon, etc.) on the bookshelves in that area.
If we move further to the right, we find the clusters of documents on Vietnam and Africa continuing up to the shelves in columns 4 and 6, respectively, where a small cluster of pink documents on Tunisia and Algeria indicates the shift to the section of blue documents covering Middle-East topics to the right. Please note that the actual color used does not carry any special meaning other than that documents having the same color cover the same topic. This allows the user to decide whether a whole area of the bookshelf is of any interest to her or him after having scanned the labels of one of the shelves or after having taken a look at one of the documents in the specific section. A somewhat different approach was chosen for the very small clusters of topics, each of which occupies only a single shelf. Coloring these documents following the same principle would result in an overload of colors. We thus decided to color grey all documents that are not part of any cluster larger than their own single shelf. Although this initially gives the impression of one large section of coherent grey documents, this metaphor turns out to be learned easily, posing no major problems to users, who merely took it for a kind of section on "other topics".
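The coloring rule can be sketched as follows, assuming each shelf has already been assigned a cluster label. The function name, the palette order and the toy shelf assignments are illustrative assumptions; as noted above, which color goes to which cluster is arbitrary.

```python
from collections import Counter
from itertools import cycle

PALETTE = ["yellow", "green", "pink", "blue", "red"]  # illustrative order

def assign_shelf_colors(shelf_clusters, palette=PALETTE, other="grey"):
    """Return a color per shelf: one color per multi-shelf cluster, grey otherwise.

    shelf_clusters maps a shelf (column, row) to its cluster label.
    """
    shelves_per_cluster = Counter(shelf_clusters.values())
    next_color = cycle(palette)
    cluster_color, shelf_color = {}, {}
    for shelf in sorted(shelf_clusters):
        cluster = shelf_clusters[shelf]
        if shelves_per_cluster[cluster] == 1:
            # A cluster confined to a single shelf gets the shared "other" color.
            shelf_color[shelf] = other
            continue
        if cluster not in cluster_color:
            cluster_color[cluster] = next(next_color)
        shelf_color[shelf] = cluster_color[cluster]
    return shelf_color

# Toy assignment of shelves to clusters (hypothetical).
shelves = {
    (0, 13): "vietnam", (0, 14): "vietnam",
    (1, 13): "africa", (1, 14): "africa",
    (3, 10): "austria",
}
print(assign_shelf_colors(shelves))
```

With more multi-shelf clusters than palette entries, this sketch would reuse colors, which is exactly the overload the grey rule is meant to limit.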
Still further to the right we reach the lower right corner of the SOMLib library map, where the rest of the section of Middle-East documents is located, next to the red documents on the Profumo-Keeler scandal and British politics in general.
A few additional pieces of information can be noticed in the distant representation of the TIME Magazine article collection. One is the differing position of documents within the library, which is used as an indicator of their age, with older documents being pushed further to the back. For example, we can easily see that on shelf (0/13) in the lower left corner we have two newer and two older documents on the political situation in Vietnam, or that on the lower right corner unit (9/14) all documents date from the same period. Another feature visible even from the distant view is the size of the articles, with some documents having narrower spines than others, as for example on shelf (2/14). Again, information on any selected document is depicted in the status bar of the system.
The additional metaphors only become visible as we zoom into the library. Starting again with the units in the lower left corner of the map, we now find more labels immediately available from the shelf. Furthermore, we find the flag or region indicator depicted as a logo on the spine of every document, together with the dust metaphor being a somewhat stronger indicator of a document's age than merely the shelf position. We also now have the textual information on the spine available, listing the "general topic" TIME Magazine 1960's plus the actual article title in the form of its filename.
The remaining figures depict different library sections of the map visited while walking along the shelves, such as moving up from the Vietnam section to the African section. If we instead move to the right, we arrive at a shelf containing more documents from the Vietnam cluster, whereas moving further up and to the right we eventually reach the area of the SOMLib library where the Austrian documents are located (shelf (3/10)).
Walking along the lower edge of the library shelves, we find the pink documents on Tunisia next to the blue section containing Middle East related documents, above which we find several shelves each representing a different topic and thus not being assigned any specific color to set themselves apart.
Continuing along the lowest row of shelves, we arrive at the bottom right corner of the map, where we find still more documents on the Middle East, with, for example, older documents located in the lower shelf (9/14), whereas the two documents in the shelf above (9/13) are newer ones. Looking at the shelves 3 rows up, we find the start of a new topical section, namely red books from the British cluster with the dominant Profumo-Keeler scandal. All documents in this cluster now have the Union Jack flag assigned as country logo. We further find on shelf (9/11) a rather short, new document (T529) next to two older and longer documents (T342, T354).