Fifth Workshop on Data Analysis (WDA2004)

June 24-27 2004
Tatranska Polianka, Slovak Republic

Overview

Summary
Final program
Some photos from the seminar

Summary

2004 marked the fifth anniversary of the Workshop on Data Analysis. Such an occasion called for an appropriate setting, which was found in the \emph{Sliezsky Dom, Tatranska Polianka}, in the High Tatras region. At an elevation of 1,670 meters, it is one of the highest-located hotels in the Slovak Republic. The workshop, held June 24-27, 2004, was organized around three main thematic areas: supervised learning, cluster analysis, and general machine learning applications together with representational issues.

Within supervised learning, a strong focus was on automatic text classification (ATC). Presentations compared different feature selection and document pre-processing methods and investigated the integration of thesaurus information to increase the quality of the trained classifiers. They used a wide range of techniques, such as Rocchio, Support Vector Machines, Multi-Layer Perceptrons and Decision Trees. These were applied to general text classification tasks as well as to more specific domains, such as the automatic population of ontologies from text and the automatic classification of musical genre based on frequency spectrum analysis.

The second thematic area, cluster analysis, focused mainly on a prominent technique for topology-preserving mappings, the self-organizing map (SOM). A range of quality measures and visualizations was discussed, followed by a presentation of extensions to the basic SOM model that create map spaces of different shapes. Furthermore, an improved feature space representation for textual data, incorporating part-of-speech tagging, was presented.

The third thematic area covered various application domains as well as representational questions. Specifically, genetic algorithms for function recognition and algorithms for the integration of distributed ranked-value data were presented, followed by approaches to object-oriented representations of XML-structured data.
As in the previous year, a dedicated break-out session reflected on the limits of data analysis in general, and in particular on the limits of text mining and the World Wide Web. It continued last year's discussion of two literary works: \emph{The Library of Babel} by Jorge Luis Borges and a short piece by Umberto Eco, \emph{On the Impossibility of Drawing a Map of the Empire on a Scale of 1 to 1}.

Given the splendid location of this year's WDA, the scientific summits climbed during the sessions were followed by an excursion that took the participants to another summit, Vychodna Vysoka (2,428 meters). The high-quality presentations spawned intensive discussions, resulting in a fascinating and dense scientific program. (The workshop also managed to solve the puzzle of when the famous Iris data set first appeared and was first used as a data mining benchmark.) WDA 2004 was also pleased to welcome several long-standing participants to this fifth-anniversary edition.

Once again, we would like to thank all participants for joining and contributing to the tremendous success of this workshop. Special thanks also go once more to the Austrian Exchange Service and the Slovak Academic Information Agency, who generously supported this workshop under project number 45s10 within the Austrian-Slovak cooperation program.

Program

Thursday, June 24, 2004
- Arrival at hotel Sliezsky Dom, High Tatras
- 18:00 Dinner
Friday, June 25, 2004
- 8:00 Breakfast
- 9:00 Opening: Jan Paralic, Andreas Rauber
- 9:10 - 10:25 Session 1
  Chair: Peter Butka (Technical University of Kosice)
  - Peter Smatana (Technical University of Kosice):
    Two different ways of specific domain document preprocessing
  - Jozef Rjasko (University of Pavol Jozef Safarik, Kosice):
    Object oriented approach to formalization of XML
  - David Celjuska (Technical University of Kosice):
    Semi-automatic Population of Ontologies from Text
- 10:25 - 10:40 Coffee break
- 10:40 - 11:55 Session 2
  Chair: Tomas Horvath (University of Pavol Jozef Safarik, Kosice)
  - Nataliya Sokolovska (Vienna University of Technology):
    Text processing according to the parts of speech based on the SOMLib Digital Library and TreeTagger
  - Rudolf Mayer (Vienna University of Technology):
    Recognisable Shapes for Self-Organizing Maps
  - Georg Pölzlbauer (Vienna University of Technology):
    Survey and Comparison of Quality Measures for Self-Organizing Maps
- 12:30 - 13:45 Lunch
- 13:45 - 15:00 Session 3
  Chair: Michael Dittenbach (Vienna University of Technology, Vienna)
  - Robert Neumayer (Vienna University of Technology):
    Musical Genre Classification using a multi layer perceptron
  - Karol Bucek, Peter Grilli (University of Pavol Jozef Safarik, Kosice):
    F-ReC: Function Re-Cognition
  - Peter Gursky (University of Pavol Jozef Safarik, Kosice):
    Algorithms for integration of distributed valued data
- 15:00 - 15:15 Coffee break
- 15:15 - 17:45 Plenum: Discussion
  Chair: Andreas Rauber
  Selected readings:
  - Jorge Luis Borges: The Library of Babel
  - Umberto Eco: On the Impossibility of Drawing a Map of the Empire on a Scale of 1 to 1
- 18:00 Dinner
Saturday, June 26, 2004
- 7:30 Breakfast
- 8:00 - 14:00 Trip to Vychodna Vysoka (2,428 m)
- 14:00 - 15:00 Lunch
- 15:00 - 16:15 Session 4
  Chair: Peter Bednar (Technical University of Kosice)
  - Andreas Pesenhofer (Vienna University of Technology):
    Comparison of feature selection methods for text classification
  - Miroslav Puszta (Technical University of Kosice):
    Boosting of decision trees
  - Martin Sarnovsky (Technical University of Kosice):
    Use of semantic information for text classification improvement
- 19:00 Conference Dinner: 5 years of WDA
Sunday, June 27, 2004
- 8:30 Breakfast
- 10:00 Departure from hotel Sliezsky Dom

Some Photos

Below are some photos taken during the seminar, the trip up Vychodna Vysoka (2,428 m), the conference dinner celebrating 5 years of WDA, and in Kosice.