Seminar Kosice - Vienna

(link to homepage of first seminar: WDA 2000)

Overview

Kurzbeschreibung (Short Description)
Short Fact Sheet
General Information, goals
Detailed Goals of the Seminar
Methods Used
Experiments Data
Paper
Results
Comments and Questions

Kurzbeschreibung (Short Description)

General: Seminar held in English in cooperation with the Technical University of Kosice, in Kosice, Slovakia
Credits: counts as "Seminar aus Informatik" or "Seminar aus Artificial Intelligence", both held in English
Content: text mining methods, analysis of text collections using neural networks, ontologies, as well as analysis of music data, structuring of audio archives (MP3s), genre recognition in music, etc., plus extensive discussions
Preliminary meeting: Wed., 7.3.2001, 16:00, at the institute
Date: expected 12-14 June 2003.
Seminar location: Univ. of Technology, Budapest, Hungary
Language: English
Number of participants: limited: max. 6 students from TU Wien and 6 students from TU Kosice
Registration: by e-mail to rauber@ifs.tuwien.ac.at

Overview

General: English-language Seminar in cooperation with the Technical University of Kosice in Kosice, Slovakia
Credits: "Seminar aus Informatik" or "Seminar aus Artificial Intelligence"
Content: Text Mining, Text Analysis, Audiomining, Analysis of MP3 Archives, Genre Detection in Music, etc., as well as lots of discussion
First Meeting: Wed., 7.3.2001, 16:00, at my office
Date: planned for June 12-14 2003
Seminar Place: Univ. of Technology, Budapest, Hungary
Language: English
Number of Participants: limited: max. 6 students from the Vienna University of Technology plus 6 students from TU Kosice
Registration: send e-mail to rauber@ifs.tuwien.ac.at

General Information

Following the great success of the first round of this international seminar on data mining and clustering algorithms in Kosice in 2000 (WDA 2000), we will offer this seminar again this year.

The seminar will be organized as a Student Workshop with participants from the Vienna University of Technology, Austria, and the Technical University of Kosice, Slovakia, in cooperation between the Department of Software Technology (IfS) at VUT Vienna and the Department of Artificial Intelligence at TUKE, Kosice. The main goal of this seminar is to bring together students who are interested in the field of data mining, to discuss and exchange ideas and experiences.
We will analyze and compare a set of data analysis techniques on a common reference data set. We will then make a two-day trip to Budapest, Hungary, where the individual results of the various approaches will be presented and discussed in an inspiring atmosphere. Thus, every participant will gain a good knowledge and overview of the strengths, weaknesses and applicability of the various approaches.

We will definitely also have time for some 'social program' apart from the seminar itself, as one of the central ideas of this seminar is to get people together and have fun while doing some reasonable and interesting work :)

For details on last year's seminar as well as for some pictures, see the WDA 2000 Homepage.

Goal of the Seminar

The goal is to analyze and compare a set of text mining techniques on a common reference data set. The individual results of the various approaches will be presented at this seminar, followed by a comparison of these results. Thus, every participant will gain a good knowledge and overview of the strengths, weaknesses and applicability of the various approaches.

To make the comparison concrete, each analysis should address questions such as:

How easy was the system to use?
(data preprocessing, number of parameters to be configured, simplicity of the system, do we understand what the system does, are the results easily interpretable, ...)
How stable was the system?
(parameter sensitivity - did the results vary a lot when you slightly changed some of the parameters, time required to perform analysis, ...)
Which kind of information can be mined?
How useful is the information?
Did the system perform equally well on the different data sets, or did it work better with one set or the other?
Would you use this system again? If so, for which tasks?
Further comments....

Methods Used

A set of different methods will be used for analysis, namely

Experiments Data

We will use 3 different data sets for our experiments, each of which has different characteristics. Thus, we should be able to analyze the strengths and weaknesses of the various approaches with respect to different types of data to be analyzed. The 3 datasets are as follows:

TIME Magazine Article Data
Articles from TIME Magazine from the 1960s.
tba

The Paper

Each participant shall write a paper to be presented at our Workshop meeting in May/June. Basically, the paper shall comprise the following:

Title
without comment ;-)
Abstract
a short abstract of about 250 words describing the gist of your paper: (1) what is the paper about, (2) what is the problem, (3) how are you trying to solve it, and (4) what are the results.
Introduction
A bit more detailed description of the problem: questions pertaining to text-mining, which of these are addressed by the tool you are analyzing, what can the results be used for, ...
Related Work
a short review of some work others have done in this field: data mining tools and applications, various methods, ...
The Method
a description of the tool and method you are using: what does it do, how does it do it - the technical stuff
Experiments
description of the experiments set-up: data sets used, size, preprocessing, ...
description of the results obtained, screenshots, ...
brief comparison of results with those of other methods
Conclusion
a short summary and outlook
References
a short list of references: papers about the method used, manuals, etc.

The length of the paper shall be between 5 and 12 pages in the ACM style. Style files for MS Word, LaTeX and other word processors can be downloaded from http://www.acm.org/pubs/submitting_accepted_articles/au_dl.htm

Results

Kosice:
- to be added
Vienna:
- to be added

Preliminary Schedule

February / March 2001: First meeting, discussion of various issues concerning the seminar, presentation of the reference data sets, presentation of a set of text mining techniques that will be analyzed in the course of the seminar. Each participant may then select one or two of the proposed text mining technologies she or he wants to analyze.
March/April 2001: The selected methods will be studied in some detail, and the reference data sets will be analyzed using these specific techniques. Programs for this analysis will be provided.
End of April 2001: By the end of April, a first report describing the methods used and the results obtained will be handed in. We will also discuss these findings in an internal meeting.
May/June 2001: The final report will be written and disseminated to all seminar participants. We will then meet for 2 days to present the results, discuss the findings, and most probably also have lots of fun :-)

Comments / Questions

In case you have questions, please contact:

Kosice:
- Jan Paralic
Vienna:
- Andreas Rauber
