Department of Software Technology
Vienna University of Technology
Knowledge discovery in literature data bases
The concept of knowledge discovery, defined as "establishing
previously unknown and unsuspected relations of features in a data base",
is, cum grano salis, relatively easy to implement for data bases
containing numerical data.
Increasingly, we have at our disposal data bases containing scientific literature.
Computer assisted detection of unknown relations of features in such data
bases would be extremely valuable and would lead to new scientific insights.
However, the current representation of scientific knowledge in such data
bases is not conducive to computer processing. Any correlation of features
still has to be done by the human reader, a process which is plagued by
inefficiency and incompleteness.
On the other hand, we note that considerable progress is being made in an
area where reading all available material is entirely out of the question:
the World Wide Web. Robots and web crawlers mine the Web continuously
and construct data bases which allow pages of interest to be identified
in near real time.
An obvious step is to categorize and classify the documents in the text data
base. This can be used to identify papers that are worth reading or that are
of unexpected cross-relevance. We show the results of initial experiments using
unsupervised classification based on neural networks.
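
To make the idea of unsupervised document classification concrete, the following is a minimal sketch only. The abstract does not say which kind of neural network was used; a small one-dimensional self-organizing map is assumed here purely for illustration. The term-frequency representation, the function names (vectorize, best_unit, train_som, cluster_documents) and all parameter values are likewise assumptions and are not taken from the experiments.

```python
import math
import random
from collections import Counter


def vectorize(docs):
    """Turn each document into a length-normalized term-frequency vector."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against empty docs
        vectors.append([x / norm for x in v])
    return vocab, vectors


def best_unit(weights, v):
    """Index of the map unit whose weight vector is closest to v."""
    return min(
        range(len(weights)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(weights[i], v)),
    )


def train_som(vectors, n_units=4, epochs=200, seed=0):
    """Train a one-dimensional self-organizing map (a chain of units).

    Assumed schedule: learning rate and neighborhood width both decay
    linearly over the epochs.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac
        sigma = max(0.1, (n_units / 2.0) * frac)
        for v in vectors:
            winner = best_unit(weights, v)
            for i, w in enumerate(weights):
                # Units near the winner on the chain are pulled along with it.
                h = math.exp(-((i - winner) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += rate * h * (v[k] - w[k])
    return weights


def cluster_documents(docs, n_units=4, seed=0):
    """Map every document to the index of its best-matching unit."""
    _, vectors = vectorize(docs)
    weights = train_som(vectors, n_units=n_units, seed=seed)
    return [best_unit(weights, v) for v in vectors]


if __name__ == "__main__":
    # Hypothetical toy titles, not data from the experiments.
    titles = [
        "neural network training methods",
        "training deep neural network models",
        "protein structure and folding",
        "folding pathways of protein structure",
    ]
    for title, unit in zip(titles, cluster_documents(titles)):
        print(unit, title)
```

Documents that land on the same or neighboring units share vocabulary and are candidates for being read together. A document that lands among papers from another field would be a candidate for unexpected cross-relevance. No labels are supplied at any point, which is what makes the classification unsupervised.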
Comments: rauber@ifs.tuwien.ac.at