Multimodal Analysis of Text and Audio Features for Music Information Retrieval

R. Neumayer, A. Rauber:
"Multimodal Analysis of Text and Audio Features for Music Information Retrieval";
in: "Multimodal Processing and Interaction", P. Maragos, A. Potamianos, P. Gros (Eds.); published by the National Technical University of Athens; Springer, Berlin/Heidelberg, 2008, ISBN: 978-0-387-76315-6.

Abstract:


Multimedia content can be described in different ways, as its essence is not
limited to one view. For audio data, those multiple views are, for instance,
a song's audio features as well as its lyrics. Both of these modalities have
their advantages: text may be easier to search in and could cover more of the
"semantics" of a song, while it says little about "sonic similarity".
Psychoacoustic feature sets, on the other hand, provide the means to identify
tracks that "sound" similar, while they provide little information for semantic
categorization of any kind. Distinct requirements for the different types of
feature sets arise from users' differing information needs. Particularly
large collections invite users to explore them interactively through loose
browsing, whereas specific searches are far more feasible, if at all possible,
only when supported by textual data.
This chapter describes how audio files can be treated in a multimodal
way, pointing out the specific advantages of the two kinds of representations. A
visualization method based on audio features, lyrics data, and the Self-Organizing
Map is introduced. Moreover, quality metrics for such multimodal
clusterings are defined. Experiments on two audio collections show the
applicability of our techniques.
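The abstract does not give implementation details, but the general idea of clustering fused audio and lyrics representations with a Self-Organizing Map can be sketched as follows. This is a minimal, illustrative SOM in NumPy, not the chapter's actual method: the feature dimensions, grid size, and normalization scheme are hypothetical, and real audio features (e.g. psychoacoustic descriptors) and lyrics term weights would replace the random toy data.

```python
import numpy as np

def train_som(data, grid=(6, 6), epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small Self-Organizing Map; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    dim = data.shape[1]
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of each unit, used for the neighbourhood function.
    coords = np.array([[r, c] for r in range(rows)
                       for c in range(cols)]).reshape(rows, cols, 2)
    for t in range(epochs):
        # Exponentially decaying learning rate and neighbourhood radius.
        lr = lr0 * np.exp(-t / epochs)
        sigma = sigma0 * np.exp(-t / epochs)
        x = data[rng.integers(len(data))]
        # Best-matching unit: unit whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighbourhood on the map grid around the BMU.
        grid_d2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-grid_d2 / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

def winner(weights, x):
    """Map a feature vector to its best-matching unit on the grid."""
    d = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)

# Toy multimodal vectors: 8 hypothetical audio features and 12 lyrics term
# weights per track; each modality is length-normalised before concatenation
# so that neither dominates the Euclidean distance.
rng = np.random.default_rng(42)
audio = rng.random((100, 8))
text = rng.random((100, 12))
fused = np.hstack([audio / np.linalg.norm(audio, axis=1, keepdims=True),
                   text / np.linalg.norm(text, axis=1, keepdims=True)])
som = train_som(fused)
```

After training, each track maps to a grid cell via `winner(som, fused[i])`, so similar tracks land on nearby units; that cell occupancy is the kind of map structure a visualization or a clustering quality metric could then be computed over.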