R. Neumayer, A. Rauber:
"Multimodal Analysis of Text and Audio Features for Music Information Retrieval";
in: "Multimodal Processing and Interaction", P. Maragos, A. Potamianos, P. Gros (eds.); issued by: National Technical University of Athens; Springer, Berlin/Heidelberg, 2008, ISBN: 978-0-387-76315-6.
Multimedia content can be described in different ways, as its essence is not
limited to one view. For audio data those multiple views are, for instance,
a song’s audio features as well as its lyrics. Both of those modalities have
their advantages: text may be easier to search in and could cover more of the
“semantics” of a song, while it does not say much about “sonic similarity”.
Psychoacoustic feature sets, on the other hand, provide the means to identify
tracks that “sound” similar, while they provide little information for semantic
categorization of any kind. Users’ differing information needs place different
requirements on the types of feature sets. Particularly
large collections invite users to explore them interactively in a loose way of
browsing, whereas specific searches are much more feasible, if not exclusively
possible, when supported by textual data.
This chapter describes how audio files can be treated in a multimodal
way, pointing out the specific advantages of two kinds of representations. A
visualization method that combines audio features and lyrics data using the
Self-Organizing Map is introduced. Moreover, quality metrics for such multimodal
clusterings are presented. Experiments on two audio collections show the
applicability of our techniques.