Building Ensembles of Audio and Lyrics Features to Improve Musical Genre Classification

R. Mayer, A. Rauber:
"Building Ensembles of Audio and Lyrics Features to Improve Musical Genre Classification";
Talk: International Conference on Distributed Framework&Applications (DFmA'10), Yogyakarta; 02.08.2010 - 03.08.2010; in: "Proceedings of the International Conference on Distributed Framework&Applications (DFmA'10)", (2010), ISBN: 978-602-9747-9-0-4; pp. 165 - 170.


Abstract:


Digital audio is an increasingly widespread medium, and for many consumers it has become the main form in which music is distributed and stored. On portable and mobile devices in particular, digital audio has become almost the only form. Numerous online music stores, which account for a growing share of total record sales, underline this trend.
However, as private and commercial music collections keep growing, handling them effectively and efficiently becomes increasingly difficult. There is therefore a need for algorithms that can understand and interpret different characteristics of music, so that they can assist users by organising music collections or recommending pieces of music.

Music is an inherently multi-modal type of data, and the lyrics associated with the music are as essential to the reception and the message of a song as the audio itself. Often, album covers are carefully designed by artists to convey a message consistent with the music and image of a band. Music videos, fan sites and other sources of information add to that in a coherent manner.

However, the focus is often on the audio information alone.
In this paper, we thus explore the lyrics domain of music, and investigate how the information obtained from lyrics can be combined with the acoustic domain. We evaluate our approach by means of musical genre classification, a common task in music information retrieval.
Our previous work showed improvements from simple feature fusion, that is, from combining the representations obtained from lyrics and audio. Advancing on that work, we apply a more sophisticated machine learning technique, namely ensemble classification.
The experiments on our test collections show that the approach is always superior to the best choice of a single classification algorithm on a single feature set. Moreover, it relieves the user of having to make this choice explicitly.
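
To make the idea of an ensemble over several feature sets concrete, the following is a minimal sketch and not the method used in the paper. The abstract names neither the base classifiers nor the combination rule, so the sketch assumes a nearest-centroid classifier per feature set and a plain majority vote. The three feature sets, their two-dimensional values, and the genre labels are invented purely for illustration.

```python
from collections import Counter
from math import dist


def train_centroids(vectors, labels):
    """Nearest-centroid model: the mean feature vector of each genre."""
    sums, counts = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the genre whose centroid is closest to vec."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


def ensemble_predict(models, views):
    """Majority vote over one model per feature set.

    Ties are resolved in favour of the earliest vote among the tied genres.
    """
    votes = [predict(model, view) for model, view in zip(models, views)]
    counts = Counter(votes)
    best = max(counts.values())
    return next(vote for vote in votes if counts[vote] == best)


# Hypothetical training data: four songs, each described by three feature sets.
genres = ["rock", "rock", "hiphop", "hiphop"]
audio_rhythm = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
audio_timbre = [[0.7, 0.3], [0.6, 0.4], [0.4, 0.6], [0.3, 0.7]]
lyrics_terms = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]

# One classifier per feature set, instead of one classifier on fused features.
models = [train_centroids(view, genres)
          for view in (audio_rhythm, audio_timbre, lyrics_terms)]

# A new song, given as one vector per feature set (same order as above).
new_song = [[0.45, 0.6], [0.7, 0.3], [0.2, 0.9]]

rhythm_only = predict(models[0], new_song[0])
prediction = ensemble_predict(models, new_song)
print(rhythm_only, prediction)
```

In this toy setup the rhythm-based classifier alone picks a different genre than the ensemble, because the other two feature sets outvote it. Simple feature fusion would instead concatenate the three vectors of each song and train a single classifier on the result.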