A. Schindler, R. Mayer, A. Rauber:
"Facilitating Comprehensive Benchmarking Experiments on the Million Song Dataset";
Talk: 13th International Society for Music Information Retrieval Conference (ISMIR 2012), Porto, Portugal; 08.10.2012 - 12.10.2012; in: "Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012)", (2012), ISBN: 978-972-752-144-9; pp. 469 - 474.
The Million Song Dataset (MSD), a collection of one
million music pieces, enables a new era of research on Music
Information Retrieval methods for large-scale applications.
It comes as a collection of metadata such as song names,
artists and albums, together with a set of features
extracted with The Echo Nest services, such as
loudness, tempo, and MFCC-like features.
There is, however, no easily obtainable download for
the audio files. Furthermore, labels for supervised machine
learning tasks are missing. Researchers are thus currently
restricted to working solely with the features provided,
which limits the usefulness of the MSD. We therefore present in
this paper a more comprehensive set of data based on the
MSD, allowing its broader use as a benchmark collection.
Specifically, we provide a wide and growing collection of
other well-known features in the MIR domain, as well as
ground truth data with a set of recommended training/test
splits.
We obtained these features from audio samples provided
by 7digital.com, and metadata from the All Music Guide.
While copyright prevents re-distribution of the audio snippets
per se, the features as well as metadata are publicly
available on our website for benchmarking evaluations. In
this paper we describe the pre-processing and cleansing
steps applied, as well as feature sets and tools made available,
together with first baseline classification results.