Million Song Dataset

**Million Song Dataset**
Domain	Music
Media	Audio
Size	280 GB
Instances	1,000,000
File Format	HDF5
Creation Date	2011-02-08
Task	Retrieval
Copyright
URL	http://labrosa.ee.columbia.edu/millionsong/

Description

The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.

Its purposes are:

To encourage research on algorithms that scale to commercial sizes
To provide a reference dataset for evaluating research
To provide a shortcut alternative to creating a large dataset with The Echo Nest's API
To help new researchers get started in the Music IR field

The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features.

A subset of 10,000 songs (1%, 1.8 GB compressed) is also provided for a quick taste.

Quality

The dataset is distributed in HDF5 format, together with a number of SQLite files and .txt index files.
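Because part of the distribution consists of SQLite files, track metadata can in principle be queried with Python's standard `sqlite3` module, without opening the HDF5 files. The following is a minimal sketch under stated assumptions: the table name `songs` and the columns `track_id`, `title`, `artist_name` and `year` are hypothetical names chosen for illustration, and the rows are made-up demo data, so the actual schema of the distributed files should be checked before adapting it.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory database for demonstration.

    ASSUMPTION: the table name `songs` and its columns are hypothetical,
    not confirmed against the dataset's actual SQLite files.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE songs ("
        "track_id TEXT PRIMARY KEY, title TEXT, artist_name TEXT, year INTEGER)"
    )
    # Made-up rows so the sketch runs without downloading anything.
    conn.executemany(
        "INSERT INTO songs VALUES (?, ?, ?, ?)",
        [
            ("TR0001", "First Song", "Artist A", 1999),
            ("TR0002", "Second Song", "Artist B", 2005),
            ("TR0003", "Third Song", "Artist A", 2003),
        ],
    )
    conn.commit()
    return conn


def tracks_by_artist(conn, artist_name, limit=10):
    """Return (track_id, title, year) tuples for one artist, oldest first."""
    cur = conn.execute(
        "SELECT track_id, title, year FROM songs "
        "WHERE artist_name = ? ORDER BY year, track_id LIMIT ?",
        (artist_name, limit),  # parameters are bound, not string-formatted
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    for track_id, title, year in tracks_by_artist(conn, "Artist A"):
        print(track_id, title, year)
    conn.close()
```

To try this against a real file, `sqlite3.connect(":memory:")` would be replaced by a connection to the downloaded database, and the query adjusted to whatever tables and columns that file actually contains.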

Source

The Million Song Dataset started as a collaborative project between The Echo Nest and LabROSA. It is supported in part by the NSF.

Ground Truth Annotation

The dataset contains both metadata (artist, album, track title, tags, etc.) and a variety of annotations produced with The Echo Nest's Analysis API (see below).

An Example Track Description showing the available fields is provided here.

Additional data has been added from other sources, e.g. lyrics from musixmatch.

Features

Numerous features derived through audio analysis, together with additional metadata, tags, and links to further resources, are available for this dataset.

The list of fields is provided here.

Licensing / Copyright

Citation

External Links

http://labrosa.ee.columbia.edu/millionsong/
