Million Song Dataset Benchmarks - Collection Characteristics

We successfully downloaded 994,960 audio samples from the content provider. The list of samples from the original MSD that we were unable to retrieve is provided in missing_samples.txt. A full list of all sample properties (track_id, title, artist_name, duration, 7digital_Id, sample_bitrate, sample_length, sample_rate, sample_mode, sample_version, filesize) is provided in sample_properties.csv.gz.
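The distributions reported in the sections below can in principle be recomputed from sample_properties.csv.gz. The following is a minimal sketch, not the procedure used to produce this page. It assumes the file is a comma-separated, UTF-8 encoded CSV whose header row uses the property names listed above; the helper name `tally_column` is hypothetical.

```python
import csv
import gzip
from collections import Counter


def tally_column(path, column):
    """Count how often each value of `column` occurs in a gzipped CSV file.

    Assumption: the file has a header row naming the columns
    (e.g. sample_rate, sample_bitrate, sample_mode).
    """
    counts = Counter()
    # "rt" decompresses and decodes to text; newline="" is what the csv module expects.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[row[column]] += 1
    return counts
```

Under these assumptions, `tally_column("sample_properties.csv.gz", "sample_rate")` would return a `Counter` mapping each distinct sampling-rate value (as a string) to its number of samples.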

Sample lengths

Song-length statistics as CSV file.

Please note that the scale is logarithmic. There are two peaks, at sample lengths of 30 and 60 seconds, with 366,130 and 596,630 samples respectively. Together they account for 96.76% of all samples.
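The combined share of the two peaks follows directly from the counts given above and the total of 994,960 downloaded samples. A short arithmetic check (the variable and function names are illustrative only):

```python
TOTAL_SAMPLES = 994_960   # samples successfully downloaded
PEAK_30_S = 366_130       # samples with a length of 30 seconds
PEAK_60_S = 596_630       # samples with a length of 60 seconds


def share_percent(count, total):
    """Return `count` as a percentage of `total`."""
    return 100.0 * count / total


combined = PEAK_30_S + PEAK_60_S
combined_share = share_percent(combined, TOTAL_SAMPLES)
print(f"{combined:,} samples, {combined_share:.2f}% of the collection")
# prints: 962,760 samples, 96.76% of the collection
```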

MP3 encoding

Sampling rate

Sampling rate (kHz)    # samples    % samples
22                       768,710       77.26%
44                       226,169       22.73%
other                         81        0.01%

Sample rate statistics as CSV file.
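A table of this shape could be derived from raw per-sample values by bucketing and then converting counts to percentages. The sketch below is illustrative only. It assumes the raw sampling rates are stored in Hz and that the rows "22" and "44" stand for the standard rates 22,050 Hz and 44,100 Hz; neither assumption is confirmed by this page. The function names are hypothetical.

```python
from collections import Counter


def bucket_sample_rate(rate_hz):
    """Map a sampling rate in Hz to a table row label.

    Assumption: "22" means 22,050 Hz and "44" means 44,100 Hz.
    """
    if rate_hz == 22050:
        return "22"
    if rate_hz == 44100:
        return "44"
    return "other"


def percentage_table(counts):
    """Convert a mapping of label -> count into label -> percent (2 decimals)."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}


# Toy input, not real data: four made-up sampling rates.
toy_buckets = Counter(bucket_sample_rate(r) for r in [22050, 22050, 44100, 48000])
toy_table = percentage_table(toy_buckets)
```

Applied to the counts in the table above, `percentage_table({"22": 768710, "44": 226169, "other": 81})` reproduces the listed percentages.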

Bitrate

Bitrate (kbit/s)    # samples    % samples
64                    343,344       34.51%
128                   646,120       64.94%
other (VBR)             5,494        0.55%

Bitrate statistics as CSV file.

Stereo/mono

Channel mode                   # samples    % samples
Mono                               6,342         0.6%
Stereo                           150,779        15.2%
Joint stereo / dual channel        5,494        0.55%

Channel statistics as CSV file.