Music-ML Data

tl;dr

 Dataset  Archive

Description

We use a dataset collected by Aljanaki et al. consisting of 400 MP3 music files, each with a playtime of one minute and labeled with one of four genres: rock, pop, classical, and electronic. Each genre contains 100 files, and the genre is used as the label for the ML model. By generating MFCC vectors and training an SVM, the ML model classifies the genre of the provided .mp3 files with an accuracy of 76.25%.
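The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: it assumes librosa for MFCC extraction and scikit-learn for the SVM, and the 80/20 stratified split, the RBF kernel, and the mean aggregation are all assumptions. The names `mfcc_mean` and `train_genre_svm` are hypothetical. The demo at the bottom uses random stand-in vectors instead of real audio, so its accuracy says nothing about the reported 76.25%.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

GENRES = ["rock", "pop", "classical", "electronic"]


def mfcc_mean(path, n_mfcc=40):
    """Return the per-coefficient mean of the MFCCs of one audio file."""
    # Imported lazily: librosa is an assumed dependency, needed only for real audio.
    import librosa

    signal, sample_rate = librosa.load(path, sr=None)
    # Result has shape (n_mfcc, n_frames).
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)


def train_genre_svm(features, labels, test_size=0.2, seed=0):
    """Fit a scaled SVM on a stratified split; return (model, test accuracy)."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(x_train, y_train)
    return model, model.score(x_test, y_test)


# Stand-in data: 400 random 40-dimensional vectors, 100 per genre.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(400, 40))
demo_labels = np.repeat(GENRES, 100)
demo_model, demo_accuracy = train_genre_svm(demo_features, demo_labels)
```

With real data, `demo_features` would instead be built by stacking `mfcc_mean(path)` for each of the 400 files, with the genre of each file as its label.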

Figure 1: Prediction accuracy matrix in the Jupyter Notebook.

Solution

DBRepo is used as the relational data store for the raw and aggregated features, the prediction results, and the training and test data splits. For each of the 400 .mp3 files, 40 MFCC feature vectors are generated. These are stored in aggregated form in the aggregated_features table.
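As a rough illustration of the aggregation step, the sketch below collapses per-frame MFCC values into one mean per coefficient and writes them to a table named aggregated_features. It uses Python's built-in `sqlite3` purely as a local stand-in for a relational store; it does not use DBRepo's own interface, and the column layout, the choice of the mean as aggregate, and the helper names `aggregate_mfcc` and `store_aggregated` are assumptions.

```python
import sqlite3
from statistics import fmean


def aggregate_mfcc(frames):
    """Collapse per-frame MFCC rows into one mean value per coefficient."""
    return [fmean(column) for column in zip(*frames)]


def store_aggregated(conn, filename, genre, means):
    """Insert one row per coefficient into the aggregated_features table."""
    # Hypothetical schema; the real table layout is not given in the source.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS aggregated_features ("
        "filename TEXT, genre TEXT, coefficient INTEGER, mean_value REAL, "
        "PRIMARY KEY (filename, coefficient))"
    )
    conn.executemany(
        "INSERT INTO aggregated_features VALUES (?, ?, ?, ?)",
        [(filename, genre, index, value) for index, value in enumerate(means)],
    )
    conn.commit()


# Toy input: 3 frames x 40 coefficients for a hypothetical file.
toy_frames = [[float(frame + coeff) for coeff in range(40)] for frame in range(3)]
toy_means = aggregate_mfcc(toy_frames)

connection = sqlite3.connect(":memory:")
store_aggregated(connection, "example.mp3", "rock", toy_means)
row_count = connection.execute(
    "SELECT COUNT(*) FROM aggregated_features"
).fetchone()[0]
```

Storing one row per file and coefficient keeps the table narrow and makes subsets (for example, all rows of one genre) easy to select with plain SQL; a wide layout with 40 columns per file would be an equally valid choice.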

DBRepo Features

  • Database as storage for machine learning data
  • System versioning
  • Subset exploration
  • Precise identification & PID of database tables
  • External data access for analysis