To cope with this challenge, methods for automatically organizing music by genre gain importance. Due to the difficulties of analyzing the content of music itself, most approaches have resorted to text-based analysis, relying on title and author information, metadata descriptions, or song lyrics for automatic classification. These features form the core of the search facilities of the MPEG-7 standard currently under development [7]. Like manual classification, these approaches to finding and organizing music rely heavily on manually created descriptions. A different line of research is content-based music analysis, which tries to organize and locate pieces of music based on the similarity of their melodies. The digital music library described in [4,1] extracts melody information from a hummed query and matches it against a database of musical tunes for which the actual scores are available. Similar approaches are reported in [6], which uses the scores provided by MIDI files to index and retrieve musical documents, and in [3], which focuses on beat detection.
Yet, for the majority of music documents available today, such as the widely used MP3 files, no musical scores are provided. What we would thus like is content-based organization and retrieval of musical documents based on the actual sound rather than on score transcripts. However, given the huge amounts of data needed to describe sound information, as well as the inherent noise in musical sound representations, conventional retrieval techniques are of only limited use. This makes the task a challenging arena for neural networks, which are particularly suited to generalizing from noisy data and to extracting key features from large datasets.
In this paper we propose a content-based clustering of musical documents based on the actual sound. Rather than trying to extract precise scores, we use frequency spectra to describe the characteristics of a specific piece of music. We then use the Self-Organizing Map (SOM) [5], a popular unsupervised neural network, to automatically cluster the pieces of music according to their similarity. After the unsupervised training process, similar pieces of music are located in neighboring areas of the two-dimensional map display. This allows a user to easily orient herself within an unknown music collection, finding, say, classical music in the upper left corner of the map, whereas disco-style music may be found in a different region. Selecting a cluster of music according to one's current preferences, rather than having to specify a list of songs based on textual descriptions, provides more intuitive and direct access to music libraries. These concepts have already been applied successfully to text clustering [2,8].
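To make the clustering step more concrete, the sketch below shows how feature vectors derived from frequency spectra could be mapped onto a two-dimensional SOM. This is an illustrative, hand-rolled implementation rather than the exact procedure used in our system: the grid size, learning rate, neighborhood radius, and function names (train_som, map_position) are arbitrary example choices.

```python
# Minimal SOM sketch (illustrative only): cluster pre-computed, spectrum-based
# feature vectors on a small two-dimensional map. All parameter values are
# example settings, not those used in the system described in this paper.
import numpy as np

def train_som(data, rows=10, cols=10, iterations=5000,
              lr0=0.5, radius0=None, seed=0):
    """Train a rectangular SOM on `data` (n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    weights = rng.random((rows, cols, n_features))
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    # Grid coordinates of every map unit, used for the neighborhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * np.exp(-frac)            # decaying learning rate
        radius = radius0 * np.exp(-frac)    # shrinking neighborhood radius
        x = data[rng.integers(len(data))]   # randomly chosen training vector
        # Best-matching unit: the map unit whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighborhood around the BMU on the map grid.
        grid_dist = np.linalg.norm(grid - np.array(bmu), axis=-1)
        h = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
        # Move the BMU and its neighbors towards the input vector.
        weights += lr * h[..., None] * (x - weights)
    return weights

def map_position(weights, x):
    """Return the (row, col) of the map unit a feature vector is assigned to."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)
```

After training, each piece of music can be assigned to its best-matching unit, so that pieces placed on the same or neighboring units form the clusters presented to the user on the map display.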
The remainder of this paper is structured as follows: Section 2 presents the architecture of our system, detailing feature extraction, vector creation, and music clustering using the Self-Organizing Map. Section 3 provides experimental results using a collection of MP3 files, and Section 4 offers some conclusions as well as an outlook on future work.