Department of Software Technology
Vienna University of Technology
The SOMeJB Music Digital Library - Experiments: SOMeJB 2: Big
Below we provide some experimental results of music organization using the SOMeJB Prototype 2 system (Islands of Music) with a copyrighted collection of 359 pieces of popular music (approx. 23 hours total playing time).
Audio Data
In this section we briefly describe the results obtained from our experiments with a music collection that represents a broad spectrum of musical taste, consisting of 359 pieces with a total playing time of approximately 23 hours.
- col359_segments.in.gz: 3,940 segment vectors representing the segments of music in a 1,200-dimensional feature space (gnu-zipped, 18.2MB)
- col359_median.in.gz: 359 vectors representing the pieces of music based on the median of their segments' vectors, again in the 1,200-dimensional feature space (gnu-zipped, 1.7MB)
- somejb2.tv: an artificial template vector describing the dimensions, used for the SOMLib DL system input.
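The piece-level vectors in col359_median.in.gz are described as the median of each piece's segment vectors. The minimal Python sketch below shows that aggregation under the assumption that the median is taken element-wise, i.e. separately for each of the 1,200 feature dimensions. The function name piece_vector is hypothetical, toy 3-dimensional vectors stand in for real ones, and reading the .in files themselves is not covered.

```python
from statistics import median


def piece_vector(segment_vectors):
    """Element-wise median of one piece's segment vectors (hypothetical helper).

    segment_vectors: list of equal-length lists of floats, one per segment.
    Returns a single vector with the same dimensionality.
    """
    if not segment_vectors:
        raise ValueError("a piece needs at least one segment vector")
    dim = len(segment_vectors[0])
    if any(len(v) != dim for v in segment_vectors):
        raise ValueError("all segment vectors must have the same dimension")
    # zip(*...) transposes: one tuple per feature dimension across all segments
    return [median(values) for values in zip(*segment_vectors)]


# Toy example with 3 dimensions instead of 1,200
segments = [
    [0.1, 2.0, 5.0],
    [0.3, 1.0, 4.0],
    [0.2, 9.0, 6.0],
]
print(piece_vector(segments))  # [0.2, 2.0, 5.0]
```

Because the median is computed per dimension, the resulting vector need not equal any single segment's vector.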
Below we provide links to SOMs and GHSOMs for Collection 359. All maps provide links to MP3 files, allowing you to listen to and analyze the characteristics of the pieces of music. However, for the maps representing pieces of music based on their median vectors, we provide links to only one 6-second segment per piece of music (due to file size and copyright limitations). The linked segment is usually the 4th segment of the title, i.e. seconds 48-56 of a given piece of music, except for titles that are shorter than this. Thus, the links are mainly intended as an indicator of the type of music; the segment is not necessarily representative of the complete piece.
For the Segment-map, links to all individual segments are provided, allowing you to analyze the characteristics of the various segments, and their variability within one title.
(Please note: for copyright reasons we do NOT provide full MP3 files of titles, but rather segments a few seconds long, downsampled to phone-line quality.)
Since we have frequently been asked for training time estimates, we also list them below. They were measured as system time on a Pentium 4 2GHz workstation running Linux under normal working conditions, include all file input and output operations, and were obtained without any performance optimizations of the training process, so they provide only a rough estimate.
- SOM 2: The parameters for this GHSOM training run were set such that all the details of the data had to be explained on a single-layer map, so there is no hierarchical structuring of the data on this map. The required absolute granularity of data representation was set to a lower value than for SOM 1 (listed below), so the resulting SOM is more compact, having grown to a size of 9x4 units to represent the 359 songs. An overview image of the resulting SOM is provided below.
(Training time: less than 6 seconds) (GHSOM property file)
SOM 2:
Image of SOM 2.
- SOM 1: The parameters for this GHSOM training run were set such that all the details of the data had to be explained on a single-layer map, so there is no hierarchical structuring of the data on this map. The required absolute granularity of data representation is identical to that of the GHSOM below, so this SOM basically resembles that GHSOM, but projected onto one layer. The resulting map is rather large, having grown to 10x7 units. An overview image of the resulting SOM is provided below.
(Training time: less than 12 seconds) (GHSOM property file)
SOM 1:
Image of SOM 1.
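SOM 1 and SOM 2 are GHSOM runs whose parameters were chosen so that no hierarchy develops. As a rough illustration of how such parameters act, the sketch below encodes the two stopping criteria commonly described in the GHSOM literature: a map keeps growing while its mean quantization error is at least tau1 times the error of its parent unit, and a unit is expanded into a child map while its own error exceeds tau2 times the error qe0 of the single layer-0 unit (i.e. of the whole data set). The "absolute granularity of data representation" mentioned above presumably corresponds to tau2. This is an assumption-laden sketch, not the SOMeJB implementation: the exact error measure, the parameter names in the GHSOM property files, and all helper names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GHSOMParams:
    tau1: float  # breadth: fraction of the parent unit's error a map must get below
    tau2: float  # depth: global granularity, relative to the layer-0 error qe0


def mean_quantization_error(unit_errors):
    """Mean of the per-unit quantization errors of one map."""
    return sum(unit_errors) / len(unit_errors)


def map_should_grow(unit_errors, parent_error, params):
    """True while the map still has to insert rows/columns: its mean
    quantization error has not yet dropped below tau1 * parent_error."""
    return mean_quantization_error(unit_errors) >= params.tau1 * parent_error


def units_to_expand(unit_errors, qe0, params):
    """Indices of units whose error still exceeds tau2 * qe0; each of them
    would be expanded into a child map on the next layer."""
    return [i for i, qe in enumerate(unit_errors) if qe > params.tau2 * qe0]


# Toy numbers: layer-0 error 10.0 and a four-unit top-layer map
qe0 = 10.0
errors = [1.0, 4.0, 2.0, 0.5]
hierarchical = GHSOMParams(tau1=0.3, tau2=0.15)
print(map_should_grow(errors, qe0, hierarchical))  # False: MQE 1.875 < 3.0
print(units_to_expand(errors, qe0, hierarchical))  # [1, 2]
# A larger tau2 leaves every unit below the threshold, so no new layer is added
print(units_to_expand(errors, qe0, GHSOMParams(tau1=0.3, tau2=0.5)))  # []
```

In this formulation, a flat map results when no unit exceeds the tau2 threshold once horizontal growth has stopped; how the SOMeJB runs balanced the two parameters is documented only in their property files.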
- GHSOM 1: This map evolved into a 2x4 top-layer map with one further layer added to all but one unit (the one in the bottom left corner) in the hierarchical organization. It provides a somewhat more strongly structured view of the data set. In the top layer, soft classical music is in the upper right corner, with titles becoming more dynamic and aggressive as you move towards the bottom left corner. The organization of the second-level maps follows that overall orientation.
(Training time: less than 3 seconds) (GHSOM property file)
Integrated representation: This link takes you to the same GHSOM, but in this representation the first two layers of the GHSOM have been integrated into a single map, giving you a flat overview of the entire first two layers. (We recommend, though, setting the font size of your browser very small so that the whole map fits on a single screen.)
GHSOM 1 - Integrated View:
Image of GHSOM 1 with the first two layers of the hierarchy integrated into a single top-layer map.
- Segment-GHSOM 1: To see how far the individual segments of the songs differ, we present a GHSOM trained with the individual segments of the pieces of music, rather than using their median as vector representation. Thus, each segment is located according to its own sound characteristics. Although most segments of a given piece of music are mapped together onto one unit, some segments differ sufficiently from the remaining segments to be mapped onto different locations.
The map evolves into a 4x2 map on the top layer (all segments are listed at the individual units, which makes the cells uncomfortably large but helps in analyzing the maps). All units are expanded onto a second layer, and 90 units of the second layer are further expanded into 3rd-layer maps.
The links present in the GHSOM point to the individual 6-second segments as they are used for feature vector representation.
(Training time: 38 seconds, with probably the major part being devoted to input and output operations rather than training.) (GHSOM property file)
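One simple way to check how far the segments of a title scatter is to map each segment to its best-matching unit and collect the units per title. The sketch below assumes Euclidean distance and a single flat map given as a dictionary from (row, column) to model vector; for a GHSOM one would repeat the search within the child map of the winning unit. All names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def best_matching_unit(vector, model_vectors):
    """(row, col) of the unit whose model vector is closest to `vector`
    in Euclidean distance. model_vectors: dict (row, col) -> list of floats."""
    return min(model_vectors, key=lambda pos: math.dist(vector, model_vectors[pos]))


def units_per_title(segments, model_vectors):
    """Map every segment to its best-matching unit and collect, per title,
    the set of units its segments end up on. segments: (title, vector) pairs."""
    spread = defaultdict(set)
    for title, vector in segments:
        spread[title].add(best_matching_unit(vector, model_vectors))
    return dict(spread)


# Toy 1x2 map and four 2-dimensional segments of two titles
model = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}
segs = [("a", [0.1, 0.0]), ("a", [0.2, 0.1]),
        ("b", [0.9, 1.0]), ("b", [0.1, 0.2])]
spread = units_per_title(segs, model)
scattered = sorted(t for t, units in spread.items() if len(units) > 1)
print(scattered)  # ['b']
```

Titles whose segments land on more than one unit are the ones worth listening to segment by segment in the Segment-GHSOM.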
4. Islands of Music
Below, we provide a somewhat more detailed analysis of a 14x10 SOM in its Islands of Music representation.
Figure 1 depicts an overview of the collection. The trained SOM consists of 14x10 map units, and the clusters are visualized using Smoothed Data Histograms (SDH) with n=3 and linear interpolation. Several clusters can be identified immediately. We will discuss the 6 labeled clusters in more detail. Some general observations at this level are that the islands are spread out rather evenly over the map in a complex arrangement, with a relatively high mountain in the south-east.
Figure 1:
The visualization of the music collection consisting of 359 pieces of music trained on a SOM with 14x10 map units. The rectangular boxes mark the areas that the subsequent figures zoom into. The islands labeled with numbers from 1 to 6 are discussed in more detail in the text.
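The islands in Figure 1 are rendered from a Smoothed Data Histogram. As we understand the SDH method, each data item votes for its n closest map units, the closest receiving n points, the second n-1, and so on down to 1, with the votes normalized so that each item contributes a total of 1. The sketch below implements only this counting step, assuming Euclidean distance and a flat list of model vectors; the linear interpolation and color mapping used to draw the islands are not shown, and all names are hypothetical.

```python
import math


def smoothed_data_histogram(data, model_vectors, n=3):
    """Each data vector votes for its n closest units with weights
    n, n-1, ..., 1, divided by their sum so every vector contributes 1.
    model_vectors: list of model vectors, one per unit (grid flattened)."""
    if not 1 <= n <= len(model_vectors):
        raise ValueError("n must be between 1 and the number of units")
    total = n * (n + 1) / 2  # n + (n-1) + ... + 1
    hist = [0.0] * len(model_vectors)
    for x in data:
        # unit indices ordered from closest to farthest
        ranked = sorted(range(len(model_vectors)),
                        key=lambda u: math.dist(x, model_vectors[u]))
        for rank, unit in enumerate(ranked[:n]):
            hist[unit] += (n - rank) / total
    return hist


# Four units on a line and two 1-dimensional "pieces"
units = [[0.0], [1.0], [2.0], [3.0]]
print(smoothed_data_histogram([[0.1]], units))          # votes 3/6, 2/6, 1/6
print(smoothed_data_histogram([[0.1], [2.9]], units))   # evenly spread
```

Compared with a plain hit histogram (n=1), spreading each vote over several units makes the density, and hence the islands, change more smoothly across the map.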
Figure 2 depicts simplified weather charts, which give a first impression of the styles of music found in specific areas. For example, music with strong bass can be found in the west, in particular in the north-west. The bass is strongly correlated with the maximum fluctuation strength, i.e. pieces with very strong beats can also be found in the north-west, while pieces with neither strong beats nor bass are located in the south-east, together with non-aggressive pieces. Furthermore, the south-east is the main location of pieces where the lower frequencies are dominant. However, the north-west corner of the map also represents music where the low frequencies dominate; as we will see later, this is due to the strong bass contained in these pieces.
Figure 2:
Simplified weather charts. White indicates areas with high values while dark gray indicates low values. The charts represent from left to right, top to bottom the maximum fluctuation strength, bass, non-aggressiveness, and domination of low frequencies.
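Each weather chart reduces every unit's model vector to a single value and shades the map accordingly. The sketch below shows this for the maximum fluctuation strength and for a hypothetical bass indicator. It assumes that the 1,200 dimensions are ordered band by band (20 Bark bands of 60 modulation-frequency values each) and that bass can be approximated by the total fluctuation strength in the two lowest Bark bands; the actual feature definitions behind Figure 2 are not given here, and the gray value 64 for dark gray is arbitrary.

```python
N_BARK, N_MOD = 20, 60  # 20 Bark bands x 60 modulation values = 1,200 dims (assumed layout)


def max_fluctuation_strength(model_vector):
    """Largest fluctuation strength anywhere in the rhythm pattern."""
    return max(model_vector)


def bass(model_vector, n_low_bands=2):
    """Hypothetical bass indicator: summed fluctuation strength in the lowest
    Bark bands, assuming index = band * N_MOD + modulation_bin."""
    return sum(model_vector[: n_low_bands * N_MOD])


def weather_chart(model_vectors, feature, dark=64, white=255):
    """Gray level per unit: the highest feature value maps to white, the
    lowest to dark gray, linearly in between."""
    values = [feature(v) for v in model_vectors]
    lo, hi = min(values), max(values)
    if hi == lo:  # a flat chart carries no information; draw it all white
        return [white] * len(values)
    return [round(dark + (x - lo) / (hi - lo) * (white - dark)) for x in values]


# Three toy units: silent, moderate peak in a high band, strong peak in band 1
units = [[0.0] * 1200, [0.0] * 1199 + [2.0], [4.0] + [0.0] * 1199]
print(weather_chart(units, max_fluctuation_strength))
print(weather_chart(units, bass))
```

The second toy unit shows why the charts can differ: its peak is strong enough to appear in the fluctuation-strength chart but lies outside the low bands counted as bass.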
A close-up of Cluster 1 from Figure 1 is depicted in the north of the map in Figure 3. This island represents music with very strong beats; in particular, several songs by the group Bomfunk MCs (bfmc) are located here, but also songs with more moderate beats such as Blue by Eiffel 65 (eiffel65-blue) or Let's Get Loud by Jennifer Lopez (letsgetloud). All but three songs by Bomfunk MCs in the collection are located on the west side of this island. One exception is the piece Freestyler (center-bottom of Figure 3), which has been the group's biggest hit so far. Other songs found towards the east of the island are Around the World by ATC (aroundtheworld) and Together Again by Janet Jackson (togetheragain), which can both be categorized as Electronic/Dance. Around the island, other songs with stronger beats are located, for example, towards the south-west, Bongo Bong by Manu Chao (bongobong) and Under the Mango Tree by Tim Tim (themangotree), both with male vocals, an exotic flair and similar instruments.
Figure 3:
Close-up of Cluster 1 and 2 depicting 3x4 map units.
In Figure 3, Cluster 2 is depicted in the south-east. This island is dominated by pieces by the rock band Red Hot Chili Peppers (rhcp); all but a few of the band's songs in the collection are located on this island. To the west of the island, a piece is located which at first does not appear to be similar, namely Summertime by Sublime (sl-summertime). This song is a crossover of styles such as Rock and Reggae, but has a beat pattern similar to that of Freestyler. Summertime would thus make a good transition in a play-list starting with Electro/House and moving towards the style of Red Hot Chili Peppers, which is itself a crossover of different styles such as Funk and Punk Rock, e.g. In Stereo, Freestyler, Summertime, Californication. Not shown in the close-up, but also interesting, is that just to the south of Summertime another song by Sublime can be found, namely What I Got.
Figure 4:
Close-up of Cluster 3 and 4 depicting 4x3 map units.
A close-up of Cluster 3 is depicted in the south-west of Figure 4. This cluster is dominated by aggressive music such as the songs of the band Limp Bizkit (limp), which can be categorized as Rap-Rock. Other similar pieces are Freak on a Leash by Korn (korn-freak), Dead Cell by Papa Roach (pr-deadcell), and Kryptonite by 3 Doors Down (d3-kryptonite). In the north of this cluster, for example, Punk Rock Song by Bad Religion (br-punkrock) can be found. To the west of this cluster, just beyond the borders of this close-up, several other songs by Limp Bizkit are located together with songs by Papa Roach, and to the south-west lies Rock Is Dead by Marilyn Manson.
The pieces arranged around Cluster 4 are depicted in the east of Figure 4. Generally, the pieces in Cluster 4 sound less aggressive than those in Cluster 3. However, those in the south of this cluster are closely related to those of Cluster 3, including pieces such as Wandering by Limp Bizkit (limp-wandering), Binge by Papa Roach (pr-binge), and the two songs by Guano Apes (ga), which are a mixture of Punk Revival, Alternative Metal, and Alternative Pop/Rock. To the north of the cluster, the songs Addict by K's Choice and Living in a Lie by Guano Apes are mapped next to each other. Living in a Lie deals with the end of a love story and is dominated by a mood which sounds very similar to the mood of Addict, a song about addiction that includes lines such as "I am falling" and "I am cold, alone". The other pieces in the north of the cluster are modern interpretations of classical pieces by Vanessa Mae (vm).
Figure 5:
Close-up of Cluster 5 and 6 depicting 3x4 map units.
The final two clusters which we describe in detail are depicted in Figure 5. Cluster 5 represents concert music and classical music used for films, including the well-known Star Wars theme (starwars), the theme of Indiana Jones (indy), and the end credits of Back to the Future III (future). However, there are also two pieces in this cluster which do not fit this style, namely Yesterday by the Beatles (yesterday) and Morning Has Broken by Cat Stevens (morningbroken).
Cluster 6 represents peaceful classical pieces such as Für Elise by Beethoven (elise), Eine kleine Nachtmusik by Mozart (nachtmusik), Fremde Länder und Menschen by Schumann (kidscene), Air from Orchestral Suite #3 by Bach (air), and Trout Quintet by Schubert.
Although the results we obtained are generally very encouraging, we have come across some problems which point out the limitations of the approach. For example, the song Wild Wild West by Will Smith (wildwildwest) does not sound very similar to songs by Papa Roach or Limp Bizkit; nevertheless, they are located together in Cluster 3. Another problem in the same region is the song It's the end of the world by REM (rem-endoftheworld), which is located next to songs such as Freak on a Leash by Korn. Problems in other regions include, for example, Between Angels and Insects by Papa Roach (pr-angles), which is located in the south of Cluster 5, definitely a poor match.
The main reason for these problems can be found in the feature extraction process. Although we analyze the dynamic behavior of the loudness in several frequency bands, we do not take the sound characteristics directly into account, as could be done, for example, by analyzing the cepstrum, a common technique in speech recognition. Another explanation is the simplified median approach. Many pieces consist of more than one typical rhythm pattern, and combining these using the median can lead to a pattern which is less typical for a piece than the individual ones.
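The weakness of the median approach can be illustrated with a toy example. Assume, hypothetically, a piece whose segments alternate between two rhythm patterns with strong values in different dimensions. With an even number of segments, the element-wise median averages the two middle values in each dimension, producing a flat pattern that matches neither original. Four dimensions stand in for the 1,200 used in the experiments.

```python
import math
from statistics import median

# Two hypothetical rhythm patterns (4 dimensions instead of 1,200)
pattern_a = [8.0, 0.0, 8.0, 0.0]
pattern_b = [0.0, 8.0, 0.0, 8.0]

# A piece whose segments alternate between the two patterns
segments = [pattern_a, pattern_b, pattern_a, pattern_b]

# Element-wise median: with 4 segments, each dimension averages 0.0 and 8.0
piece = [median(values) for values in zip(*segments)]
print(piece)                                                     # [4.0, 4.0, 4.0, 4.0]
print(math.dist(piece, pattern_a), math.dist(piece, pattern_b))  # 8.0 8.0
```

The resulting vector is equally far from both real patterns, so the piece may be mapped to a unit representing neither of its actual rhythms.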
Figure 6:
The model vectors of the 14x10 music SOM. Each subplot represents the rhythm pattern of a specific model vector. The horizontal axis represents modulation frequencies from 0-10Hz; the vertical axis represents the frequency bands Bark 1-20. The range shown to the left of each subplot gives the highest and lowest fluctuation strength values within the respective rhythm pattern. The gray shadings are adjusted so that black corresponds to the lowest and white to the highest value in each pattern.
For detailed evaluations, the model vectors of the SOM can be visualized as depicted in Figure 6. As indicated by the weather charts, the lowest fluctuation strength values are located in the south-east of the map and can be found in map unit (14,1). It is interesting to note the similarity between this unit and the typical rhythm pattern of Für Elise presented in the architecture description of Prototype 2.
Generally, the model vectors are a good representation of the rhythm patterns contained in the collection, as each model vector represents the average of all pieces mapped to it.
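Figure 6 treats each 1,200-dimensional model vector as a rhythm pattern over Bark bands 1-20, which leaves 60 modulation-frequency values per band covering 0-10Hz, and shades each pattern individually. A minimal sketch of that reshaping and per-pattern gray scaling follows; the band-major ordering of the vector and the 0-255 gray scale are assumptions, and the function names are hypothetical.

```python
N_BARK, N_MOD = 20, 60  # Bark bands 1-20; 1,200 / 20 = 60 modulation values per band


def to_rhythm_pattern(vector):
    """Reshape a flat 1,200-dim vector into 20 rows (Bark bands) of 60
    modulation-frequency values, assuming band-major ordering."""
    if len(vector) != N_BARK * N_MOD:
        raise ValueError(f"expected {N_BARK * N_MOD} values, got {len(vector)}")
    return [vector[b * N_MOD:(b + 1) * N_MOD] for b in range(N_BARK)]


def gray_shading(pattern):
    """Per-pattern shading: the lowest value becomes 0 (black), the highest
    255 (white). Also returns the (lowest, highest) range for labelling."""
    flat = [x for row in pattern for x in row]
    lo, hi = min(flat), max(flat)
    scale = 255 / (hi - lo) if hi > lo else 0.0  # constant pattern -> all black
    shaded = [[round((x - lo) * scale) for x in row] for row in pattern]
    return shaded, (lo, hi)


# Usage with a dummy model vector
vec = [float(i) for i in range(1200)]
shaded, value_range = gray_shading(to_rhythm_pattern(vec))
print(value_range)  # (0.0, 1199.0)
```

Because the scaling is done per pattern, equal gray levels in two subplots do not imply equal fluctuation strength; the printed range is what makes patterns comparable.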
Comments: rauber@ifs.tuwien.ac.at