MIR group at IFS, TU Vienna

Vienna University of Technology
Institute of Software Technology and Interactive Systems
Information & Software Engineering Group

Music Information Retrieval


Automatic Audio Segmentation:
Segment Boundary and Structure Detection in Popular Music

by Ewald Peiszer ([firstname].peiszer@gmx.at)

Automatic audio segmentation aims at extracting information on a song's structure, i.e., segment boundaries, musical form and semantic labels such as verse, chorus and bridge. This information can be used to create representative song excerpts or summaries, to facilitate browsing in large music collections, or to improve the results of subsequent music processing applications such as query by humming.

This thesis presents algorithms that extract both segment boundaries and recurrent structures of everyday pop songs. Numerous experiments are carried out to improve performance. For evaluation, a large corpus comprising various musical genres is used. The evaluation process itself is discussed in detail, and a versatile evaluation system is presented and documented at length to promote a common basis that makes future results more comparable.

Algorithm

Phase 1: Boundary detection

This phase tries to detect the segment boundaries of a song, i.e., the time points where segments begin and end. The output of this phase is used as the input for the next phase.

The classic similarity matrix / novelty score approach has been used. In addition, various attempts to further improve the result have been carried out.
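The similarity matrix / novelty score idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the Matlab code of the thesis: the function names, the kernel half-width `half`, the unweighted checkerboard kernel (no Gaussian taper) and the simple peak picker are all assumptions made for the example. Cosine distance is used here because it is one of the distance functions listed in the evaluation table further down.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two feature vectors (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 0.0

def distance_matrix(features, dist=cosine_distance):
    """Pairwise self-distance matrix D[i][j] over all frames of one song."""
    return [[dist(u, v) for v in features] for u in features]

def novelty_score(D, half):
    """Correlate a checkerboard kernel along the main diagonal of D.

    Distances between frames on opposite sides of frame i count positively,
    distances between frames on the same side count negatively. The score is
    therefore high where two internally homogeneous regions meet.
    Kernel cells that fall outside the matrix are skipped.
    """
    n = len(D)
    score = []
    for i in range(n):
        total = 0.0
        for a in range(-half, half):
            for b in range(-half, half):
                r, c = i + a, i + b
                if 0 <= r < n and 0 <= c < n:
                    sign = 1.0 if (a < 0) != (b < 0) else -1.0
                    total += sign * D[r][c]
        score.append(total)
    return score

def pick_peaks(score):
    """Indices of positive local maxima (hypothetical, very simple picker)."""
    return [
        i for i in range(1, len(score) - 1)
        if score[i] > 0 and score[i] > score[i - 1] and score[i] >= score[i + 1]
    ]

# Toy input: six frames of one "timbre" followed by six frames of another.
features = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 6
boundaries = pick_peaks(novelty_score(distance_matrix(features), half=3))
print(boundaries)  # [6]: the first frame of the second block
```

On real audio the feature vectors would be per-frame or per-beat descriptors, and the raw peaks would normally be thresholded or filtered before being reported as boundaries.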

The figure below shows the novelty score plot of KC and the Sunshine Band: That’s the Way I Like It. Vertical dotted lines indicate groundtruth boundaries.

Note that automatic boundary extraction worked very well for this song: all major segment boundaries have been found (red asterisks).

Phase 2: Structure detection

This phase tries to detect the form of the song, i.e., a label is assigned to each segment where segments of the same type (verse, chorus, intro, etc.) get the same label. The labels themselves are single characters like A, B, C, and thus not semantically meaningful.

The songs have been fully annotated. Both sequence-unaware approaches and an approach that takes temporal information into account have been used. In addition, cluster validity indices have been employed to estimate the correct number of segment types for each song.
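As a rough illustration of the sequence-unaware variant, the sketch below clusters one feature vector per segment and picks the number of segment types with a cluster validity index. The concrete choices are assumptions made for the example and are not taken from the thesis: plain k-means with a deterministic farthest-point initialisation, the mean silhouette width as the validity index, and the hypothetical helper names `kmeans`, `silhouette` and `label_segments`.

```python
import math

def dist(u, v):
    """Euclidean distance between two segment feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kmeans(points, k, iters=50):
    """Plain k-means; returns one cluster index per segment."""
    centroids = [points[0]]
    while len(centroids) < k:  # farthest-point initialisation (deterministic)
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette width; higher means better separated clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        others = set(labels) - {labels[i]}
        if not own or not others:
            scores.append(0.0)  # convention for singleton or single cluster
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == o) / labels.count(o)
            for o in others
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def label_segments(points, max_k=4):
    """Choose k by the validity index, then name clusters A, B, C, ..."""
    best_k = max(range(2, max_k + 1), key=lambda k: silhouette(points, kmeans(points, k)))
    names = {}
    for l in kmeans(points, best_k):
        names.setdefault(l, chr(ord("A") + len(names)))
    return "".join(names[l] for l in kmeans(points, best_k))

# Seven segments described by made-up 2-D features, in temporal order.
segments = [[0, 0], [5, 5], [0.2, 0], [5, 5.2], [10, 0], [5.2, 5], [10, 0.2]]
print(label_segments(segments))  # ABABCBC
```

The letters are assigned in order of first appearance, so they only say which segments belong together and carry no semantic meaning such as verse or chorus.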

The figure on the right shows the clustering result for the song segments of KC and the Sunshine Band: That’s the Way I Like It. Numbered circles indicate segments; crosses mark cluster centroids.

The source code of the algorithm, implemented in Matlab, can be obtained from the download section. For information on how to use it, please refer to the included README file (or ask the author if there are still problems).

Evaluation setup

Considerable time has been invested in designing a sound evaluation procedure. An easy-to-use evaluation program that produces appealing and informative HTML reports has been designed and implemented.

You can download the source code from the download section at the bottom of this page.

A novel file format for audio segmentations (SegmXML) has been introduced. This format can contain information about hierarchical segments and alternative labels. See the example groundtruth file for Alanis Morissette: Thank You. A corresponding XML schema definition file for validating SegmXML files is also available.

Selected evaluation reports

The evaluation reports of the following algorithm runs are available. Note that this table corresponds to Table 3.1 of the thesis. For an explanation of the symbols and abbreviations used, please refer to the thesis.

Parameter set	Parameter changed	Boundary extraction results
MFCC40	d_S: Euclidean	P=0.55 ± 0.038, R=0.78 ± 0.035, F=0.65
MFCC40	d_S: cosine	P=0.55 ± 0.039, R=0.76 ± 0.038, F=0.64
CQT1	n_H=8	P=0.45 ± 0.04, R=0.77 ± 0.037, F=0.56
CQT1	n_H=12	P=0.46 ± 0.043, R=0.7 ± 0.04, F=0.56
CQT1	n_H=16	P=0.52 ± 0.044, R=0.64 ± 0.042, F=0.58
CQT1	n_H=18	P=0.52 ± 0.043, R=0.62 ± 0.041, F=0.57
MFCC40	k_C=48, n_H=4	P=0.49 ± 0.035, R=0.77 ± 0.031, F=0.6
MFCC40	k_C=96, n_H=8	P=0.55 ± 0.038, R=0.78 ± 0.035, F=0.65
MFCC40	k_C=128, n_H=8	P=0.59 ± 0.039, R=0.72 ± 0.039, F=0.65
MFCC40	k_C=128, n_H=14	P=0.62 ± 0.038, R=0.67 ± 0.041, F=0.65
MFCC40	boundary removing heuristic	P=0.57 ± 0.038, R=0.75 ± 0.038, F=0.65
MFCC40	post processing	P=0.54 ± 0.038, R=0.78 ± 0.037, F=0.64
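For readers unfamiliar with the P, R and F columns in the table above, the following sketch shows how precision, recall and F-measure are commonly computed for segment boundaries of a single song. It is not the Perl evaluation system of the thesis: the function name `boundary_prf`, the greedy one-to-one matching and the tolerance of 3 seconds are assumptions made for illustration. The thesis defines the exact matching rule and how per-song values are aggregated.

```python
def boundary_prf(detected, truth, tolerance=3.0):
    """Precision, recall and F-measure for boundary times given in seconds.

    A detected boundary counts as a hit if an unmatched ground-truth
    boundary lies within `tolerance` seconds of it (assumed value). Each
    ground-truth boundary can be matched at most once.
    """
    unmatched = sorted(truth)
    hits = 0
    for t in sorted(detected):
        candidates = [g for g in unmatched if abs(g - t) <= tolerance]
        if candidates:
            # greedy choice: the closest ground-truth boundary still available
            unmatched.remove(min(candidates, key=lambda g: abs(g - t)))
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(truth) if truth else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f

# Made-up example: 3 of 5 detections are hits, 3 of 4 true boundaries are found.
p, r, f = boundary_prf(
    detected=[10.0, 31.0, 45.0, 62.0, 80.0],
    truth=[10.5, 30.0, 60.0, 90.0],
)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.6 0.75 0.67
```

A high recall with a lower precision, as in most rows of the table, means that most true boundaries are found at the cost of some additional false detections.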

MFCC40 and CQT1 are the names of two parameter value sets that are explained in Table 3.2 of the thesis. MFCC40 uses Mel Frequency Cepstral Coefficient features, whereas CQT1 employs the Constant Q Transform with the fundamental frequency, maximal frequency and number of bins chosen so that the feature vectors model the semitones of seven octaves, each octave containing twelve notes.
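To make the CQT1 description concrete: twelve semitone bins per octave over seven octaves give 84 bins, with centre frequencies spaced by a factor of 2^(1/12). The sketch below computes such a bin layout. The lowest frequency of 65.4 Hz is an arbitrary value chosen for the example, not the one used in the thesis (see Table 3.2 there), and the function name is hypothetical.

```python
def cqt_center_frequencies(f_min, octaves=7, bins_per_octave=12):
    """Geometrically spaced centre frequencies, one bin per semitone."""
    n_bins = octaves * bins_per_octave
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]

freqs = cqt_center_frequencies(f_min=65.4)  # f_min is an assumed example value
print(len(freqs))   # 84 bins: 7 octaves x 12 semitones
print(freqs[12])    # one octave above f_min, i.e. twice the lowest frequency

# The ratio of centre frequency to bandwidth is the same for every bin,
# which is what gives the transform its name.
q_factor = 1 / (2 ** (1 / 12) - 1)
print(round(q_factor, 2))  # 16.82
```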

Corpus

The corpus on which this work is based contains 94 songs of various genres (Rock, Pop, Hip-hop, R&B, etc.). Final algorithm runs are conducted on a 109-song corpus (these 94 songs plus the 15 additional test-set songs listed at the end of the table), which is the largest corpus used so far in this research field. The following table contains all songs of the corpus.

Unfortunately, the demonstration songs cannot be published due to copyright issues.

Artist	Title
A-HA	Take on me
ABBA	SOS
ABBA	Waterloo
Alanis Morissette	Head Over Feet
Alanis Morissette	Thank You
Artful Dodger Craig David	Rewind
Beastie Boys	Intergalactic
Beatles	All I've Got To Do
Beatles	All My Loving
Beatles	Devil In Her Heart
Beatles	Don't Bother Me
Beatles	Hold Me Tight
Beatles	I saw her standing there
Beatles	I Wanna Be Your Man
Beatles	It Won't Be Long
Beatles	Little Child
Beatles	Misery
Beatles	Money
Beatles	Not A Second Time
Beatles	Please Mister Postman
Beatles	Roll Over Beethoven
Beatles	Till There Was You
Beatles	You Really Got A Hold On Me
Beatles	Anna go to
Beatles	Please please me
Björk	It's Oh So Quiet
Black Eyed Peas	Cali To New York
Britney Spears	Hit Me Baby One More Time
Britney Spears	Oops I Did It Again
Chicago	Old Days
Chumbawamba	Tubthumping
Coolio	The Devil Is Dope
Cranberries	Zombie
Creedence Clearwater Revival	Have You Ever Seen the Rain
Depeche Mode	It's no good
Desmond Dekker	You Can Get It If You Really Want
Deus	Suds & Soda
Dire Straits	Money For Nothing
Eminem ft. Dido	Stan
Faith No More	Epic
Gloria Gaynor	I Will Survive
KC and the Sunshine Band	That's the Way I Like It
KoRn	Got The Life
Lucy Pearl	Don't Mess With My Man
Madonna	Like a virgin
Madonna	Into the Groove
Marilyn Manson	Sweet Dreams
Michael Jackson	Bad
Michael Jackson	Black Or White
Nick Drake	Northern Sky
Nirvana	Smells like teen spirit
Norah Jones	Lonestar
Oasis	Wonderwall
Pet Shop Boys	Always On My Mind
Portishead	Wandering star
Prince	Kiss
Queen Yahna	Ain't It Time
R.E.M.	Drive
R Kelly	I Believe I Can Fly
Radiohead	Creep
Red Hot Chili Peppers	Parallel Universe
Salt N Pepa	Whatta Man
Saxon	The Great White Buffalo
Scooter	How Much Is The Fish
Seal	Crazy
Shania Twain	You're Still The One
Simply Red	Stars
Sinéad O'Connor	Nothing Compares 2 U
Spice Girls	Wannabe
Suede	Trash
The Beatles	A Day In The Life
The Beatles	A Hard Days Night
The Beatles	Being For The Benefit Of Mr. Kite
The Beatles	Fixing A Hole
The Beatles	Getting Better
The Beatles	Good Morning Good Morning
The Beatles	Help
The Beatles	I Should Have Known Better
The Beatles	If I Fell
The Beatles	I'm Happy Just To Dance With You
The Beatles	Lovely Rita
The Beatles	Lucy In The Sky With Diamonds
The Beatles	Sgt. Peppers Lonely Hearts Club Band
The Beatles	Sgt. Peppers Lonely Hearts reprise
The Beatles	She's Leaving Home
The Beatles	When I'm Sixty-Four
The Beatles	With A Little Help From My Friends
The Beatles	Within You Without You
The Clash	Combat Rock
The Jacksons 5	Can You Feel It
The Monkees	Words
The Police	Message In A Bottle
The Roots	The Next Movement
The Roots ft. Erykah Badu	You Got Me
Additional 15 songs ("test set")
Apollo 440	Stop The Rock
Eav	Wo Ist Der Kaiser
Kazuo Nishi	Eien no replica
Hiromi Yoshii	Magic in your eyes
Fevers	Jinsei konnamono
Kazuo Nishi	Doukoku
Kazuo Nishi	Kage-rou
Hisayoshi Kazato	Cool Motion
Rin	Feeling In My Heart
Mitsuru Tanimoto	Syounen no omoi
Hiromi Yoshii	Dream Magic
Hiromi Yoshii	Midarana kami no moushigo
The Crystal Method	Born Too Slow
Wise Guys	Kinder
Wise Guys	Powerfrau

Conclusions

The results of both boundary detection and structure extraction are quite acceptable, yet leave room for improvement.

The algorithm, however, proved to be robust in both a negative and a positive sense. On the negative side, many experiments with various parameter settings and heuristics did not lead to a statistically significant improvement of the mean performance.

On the other hand, cross validation and the performance on an independent test set did not show any decline in performance either. Thus, the algorithm presented seems suitable to be applied to a wide range of songs and genres.

Downloads

Master's thesis: Ewald Peiszer: Automatic Audio Segmentation: Segment Boundary and Structure Detection in Popular Music (pdf)
Poster (pdf)
Segmentation algorithm (Matlab) and Evaluation system (Perl) are available on request from the author
Beats files (Beat onsets of all songs extracted by Simon Dixon's BeatRoot. Plain text format.)
Ground truth files (SegmXML file format). Please note that the groundtruth for the 36 files which originated from Jouni Paulus is not included. Please contact Jouni Paulus to obtain the groundtruth for these files.

Last edited 02.08.2007 by Ewald Peiszer, 20.08.2007 by Thomas Lidy
