Unsupervised Bayesian Musical Key and Chord Recognition




Wang, Yun-Sheng




Butler Lampson once said, "All problems in computer science can be solved by another level of indirection." Many tasks in Music Information Retrieval can be approached using indirection in the form of data abstraction. Raw music signals can be abstracted and represented by a combination of melody, harmony, or rhythm for musical structural analysis, emotion or mood projection, and efficient search of large music collections. In this dissertation, we focus on two tasks: analyzing the tonality and the harmony of music signals. Tonality (keys) can be visualized as the "horizontal" aspect of a music piece, covering extended portions of it, while harmony (chords) can be envisioned as the "vertical" aspect of music in the score, where multiple notes are played or heard simultaneously. Our approach concentrates on transcribing Western popular music into its tonal and harmonic content directly from the audio signals. While most previously proposed methods adopt a supervised approach, which requires scarce manually transcribed training data, our approach is unsupervised: model parameters for tonality and harmony are estimated directly from the target audio data. Our approach accomplishes this goal in three novel steps. First, raw audio signals in the time domain are transformed using an undecimated wavelet transform as the basis for building an enhanced 12-dimensional pitch class profile (PCP) in the frequency domain, which serves as the feature set of the target music piece. Second, a bag of local keys is extracted from the frame-by-frame PCPs using an infinite Gaussian mixture, which allows the audio data to "speak for itself" without pre-setting the number of Gaussian components used to model the local keys. Third, the bag of local keys is applied to adjust the energy levels in the PCPs for chord extraction.
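
To make the notion of a 12-dimensional pitch class profile concrete, the following is a minimal sketch of how spectral energy can be folded into 12 pitch classes. It is not the dissertation's method: it does not use the undecimated wavelet transform or any enhancement step, and it assumes the input is already a list of (frequency, energy) peaks for one frame. The function names, the A4 = 440 Hz tuning reference, and the convention that index 0 corresponds to C are assumptions made for illustration.

```python
import math

# Assumed labeling: index 0 = C, ..., index 11 = B.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency in Hz to a pitch class index (0 = C, ..., 11 = B)."""
    # Equal-tempered MIDI note number: A4 is note 69, 12 semitones per octave.
    midi = round(69 + 12 * math.log2(freq_hz / ref_a4))
    # Octave information is discarded by taking the note number modulo 12.
    return midi % 12


def pcp(peaks, ref_a4=440.0):
    """Fold (frequency_hz, energy) pairs into a normalized 12-bin profile."""
    profile = [0.0] * 12
    for freq_hz, energy in peaks:
        if freq_hz > 0:  # skip DC or invalid bins
            profile[pitch_class(freq_hz, ref_a4)] += energy
    total = sum(profile)
    # Normalize so the bins sum to 1; an all-zero frame is returned unchanged.
    return [v / total for v in profile] if total > 0 else profile


# Hypothetical frame: equal-energy peaks at C4, E4, G4, and C5 (a C major triad).
frame = [(261.63, 1.0), (329.63, 1.0), (392.00, 1.0), (523.25, 1.0)]
profile = pcp(frame)
```

In this sketch the two C peaks from different octaves land in the same bin, which illustrates why a PCP captures harmonic content independently of register.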



Computer science