An Instance-Based Classification Approach to Automatic Transcription of Monophonic Melodies




Pishdadian, Fatemeh




Automatic music transcription (AMT) is a relatively new application in the field of music signal processing. The purpose of an AMT algorithm is to transform a raw acoustic musical signal into a written version, namely a score. The most basic pieces of information an AMT system aims to extract from the signal are the properties of individual note events, such as the starting time (onset), duration, and pitch. Because of its overwhelming complexity, the transcription problem has been broken down into sub-tasks, and separate algorithms have been developed over the years to address different operations in the overall system. Pitch detection is an important part of any transcription system and has been the subject of a vast volume of research over the past two decades. Estimation of a single pitch at each time step is known as monophonic pitch detection. In this work, we present an instance-based classification approach to the transcription of monophonic melodies. Depending on the size of the training database, two different pitch classification methods are proposed. When a large database of piano notes is available, the conventional K-Nearest Neighbor (KNN) algorithm is trained on it and employed for pitch detection. When the training database is of minimum size, containing one sample per class, a two-step algorithm combining semi-KNN pitch candidate selection and note sequence tracking is suggested. It is demonstrated that, when training data is abundant, the KNN algorithm with a proper choice of the distance measure and K yields high accuracy. Furthermore, while maintaining low computational complexity, the proposed two-step algorithm is capable of compensating for the shortage of data by incorporating prior musicological information in the transcription process.

We note that monophonic pitch detection is a mature problem compared to polyphonic pitch detection, which is the main focus of current studies.
Nevertheless, monophonic pitch detection can still be of interest, since a considerable portion of music corpora is composed of single-line melodies. One of the shortcomings of the available monophonic algorithms, which are mostly based on signal processing techniques such as the autocorrelation function (ACF) or spectral peak picking, is that they are under-evaluated in terms of frequency range and melodic structure. Classification-based pitch detection algorithms were proposed later and developed particularly for polyphony. Despite their promising accuracy, the classification methods that have been employed to solve the multi-pitch detection problem, namely Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs), are computationally too demanding to be utilized for the monophonic case, which is a simpler scenario. The work presented in this thesis was motivated by a need for monophonic transcription techniques with significantly reduced training time, low run-time complexity, and the capability to explore melodic contours.
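
As a rough illustration of the KNN pitch-classification idea summarized above, the following minimal sketch classifies the pitch of short synthetic tones by majority vote among their nearest training spectra. It is not the implementation from the thesis. The synthetic three-harmonic tones (standing in for a piano-note database), the magnitude-spectrum features, the cosine distance, K = 3, and all names and constants (`FS`, `N`, `BINS`, `synth_note`, `knn_predict`) are assumptions made only for this example.

```python
import math
from collections import Counter

FS = 4000               # sample rate in Hz (assumed for this toy example)
N = 512                 # frame length in samples
BINS = range(28, 160)   # DFT bins covering roughly 220 Hz to 1240 Hz

# Precomputed twiddle tables for a direct (non-fast) DFT.
COS = [math.cos(2 * math.pi * i / N) for i in range(N)]
SIN = [math.sin(2 * math.pi * i / N) for i in range(N)]


def midi_to_hz(midi):
    """Convert a MIDI note number to its fundamental frequency in Hz."""
    return 440.0 * 2 ** ((midi - 69) / 12)


def synth_note(midi, decay, phase=0.0):
    """Synthesize one frame of a tone with three harmonics (hypothetical stand-in for a recorded note)."""
    f0 = midi_to_hz(midi)
    return [
        sum(decay ** (h - 1) * math.sin(2 * math.pi * h * f0 * n / FS + phase * h)
            for h in (1, 2, 3))
        for n in range(N)
    ]


def spectrum(frame):
    """Magnitude of the DFT at the selected bins, used as the feature vector."""
    feats = []
    for k in BINS:
        re = sum(x * COS[(k * n) % N] for n, x in enumerate(frame))
        im = sum(x * SIN[(k * n) % N] for n, x in enumerate(frame))
        feats.append(math.hypot(re, im))
    return feats


def cosine_distance(a, b):
    """One possible distance measure; the choice is an assumption of this sketch."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train, query, k=3, distance=cosine_distance):
    """Return the majority label among the k training instances nearest to the query."""
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Training set: three timbre variants per pitch class, MIDI 60 (C4) to 67 (G4).
pitches = list(range(60, 68))
train = [(spectrum(synth_note(m, d)), m) for m in pitches for d in (0.5, 0.7, 0.9)]

# Queries: unseen timbre (different harmonic decay and phase) at each pitch.
predicted = [knn_predict(train, spectrum(synth_note(m, 0.6, phase=0.8))) for m in pitches]
```

Instance-based methods such as this one need no training step beyond storing the labeled feature vectors, which is consistent with the motivation of reduced training time stated above. The run-time cost grows with the size of the stored database.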



Automatic Music Transcription, Sequence Detection, Classification, K-Nearest Neighbor