Abstract:
Automatic music transcription (AMT) is a relatively new application in the field of music signal processing. The purpose of an AMT algorithm is to transform a raw acoustic
musical signal into a written version, namely a score. The most basic pieces of information
an AMT system aims to extract from a raw acoustic musical signal are the properties of
individual note events, such as the starting time (onset), duration, and pitch. Because of its
overwhelming complexity, the transcription problem has been broken down into sub-tasks,
and separate algorithms have been developed over the years to address different operations
in the overall system. Pitch detection is an important part of any transcription system, and
has been the subject of a vast volume of research over the past two decades.
Estimation of a single pitch at each time step is known as monophonic pitch detection.
In this work, we present an instance-based classification approach to transcription
of monophonic melodies. Depending on the size of the training database, two different pitch
classification methods are proposed. The conventional K-Nearest Neighbor (KNN) algorithm is
trained on a large database of piano notes and employed for pitch detection. A two-step
algorithm, combining semi-KNN pitch candidate selection and note sequence tracking, is
proposed for cases in which the training database is of minimal size, containing one sample
per class. It is demonstrated that, when training data are abundant, the KNN algorithm,
with a proper choice of distance measure and K, yields high accuracy. Furthermore, while
maintaining low computational complexity,
the proposed two-step algorithm is capable of compensating for the shortage of data by
incorporating prior musicological information in the transcription process.
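The instance-based idea can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the synthetic three-partial tones, the semitone-spaced magnitude features, the Euclidean distance, the value K = 3, and all names (`synth_frame`, `features`, `knn_predict`) are assumptions chosen only to keep the example self-contained.

```python
import math
import random
from collections import Counter

SAMPLE_RATE = 8000               # Hz (assumed for this toy example)
FRAME_LEN = 512                  # samples per analysis frame (assumed)
PITCHES = [57, 60, 64, 67, 69]   # MIDI classes: A3, C4, E4, G4, A4 (assumed)
BINS = range(48, 85)             # semitone-spaced analysis frequencies (assumed)


def midi_to_hz(midi):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)


def synth_frame(midi, rng):
    """One frame of a three-partial tone with randomized partial amplitudes."""
    f0 = midi_to_hz(midi)
    amps = [1.0, rng.uniform(0.3, 0.7), rng.uniform(0.1, 0.4)]
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [
        sum(a * math.sin(2.0 * math.pi * f0 * (h + 1) * n / SAMPLE_RATE + phase)
            for h, a in enumerate(amps))
        for n in range(FRAME_LEN)
    ]


def features(frame):
    """Unit-norm vector of spectral magnitudes at semitone-spaced frequencies."""
    mags = []
    for midi in BINS:
        w = 2.0 * math.pi * midi_to_hz(midi) / SAMPLE_RATE
        re = sum(x * math.cos(w * n) for n, x in enumerate(frame))
        im = sum(x * math.sin(w * n) for n, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    norm = math.sqrt(sum(m * m for m in mags)) or 1.0
    return [m / norm for m in mags]


def knn_predict(query, training_set, k=3):
    """Majority vote among the k training instances nearest in Euclidean distance."""
    nearest = sorted(training_set, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Four labeled training instances per pitch class, then one unseen frame per class.
rng = random.Random(0)
training_set = [(features(synth_frame(p, rng)), p) for p in PITCHES for _ in range(4)]
truth = list(PITCHES)
predicted = [knn_predict(features(synth_frame(p, rng)), training_set) for p in truth]
print(predicted)
```

No model is fitted in this sketch: "training" amounts to storing labeled feature vectors, which is why instance-based methods have negligible training time. The cost is paid at query time, where each frame is compared against every stored instance.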
We note that monophonic pitch detection is a mature problem compared to polyphonic
pitch detection, which is the main focus of current studies. Nevertheless, monophonic pitch
detection can still be of interest since a considerable portion of music corpora is composed
of single-line melodies. One of the shortcomings of the available monophonic algorithms,
which are mostly based on signal processing techniques such as the autocorrelation function
(ACF) or spectral peak picking, is that they have not been evaluated thoroughly across
frequency ranges and melodic structures.
Classification-based pitch detection algorithms were proposed later and developed
primarily for polyphony. Despite their promising accuracy, the classification methods
that have been employed to solve the multi-pitch detection problem, namely
Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs), are computationally
too demanding to be utilized for the monophonic case, which is a simpler scenario. The
work presented in this thesis was motivated by a need for monophonic transcription techniques
with significantly reduced training time, low run-time complexity, and the capability
to explore melodic contours.