Abstract:
Pitch detection is a subset of automatic music transcription, which is the application of
various signal processing algorithms with the specific intent of automatically gathering
musical information from audio signals. This field, in various forms, has been the subject of
much research over the years because it has a wide range of applications. Much
work has been done on monophonic signals; much less, however, has been done to tackle the
problem of polyphonic music.
One major issue for polyphonic pitch detection systems is efficiency. Most existing algorithms
sacrifice efficiency for accuracy and robustness, while some others make the opposite
tradeoff. The purpose of this paper is to work toward new systems that are both robust
and efficient enough to run in real time.
First, the relevant background information necessary to explore this topic is presented.
Musical terminology and concepts are explained, and some common analytical tools and
algorithms used in existing systems are described.
Three relatively efficient reference systems are then presented. The first is a multiple
fundamental frequency (F0) estimator based on the Auto-Correlation Function (ACF) that
utilizes a unique enhancement algorithm to easily identify individual pitch components.
The second is a multiple F0 estimator based on the Fast Fourier Transform (FFT) that
exploits the harmonic nature of musical sounds.
The third reference system outputs pitch numbers directly by using a modified form
of an unsupervised learning algorithm called Non-Negative Matrix Factorization (NMF).
Investigation of the first two reference systems shows that the two opposing approaches
to fundamental frequency estimation (FFT vs. ACF) are in fact quite complementary.
Therefore, the remainder of the paper explores the inherent high-frequency versus
low-frequency accuracy tradeoffs and proposes potential solutions. A novel analysis tool
called the Combined ACF/FFT Representation (CAFR) is developed, and three new pitch
detection algorithms are devised from it. These algorithms are then evaluated for both
robustness and efficiency and compared against results for the three reference systems.