Bartsch M.A., Wakefield G.W., Singing Voice Identification Using Spectral Envelope Estimation, IEEE Transactions on Speech and Audio Processing, vol. 12, no. 2, pp. 100-109, March 2004

Abstract

In this paper, we present a spectrum-based system
for singer identification that operates for the ideal case in which
audio samples contain only the singer’s voice. Our method begins
with the computation of a robust estimate of the spectral envelope
called the composite transfer function (CTF). The CTF is derived
from the instantaneous amplitude and frequency of the sinusoidal
partials which make up the vocal signal. Unlike traditional source-filter
theory [1], the CTF does not explicitly separate the spectral
characteristics of the vocal source and the vocal tract filter.
The principal components of the CTFs are used as features for a
quadratic classifier to identify singers. The approach is validated
on a database containing samples from twelve classically trained
singers. In cross validation experiments, test set accuracies of approximately
95% are found for a baseline case. The classifier’s performance
is not degraded when different vowels are included in
classifier training and evaluation. Restricting the frequency range
of the CTFs and using a test set containing samples extracted from
solo performances of Italian arias reduces the test set accuracy to
70–80%.
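
As a rough illustration of the classification stage described above (principal components of spectral envelopes fed to a quadratic classifier), the following is a minimal sketch and not the authors' implementation. It assumes NumPy is available. The CTF estimation itself is not shown: the synthetic "envelopes", the number of singers and frequency bins, the choice of two principal components, the covariance regularization term, and all function names are hypothetical choices made only for this example.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the data mean and the leading principal directions (rows)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Project envelopes onto the retained principal components."""
    return (X - mean) @ components.T

def fit_quadratic(Z, y, reg=1e-6):
    """Fit one Gaussian (own mean and covariance) per class."""
    params = {}
    for label in np.unique(y):
        Zc = Z[y == label]
        mu = Zc.mean(axis=0)
        # Small ridge term (assumption) keeps the covariance invertible.
        cov = np.cov(Zc, rowvar=False) + reg * np.eye(Z.shape[1])
        log_det = np.linalg.slogdet(cov)[1]
        log_prior = np.log(len(Zc) / len(Z))
        params[label] = (mu, np.linalg.inv(cov), log_det, log_prior)
    return params

def predict(Z, params):
    """Assign each sample to the class with the largest quadratic score."""
    labels = list(params)
    scores = []
    for label in labels:
        mu, inv_cov, log_det, log_prior = params[label]
        d = Z - mu
        maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)  # squared Mahalanobis distance
        scores.append(-0.5 * (maha + log_det) + log_prior)
    return np.array(labels)[np.argmax(scores, axis=0)]

# Synthetic stand-in for CTFs (assumption): one spectral bump per singer plus noise.
rng = np.random.default_rng(0)
n_bins, n_per_singer = 40, 60
bins = np.arange(n_bins)
centers = [8, 20, 32]  # hypothetical bump positions, one per singer
X = np.vstack([
    np.exp(-0.5 * ((bins - c) / 4.0) ** 2)
    + 0.3 * rng.standard_normal((n_per_singer, n_bins))
    for c in centers
])
y = np.repeat(np.arange(len(centers)), n_per_singer)

# Hold out a third of the samples for testing.
perm = rng.permutation(len(y))
train, test = perm[:120], perm[120:]

mean, components = fit_pca(X[train], n_components=2)
Z_train = project(X[train], mean, components)
Z_test = project(X[test], mean, components)

params = fit_quadratic(Z_train, y[train])
accuracy = float(np.mean(predict(Z_test, params) == y[test]))
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only the easily separable synthetic data and says nothing about the results reported in the paper. The PCA basis is fitted on the training samples alone so that the held-out samples do not influence the features.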

Index Terms

Music information retrieval, singer identification,
spectral analysis, vocal tract transfer function.