Music Information Retrieval
Abstract
Music Information Retrieval (MIR) is an emerging research area
that focuses on the content-based retrieval of musical documents
in response to musical queries. We developed an approach that
automatically extracts music content descriptors from both
documents and queries in notated form. The approach also allows
tuning the exhaustivity and specificity of retrieval results.
Unlike approaches based on pattern matching, it scales to large
collections of documents. A further step has been taken toward
the indexing of documents in audio format, through automatic
alignment of the audio signal with the corresponding events in
the score.
Description
Music Information Retrieval (MIR) is an emerging research area
that focuses on the content-based retrieval of musical documents
in response to musical queries, where both documents and queries
may be in acoustic or notated form.
The approach to MIR developed at IMS is based on the automatic
extraction of music content descriptors from both documents and
queries in notated form. It is assumed that good descriptors are
musically relevant melodic profiles, referred to as "musical
phrases". Extracted phrases are used to index documents and to
match queries against documents with classical techniques of
textual information retrieval. The methodology has been tested on
a collection of documents of Western art music, and a prototype
has been developed.
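The indexing scheme described above can be sketched in a few lines. This is a minimal illustration, not the IMS prototype itself: the function names (phrase_tokens, tfidf_index, cosine) are hypothetical, phrases are assumed to be already segmented, and pitches are assumed to be MIDI note numbers. Each phrase is reduced to its pitch-interval sequence, so that transposed statements of the same phrase map to the same index term, and standard tf-idf weighting from textual information retrieval is applied to those terms.

```python
from collections import Counter
from math import log, sqrt

def phrase_tokens(pitches):
    """Describe a melodic phrase by its pitch-interval sequence,
    so transpositions of the same phrase yield the same token."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return "/".join(str(i) for i in intervals)

def tfidf_index(documents):
    """Build tf-idf vectors over phrase tokens, as in textual IR.
    `documents` maps a document id to its list of phrase tokens."""
    df = Counter()                      # document frequency of each token
    for phrases in documents.values():
        df.update(set(phrases))
    n = len(documents)
    index = {}
    for doc, phrases in documents.items():
        tf = Counter(phrases)           # term frequency inside the document
        index[doc] = {t: c * log(n / df[t]) for t, c in tf.items()}
    return index

def cosine(query_vec, doc_vec):
    """Cosine similarity between two sparse tf-idf vectors."""
    num = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
    den = (sqrt(sum(w * w for w in query_vec.values()))
           * sqrt(sum(w * w for w in doc_vec.values())))
    return num / den if den else 0.0
```

Because the index is computed once, off-line, a query reduces to a sparse vector comparison, which is the source of the scalability claimed below.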
This approach also allows tuning the exhaustivity and specificity
of retrieval results, because the automatically extracted phrases
can be normalized at increasing levels of generalization of the
melodic profile. Normalization can also help overcome problems
that may arise from differences between versions of the same work
and from errors that users may introduce when formulating queries.
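The levels of generalization mentioned above can be illustrated with a small sketch. The three levels and their thresholds are illustrative assumptions, not the normalization actually used in the prototype: exact semitone intervals, coarse interval classes (repeat, step, leap, with direction), and gross contour (up/down/same). The coarser the level, the more robust matching is to variant versions and query errors, at the cost of specificity.

```python
def contour(intervals, level):
    """Normalize a melodic profile (semitone intervals) at
    increasing levels of generalization:
    level 0: exact semitone intervals;
    level 1: interval classes (repeat, step or leap, with sign);
    level 2: gross contour (Up / Down / Same)."""
    if level == 0:
        return [str(i) for i in intervals]
    if level == 1:
        def cls(i):
            if i == 0:
                return "R"                       # repeated note
            size = "S" if abs(i) <= 2 else "L"   # step vs. leap (assumed cutoff)
            return ("+" if i > 0 else "-") + size
        return [cls(i) for i in intervals]
    return ["U" if i > 0 else "D" if i < 0 else "S" for i in intervals]
```

For example, a query in which the user sings a fourth instead of a fifth still matches at level 2, since both profiles reduce to the same contour.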
The proposed methodology has some important features. Unlike
approaches based on pattern matching, it scales to large
collections of documents, because indexing is performed off-line
and retrieval can be computed efficiently. On the other hand,
unlike approaches based on n-grams, the content descriptors are
musically relevant and may characterize a particular musical work
much as a given set of words characterizes a particular textual
document.
A further step has been taken toward the indexing of acoustic
recordings, i.e. documents in audio format. It is well known that
the automatic extraction of melodic profiles from polyphonic
recordings is error-prone and largely unfeasible. For this reason
the recognition of an unknown audio recording is a difficult task,
as is its indexing for music information retrieval.
It is proposed to partially overcome these problems by
automatically matching an unknown performance with the
corresponding notated document, through automatic alignment of the
audio signal with the corresponding events in the score.
Recognition is performed with a hidden Markov model (HMM) that is
automatically built for each score and used to measure the
probability that the performance is a realization of that score.
Once the unknown performance is matched with a score, alignment is
performed through classical Viterbi decoding.
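The HMM-based alignment can be sketched as follows. This is a deliberately simplified toy model, not the system described above: it assumes one state per score event in a left-to-right topology, monophonic pitch observations, discrete emissions (a state emits its own pitch with high probability and anything else with a small error probability), and an arbitrary self-transition probability; the real task uses acoustic features rather than symbolic pitches.

```python
import math

def score_hmm_viterbi(score_pitches, observations, stay=0.3):
    """Toy left-to-right HMM with one state per score event.
    Viterbi decoding returns, for each observed frame, the index
    of the score event it is aligned to."""
    n = len(score_pitches)
    log_stay, log_move = math.log(stay), math.log(1 - stay)

    def log_emit(state, obs):
        # High probability for the expected pitch, a small floor otherwise
        # (0.9 / uniform-over-11-others is an arbitrary illustrative choice).
        return math.log(0.9) if obs == score_pitches[state] else math.log(0.1 / 11)

    # Initialization: the performance must start at the first score event.
    delta = [-math.inf] * n
    delta[0] = log_emit(0, observations[0])
    back = []
    for obs in observations[1:]:
        prev, ptr = delta[:], [0] * n
        for s in range(n):
            stay_p = prev[s] + log_stay
            move_p = prev[s - 1] + log_move if s > 0 else -math.inf
            delta[s] = max(stay_p, move_p) + log_emit(s, obs)
            ptr[s] = s if stay_p >= move_p else s - 1
        back.append(ptr)
    # Backtrack from the best final state to recover the alignment path.
    path = [max(range(n), key=lambda s: delta[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

The terminal log-probability also serves the recognition task: the score whose HMM assigns the highest probability to the observation sequence is taken as the match.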
The automatic matching of an unknown performance with a score can
be used to recognize and index an unlabeled audio stream, such as
a recording of a concert or a radio broadcast. Once a performance
is matched with a score, the methodology developed for notated
music documents can easily be extended to acoustic recordings.
Moreover, the alignment of events in the score with time positions
in the recording may be used to impose a structure, taken from the
score, on the unstructured audio stream. This may help in finding
relevant passages inside a possibly long audio document, easing
the evaluation of document relevance in a retrieval session.
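The structuring step above amounts to turning the Viterbi alignment path into a lookup table from score events to time positions. A minimal sketch, assuming the alignment path (one score-event index per frame) and the frame timestamps are already available; the function name is hypothetical:

```python
def event_times(path, frame_times):
    """Map each score event to the time of the first audio frame
    aligned to it, giving the recording a score-derived structure."""
    times = {}
    for state, t in zip(path, frame_times):
        times.setdefault(state, t)  # keep the first frame per event
    return times
```

With such a table, a passage identified in the score (e.g. the phrase that matched a query) can be located directly in the recording for playback during relevance evaluation.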
The methodology is being tested on a collection of documents, both
in acoustic and in notated form, in order to measure the error
rates in automatic recognition and alignment.
Nicola Orio
Last modified: Mon Nov 04 14:21:30 CEST 2002