Music Information Retrieval

Abstract

Music Information Retrieval (MIR) is an emerging research area that focuses on the content-based retrieval of musical documents against musical queries. We developed an approach that automatically extracts music content descriptors from both documents and queries in notated form. This approach also allows tuning the exhaustivity and specificity of retrieval results. Unlike approaches based on pattern matching, it scales to large collections of documents. A further step has been taken toward the indexing of documents in audio format, through automatic alignment of the audio signal with the corresponding events in the score.

Description

Music Information Retrieval (MIR) is an emerging research area that focuses on the content-based retrieval of musical documents against musical queries, where both documents and queries may be in acoustic or notated form. The approach to MIR carried out at IMS is based on the automatic extraction of music content descriptors from both documents and queries in notated form. It is assumed that good descriptors are musically relevant melodic profiles, referred to as "musical phrases". Extracted phrases are used to index documents and to match queries against documents through classical techniques of textual information retrieval. The methodology has been tested on a collection of Western art music documents, and a prototype has been developed.

This approach also allows tuning the exhaustivity and specificity of retrieval results, because automatically extracted phrases can be normalized at increasing levels of generalization of the melodic profile. Normalization also helps overcome the problems that may arise from differences between versions of the same work and from errors that users may introduce when formulating queries.

The proposed methodology has some important features. Unlike approaches based on pattern matching, it scales to large collections of documents, because indexing is performed off-line and retrieval can be computed efficiently. Unlike approaches based on n-grams, its content descriptors are musically relevant and may characterize a particular music work much as a given set of words may characterize a particular textual document. A further step has been taken toward the indexing of acoustic recordings, i.e. documents in audio format.
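The indexing scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the IMS prototype: the fixed phrase length, the contour-based normalization, and the overlap-count ranking are simplifying assumptions standing in for the actual phrase segmentation, generalization levels, and textual-IR weighting.

```python
from collections import defaultdict

def extract_phrases(pitches, length=5):
    """Slide a window over a melody (MIDI pitch numbers) and emit
    candidate phrases as tuples of pitch intervals, so that phrases
    are invariant under transposition. A fixed window is a stand-in
    for musically motivated phrase segmentation."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + length])
            for i in range(len(intervals) - length + 1)]

def normalize(phrase):
    """Generalize a phrase to its melodic contour: only the sign of
    each interval (up / down / repeat) is kept, trading specificity
    for robustness to performance variants and query errors."""
    return tuple((d > 0) - (d < 0) for d in phrase)

class PhraseIndex:
    """Off-line inverted index from normalized phrases to documents,
    so retrieval avoids scanning the collection as pattern matching would."""

    def __init__(self):
        self.postings = defaultdict(set)  # normalized phrase -> doc ids

    def add(self, doc_id, pitches):
        for p in extract_phrases(pitches):
            self.postings[normalize(p)].add(doc_id)

    def search(self, query_pitches):
        """Rank documents by the number of query phrases they share
        (a crude proxy for a textual-IR weighting scheme)."""
        scores = defaultdict(int)
        for p in extract_phrases(query_pitches):
            for doc_id in self.postings.get(normalize(p), ()):
                scores[doc_id] += 1
        return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because phrases are stored as transposition-invariant contours, a query fragment played in a different key still retrieves the document it was taken from:

```python
index = PhraseIndex()
index.add("doc1", [60, 62, 64, 65, 67, 65, 64, 62, 60])  # ascending-descending scale
index.add("doc2", [60, 60, 67, 67, 69, 69, 67])
index.search([62, 64, 66, 67, 69, 67])  # transposed fragment -> doc1 ranked first
```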
It is well known that automatic extraction of melodic profiles from polyphonic recordings is error-prone and often infeasible. For this reason the recognition of an unknown audio recording is a difficult task, as is its indexing for music information retrieval. We propose to partially overcome these problems by automatically matching an unknown performance with the corresponding notated document, through automatic alignment of the audio signal with the corresponding events in the score. Recognition is performed using a hidden Markov model (HMM) that is automatically built for each score and used to measure the probability that the performance is a realization of that score. Once the unknown performance is matched with a score, alignment is performed through classical Viterbi decoding.

The automatic matching of an unknown performance with a score can be used to recognize and index an unlabeled audio stream, such as a recording of a concert or of a radio broadcast. Once a performance is matched with a score, the methodology developed for notated music documents can be easily extended to acoustic recordings. Moreover, the alignment of score events with time positions in the recording may be used to add a structure, taken from the score, to the unstructured audio stream. This may help locate relevant passages inside possibly long audio documents, easing the evaluation of document relevance in a retrieval session. The methodology is under test on a collection of documents, both in acoustic and in notated form, in order to measure the error rates of automatic recognition and alignment.
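The recognition-and-alignment step can be sketched with a toy left-to-right HMM. This is a simplified illustration under strong assumptions: one state per score note, observations reduced to a single estimated MIDI pitch per audio frame, and a hypothetical emission model (probability p_match of observing the state's pitch, the remainder spread over the other 127 pitches). The returned log-likelihood can be compared across candidate scores for recognition; the decoded path is the frame-to-event alignment.

```python
import math

def viterbi_align(score_pitches, frames, p_stay=0.7, p_match=0.9):
    """Left-to-right HMM over score events: self-transitions model note
    duration, forward transitions model note changes. Returns the Viterbi
    log-likelihood (for score recognition) and the best state path
    (for audio-to-score alignment)."""
    n = len(score_pitches)
    e_hit = math.log(p_match)
    e_miss = math.log((1.0 - p_match) / 127)  # uniform over other MIDI pitches
    t_stay, t_next = math.log(p_stay), math.log(1.0 - p_stay)

    # delta[s]: best log-probability of reaching state s; psi: backpointers
    first = e_hit if frames[0] == score_pitches[0] else e_miss
    delta = [first] + [-math.inf] * (n - 1)
    psi = []
    for obs in frames[1:]:
        new, back = [], []
        for s in range(n):
            stay = delta[s] + t_stay
            move = delta[s - 1] + t_next if s > 0 else -math.inf
            emit = e_hit if obs == score_pitches[s] else e_miss
            new.append(max(stay, move) + emit)
            back.append(s if stay >= move else s - 1)
        delta, _ = new, psi.append(back)

    # Backtrack from the best final state to recover the frame -> note map
    s = max(range(n), key=lambda i: delta[i])
    loglik, path = delta[s], [s]
    for back in reversed(psi):
        s = back[s]
        path.append(s)
    path.reverse()
    return loglik, path
```

For recognition, the same frame sequence is scored against every candidate HMM and the score with the highest log-likelihood wins; for alignment, `path[t]` gives the score event sounding in frame `t`:

```python
frames = [60, 60, 64, 64, 67]             # estimated pitch per audio frame
loglik, path = viterbi_align([60, 64, 67], frames)  # path: [0, 0, 1, 1, 2]
wrong, _ = viterbi_align([50, 55, 59], frames)      # wrong score: lower loglik
```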


Nicola Orio
Last modified: Mon Nov 04 14:21:30 CET 2002