
적합성 피드백 및 내용기반 음악 검색 시스템

A Relevance Feedback and Content-based Music Retrieval System

Abstract

Recently, with the explosive popularity of digital music, there has been tremendous interest in technologies such as similarity measurement and filtering for retrieving and recommending music. In the music information retrieval community, many researchers have been investigating and developing efficient transcription and retrieval methods for query-by-humming systems, which are considered among the most intuitive and effective query methods for music retrieval. For voice humming to serve as a reliable query source, elaborate signal processing and acoustic similarity measurement schemes are necessary.

In this dissertation, we develop a novel music retrieval system called MUSEMBLE (MUSic enSEMBLE) based on several distinct features: (i) a sung or hummed query is automatically transcribed into a sequence of pitch and duration pairs with improved accuracy for music representation. More specifically, we develop two new techniques, WAE (Windowed Average Energy) for more accurate offset detection and EFX (Energetic Feature eXtractor) for onset, peak, attack, and transient detection in acoustic signals. The former improves on energy-based approaches such as AE (Average Energy) by defining multiple windows, each with its own local threshold value, instead of a single global value. The latter improves on the AF (Amplitude Function), which sums the absolute values of signal differences to cluster the energy contour. For accurate note onset detection, we define a dynamic threshold curve similar to the decay curve in previous onset detection models [56, 57]; (ii) for accurate acquisition of the fundamental frequency of each frame, we apply the CAMDF (Circular Average Magnitude Difference Function); (iii) for indexing, we propose a popularity-adaptive indexing structure called FAI (Frequently Accessed Index) built on frequently queried tunes. This scheme is based on the observation that users tend to memorize and query a small number of melody segments, so indexing those segments enables fast retrieval; (iv) a user query is reformulated through relevance feedback with a genetic algorithm to improve retrieval performance. Although this dissertation focuses primarily on humming queries, MUSEMBLE provides versatile query and browsing interfaces for various kinds of users.

We carried out extensive experiments on the prototype system to evaluate the performance of our voice query transcription and GA (Genetic Algorithm)-based RF (Relevance Feedback) schemes. For an extensive and accurate evaluation, we used the QBSH (Query by Singing/Humming) corpus, which was adopted as a MIREX 2006 contest data set. Experimental results show that the proposed schemes reduce note segmentation errors such as note drop, note add, pitch, and duration errors, thereby improving transcription accuracy. We demonstrate that our RF method improves retrieval accuracy by up to 20~40% compared with other popular RF methods, and that the WAE and EFX methods achieve transcription accuracy of up to 95%.
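The CAMDF step mentioned in the abstract can be sketched briefly. The following is a minimal Python illustration of circular-AMDF pitch estimation on one frame of a mono signal; the function names and lag-range parameters are our own choices for illustration, not the dissertation's implementation:

```python
import numpy as np

def camdf(frame):
    """Circular Average Magnitude Difference Function:
    D(k) = sum_n |x((n + k) mod N) - x(n)| for each lag k."""
    n = len(frame)
    d = np.empty(n)
    for k in range(n):
        # np.roll(frame, -k)[i] == frame[(i + k) mod n]
        d[k] = np.abs(np.roll(frame, -k) - frame).sum()
    return d

def estimate_f0(frame, fs, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame by picking
    the lag that minimizes the CAMDF within a plausible pitch range."""
    d = camdf(frame)
    lo = max(1, int(fs / fmax))   # shortest period considered
    hi = int(fs / fmin)           # longest period considered
    lag = lo + int(np.argmin(d[lo:hi]))
    return fs / lag
```

For a pure 200 Hz sine sampled at 8 kHz, the CAMDF minimum falls at a lag of 40 samples, giving f0 = 8000 / 40 = 200 Hz. In a full transcriber, this per-frame estimate would feed the note detection and pitch analysis stages described above.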


Table of Contents

Chapter 1 Introduction = 1
1.1 Music Information Retrieval = 3
1.2 Scope and Goal of the Dissertation = 6
1.3 Main Results of the Dissertation = 7
1.3.1 AMTranscriber = 7
1.3.2 FMF = 8
1.3.3 MUSEMBLE and M-MUSICS = 9
1.4 Overview and Organization = 10
Chapter 2 Literature Review = 12
2.1 Feature Extraction = 12
2.1.1 Power = 13
2.1.2 Fundamental Frequency (f0) = 14
2.1.3 Spectral Features = 15
2.1.4 Duration and Modulation = 16
2.1.5 Pitch = 17
2.1.6 Timbre = 18
2.1.7 Rhythm = 19
2.2 Music Retrieval System = 20
2.2.1 Audentify = 22
2.2.2 C-BRAHMS = 22
2.2.3 CubyHum = 24
2.2.4 GUIDO = 25
2.2.5 MELDEX = 25
2.2.6 PROMS = 26
2.2.7 Themefinder = 27
2.2.8 MIRACLE = 28
2.2.9 Other MIR Systems = 28
Chapter 3 Feature Extraction and Music Transcription = 31
3.1 System Overview = 32
3.1.1 AMTranscriber = 32
3.1.2 eXAMT = 33
3.1.3 User Interface = 34
3.2 Onset Model and Feature Extraction = 36
3.2.1 Energetic Features = 37
3.2.2 Onset Model = 37
3.2.2.1 Amplitude-based Difference Function (ADF) = 40
3.2.2.2 Energetic Feature eXtractor (EFX) = 41
3.2.2.3 Dynamic Threshold Curve (DTC) = 44
3.2.2.4 Modified Windowed Average Energy (MWAE) = 47
3.2.3 Pitch Extraction = 50
3.3 Voice Query Transcription Scheme = 51
3.3.1 Preprocessing = 52
3.3.2 Note Detection = 53
3.3.3 Pitch Detection and Analysis = 53
Chapter 4 Database Schema and Indexing = 55
4.1 System Overview = 55
4.2 Database Schema = 56
4.3 Indexing Scheme = 58
4.4 Indexing Maintenance = 59
4.4.1 Entry Management = 59
4.4.2 Entry Expansion = 61
4.4.3 Entry Modification and Deletion = 62
4.4.4 Algorithm = 64
Chapter 5 Music Matching and Retrieval = 69
5.1 System Overview = 70
5.1.1 Architecture (MUSEMBLE) = 70
5.1.2 Architecture (M-MUSICS) = 71
5.1.3 User Interface = 72
5.2 Melody Representation = 76
5.3 Matching Algorithm = 78
5.3.1 Exact Matching = 78
5.3.1.1 Brute Force (BF) = 78
5.3.1.2 Knuth-Morris-Pratt (KMP) = 80
5.3.1.3 Boyer-Moore (BM) = 81
5.3.2 Approximate Matching = 83
5.3.2.1 Dynamic Programming (DP) = 83
5.3.2.2 Longest Common Subsequence (LCS) = 85
5.4 Query Refinement = 86
5.4.1 Relevance Feedback = 86
5.4.2 Genetic Algorithm = 87
Chapter 6 Evaluations = 92
6.1 Evaluation Method and Dataset (AMT) = 92
6.2 Evaluation Method and Dataset (eXAMT) = 93
6.3 Evaluation Results = 94
6.3.1 Transcription Performance (AMT) = 95
6.3.2 Transcription Performance (eXAMT) = 98
6.3.3 Retrieval Performance = 107
Chapter 7 Conclusion = 113
7.1 Summary = 113
7.2 Future Research = 114
Bibliography = 116
