Research on Music Retrieval Techniques Based on Audio Fingerprinting and Version Identification
Published: 2018-11-28 19:42
[Abstract]: Content-based music retrieval is an active area of audio retrieval, and its practical value keeps growing as the volume of online music increases. At the same time, users' retrieval needs are changing: they are often not satisfied with retrieving only the song that exactly matches the query, but also want multiple versions of the target music, such as performances by different singers or on different occasions. With the growth of online self-publishing and the popularity of amateur covers, this need is becoming increasingly apparent.

Content-based music retrieval extracts features from the query and from the reference recordings and then matches those features to find the recordings identical to the query. The features used for this exact-match retrieval are usually called audio fingerprints; they aim for a compact representation and tend to match segments whose audio content is the same. Version features, by contrast, have a more complex representation and tend to match segments that share version characteristics even when the content differs. This thesis therefore handles the two tasks separately: version identification is performed offline on a curated reference library, while fingerprint-based retrieval runs in real time; once a fingerprint query hits a reference recording, the related recordings (i.e., other versions of the same song) can be returned immediately from the precomputed version-identification results.

Because human hearing performs so well, this thesis builds the audio fingerprint from features rooted in the auditory mechanism. After analyzing the physiology of the human ear, a cosine basis and a firing function are used to simulate how the cochlea processes sound, and sparse decomposition then yields the feature coefficients. To overcome the high cost of this decomposition, a fast feature-extraction method based on the matching pursuit algorithm is proposed.
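As a rough illustration of the greedy sparse decomposition mentioned above, the sketch below runs matching pursuit over a unit-norm cosine dictionary. It is only a minimal stand-in for the thesis's auditory model: the dictionary construction, frame length, and number of retained coefficients are assumptions made for the example, and the cochlear firing-function stage is omitted.

```python
import numpy as np

def cosine_dictionary(frame_len, n_atoms):
    """Unit-norm cosine atoms (a simple stand-in for the cochlea-inspired basis)."""
    n = np.arange(frame_len)
    atoms = np.array([np.cos(np.pi * (k + 0.5) * (n + 0.5) / frame_len)
                      for k in range(n_atoms)])
    return atoms / np.linalg.norm(atoms, axis=1, keepdims=True)

def matching_pursuit(signal, dictionary, n_coeffs=20):
    """Greedy matching pursuit: pick the best-correlated atom, subtract it, repeat."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(len(dictionary))
    for _ in range(n_coeffs):
        correlations = dictionary @ residual          # inner products with all atoms
        k = int(np.argmax(np.abs(correlations)))      # most correlated atom
        coeffs[k] += correlations[k]
        residual -= correlations[k] * dictionary[k]   # remove its contribution
    return coeffs

# Toy usage on a placeholder frame of audio samples.
frame = np.random.randn(512)
D = cosine_dictionary(512, 256)
sparse_feature = matching_pursuit(frame, D, n_coeffs=20)
```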
Because these auditory sparse features have a complex form and are unsuitable for direct retrieval, the thesis compresses them into an audio fingerprint. The main techniques are MinHash, which reduces the dimensionality of the high-dimensional binary sequence features, and locality-sensitive hashing, which enables fast lookup; corresponding candidate-verification and recording-detection procedures are then given. Experiments show that the fingerprint offers good retrieval efficiency and expressiveness and is robust to mild noise and to global temporal changes, but less robust to local temporal changes.

For version identification, the thesis first reviews the basic definitions, main problems, and common approaches in the field, and then, by organizing the identification pipeline and comparing the available methods, assembles a complete version-identification system. The commonly used harmonic pitch class profile feature is improved by incorporating beat and key-transposition information and serves as the core feature for version identification; the necessary preprocessing steps, including peak estimation, beat estimation, and reference-frequency estimation, are applied before the feature is computed. Experimental results show that the proposed version-identification method is effective.
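The fingerprint indexing step can be pictured with the following minimal MinHash plus locality-sensitive-hashing sketch. It is not the thesis's actual configuration: the fingerprint dimensionality, number of permutations, and banding parameters are illustrative assumptions, and the candidate-verification step mentioned in the abstract would follow the lookup.

```python
import numpy as np
from collections import defaultdict

def minhash_signature(binary_vector, permutations):
    """MinHash: for each permutation, record the smallest permuted index of a set bit."""
    on_bits = np.flatnonzero(binary_vector)
    sig = []
    for perm in permutations:
        ranks = perm[on_bits]
        sig.append(int(ranks.min()) if len(ranks) else len(perm))
    return tuple(sig)

def build_lsh_index(signatures, n_bands, rows_per_band):
    """Band the signatures; items sharing any band hash become lookup candidates."""
    index = defaultdict(set)
    for item_id, sig in signatures.items():
        for b in range(n_bands):
            band = sig[b * rows_per_band:(b + 1) * rows_per_band]
            index[(b, band)].add(item_id)
    return index

def query_candidates(sig, index, n_bands, rows_per_band):
    cands = set()
    for b in range(n_bands):
        band = sig[b * rows_per_band:(b + 1) * rows_per_band]
        cands |= index.get((b, band), set())
    return cands

# Toy usage with random binary "fingerprints"; parameters are placeholders.
rng = np.random.default_rng(0)
dim, n_perms = 1024, 32
perms = [rng.permutation(dim) for _ in range(n_perms)]
db = {i: (rng.random(dim) < 0.05).astype(int) for i in range(100)}
sigs = {i: minhash_signature(v, perms) for i, v in db.items()}
index = build_lsh_index(sigs, n_bands=8, rows_per_band=4)
print(query_candidates(minhash_signature(db[7], perms), index, 8, 4))
```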
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year conferred]: 2014
[CLC number]: TN912.34
Article ID: 2364070
Link to this article: https://www.wllwen.com/kejilunwen/wltx/2364070.html