An Audio Perceptual Hashing Algorithm Based on Auditory Filters and Its Application in Music Retrieval

Published: 2018-04-15 23:02

Topic: audio perceptual hashing + Gammachirp filter bank; Source: East China University of Science and Technology, master's thesis, 2015

[Abstract]: With the continued development of the Internet and multimedia technology, people can obtain more and more digital audio resources with increasing ease. The human auditory system discriminates audio remarkably well: even in a noisy environment, a listener needs only a few seconds to recognize the song being played. The problem is how to make a computer recognize audio automatically as the volume of audio resources keeps growing. Content-based audio perceptual hashing arose to address this problem. Many existing audio perceptual hashing algorithms are insufficiently robust and computationally expensive, so this thesis proposes a new one. First, we design a new time-frequency feature representation for audio: a multi-channel Gammachirp filter bank filters the audio signal within the frequency range to which the human ear is most sensitive, the filtered signal is divided into frames, and an energy spectrum is computed per frequency band. Experiments show that this feature is robust and resists geometric distortion. Next, non-negative matrix factorization (NMF) extracts local features from the Gammachirp cochlear energy spectrum while reducing the dimensionality of the data. Finally, the local features are differenced and quantized to obtain a binary audio perceptual hash. Experimental results show that the proposed algorithm achieves a high recognition rate both under a variety of attacks applied with audio editing software and when retrieving from recordings made in real environments. Retrieval speed is also an important issue in audio information retrieval. Changing the algorithm alone cannot deliver a significant speedup in the short term, so it is necessary to accelerate audio retrieval with other computing devices. The graphics processing unit (GPU) provides powerful parallel computing capability, which makes accelerating existing audio retrieval algorithms on the GPU worthwhile. In this thesis, cooperative CPU-GPU computation substantially reduces the time spent on perceptual hash matching and on the audio retrieval process as a whole. Finally, the thesis combines these algorithms into an interactive music retrieval system that records an audio clip a few seconds long and retrieves the corresponding song title, artist, album cover image, and related information.
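The hashing and matching steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no parameters, so the NMF rank, the iteration count, the choice to difference the NMF activations along the time axis, the zero threshold, and the sliding-window bit-error-rate search are all assumptions made here for illustration. The Gammachirp filtering stage is omitted; the sketch assumes a non-negative band-energy matrix (bands x frames) is already available. All function names are hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (bands x frames) as W @ H using
    Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def perceptual_hash(energy, rank=4):
    """Assumed scheme: reduce the band-energy matrix with NMF, difference
    the activations between consecutive frames, and keep only the sign.
    Returns a (rank, frames - 1) array of 0/1 bits."""
    _, H = nmf(np.asarray(energy, dtype=float), rank)
    return (np.diff(H, axis=1) > 0).astype(np.uint8)


def bit_error_rate(a, b):
    """Fraction of differing bits between two equally shaped hashes."""
    return float(np.mean(a != b))


def best_match(query_bits, ref_bits):
    """Slide the query hash along the reference hash and return
    (offset, bit error rate) of the closest alignment."""
    n = query_bits.shape[1]
    n_offsets = ref_bits.shape[1] - n + 1
    if n_offsets < 1:
        raise ValueError("query is longer than the reference")
    bers = [bit_error_rate(query_bits, ref_bits[:, o:o + n])
            for o in range(n_offsets)]
    best = int(np.argmin(bers))
    return best, bers[best]


if __name__ == "__main__":
    # Stand-in for a Gammachirp band-energy matrix: 32 bands, 120 frames.
    energy = np.random.default_rng(0).random((32, 120))
    bits = perceptual_hash(energy)
    # Treat a slice of the reference hash as the query clip.
    print(best_match(bits[:, 40:70], bits))
```

Each candidate offset in `best_match` is scored independently of the others, which is the property that makes this kind of matching a natural fit for parallel hardware such as a GPU; the sketch itself runs on the CPU only.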
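[Degree-granting institution]: East China University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TN713; TP391.3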

[References]