Research on MFCC-Based Abnormal Sound Recognition Technology
Published: 2018-07-16 08:49
[Abstract]: With the rapid development of society, latent conflicts have emerged alongside it, and abnormal sounds have gradually drawn attention. Traditional security systems urgently need video surveillance to incorporate abnormal-sound discrimination in order to become more intelligent. Research on abnormal sound recognition has long progressed slowly and lags far behind work on other kinds of sound, mainly because feature parameters that capture the essential nature of abnormal sounds have not been found. Drawing on the advantages of human hearing in identifying objects by sound, a growing number of feature parameters that imitate human auditory perception have been proposed, and they have shown great potential in information science. For several typical abnormal sounds recorded in a parking lot, this thesis focuses on a feature extraction method based on Mel-frequency cepstral coefficients (MFCC) and an improved version of it, and uses a support vector machine (SVM) to recognize and classify the abnormal sounds. The main contents are as follows:
1. Preprocessing of the collected abnormal sound signals, consisting mainly of normalization, pre-emphasis, and framing with windowing. The aim is to remove differences in numerical magnitude between sound samples introduced during acquisition, to keep fluctuating volume from affecting overall sound quality, and to bring out each sample's own characteristics. Because abnormal sounds are nonlinear and non-stationary, framing and windowing are applied to obtain short-time signals. To reduce computation in the feature extraction and recognition stages, endpoint detection is performed to determine the start and end points of the abnormal sound signal.
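The preprocessing chain described above (normalization, pre-emphasis, framing with windowing, endpoint detection) can be sketched roughly as follows. This is only an illustration under assumptions that the abstract does not state: peak normalization, a pre-emphasis coefficient of 0.97, a Hamming window, 256-sample frames with a 128-sample hop, and a simple short-time-energy threshold for endpoint detection. All function names are hypothetical, and the thesis may use different choices throughout.

```python
import math


def normalize(signal):
    """Scale the samples so that the peak absolute amplitude is 1."""
    peak = max((abs(s) for s in signal), default=0.0)
    return [s / peak for s in signal] if peak > 0 else list(signal)


def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n - 1].

    alpha = 0.97 is a common default, assumed here, not taken from the thesis.
    """
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_and_window(signal, frame_len=256, hop=128):
    """Split the signal into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[signal[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def detect_endpoints(frames, ratio=0.1):
    """Return (first, last) indices of the frames whose short-time energy
    exceeds `ratio` times the largest frame energy, or None if all frames
    are silent. An energy-only detector is an assumption of this sketch.
    """
    energies = [sum(x * x for x in frame) for frame in frames]
    peak = max(energies, default=0.0)
    if peak <= 0.0:
        return None
    active = [i for i, e in enumerate(energies) if e > ratio * peak]
    return active[0], active[-1]
```

A typical call chain would be `detect_endpoints(frame_and_window(pre_emphasize(normalize(samples))))`, after which only the frames between the two returned indices need to be passed on to feature extraction.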
2. Extraction of Mel-frequency cepstral feature parameters from the preprocessed abnormal sound signals. During feature extraction, the frequency-perception characteristics of the human ear are used to construct a bank of triangular filters that act much like the cochlea, mapping the spectral energy of each frame from the linear frequency domain to the Mel frequency domain. The nonlinear spectrum output by the triangular filters is then log-transformed, and finally a discrete cosine transform maps it into the cepstral domain, which completes the extraction of the MFCC feature parameters.
3. In computing the MFCC, the Fourier transform has limited time-frequency resolution and introduces harmonic interference during computation. To address these defects, the wavelet transform is used instead, so that the extracted feature parameters match human auditory characteristics more closely and are more robust to noise. In addition, empirical mode decomposition is introduced into the feature extraction process to mine more dynamic features, yielding a compound improved MFCC feature extraction method.
4. Classification and recognition of abnormal sounds. Based on the features extracted from the abnormal sounds, a support vector machine is used to build and test the abnormal sound models. In the training and testing stages, multi-class recognition is achieved by combining multiple binary SVMs. According to how the parameters affect the generalization ability of the abnormal sound model, a suitable kernel function type is selected to obtain the best trained model and to determine the class of each test sample.
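The baseline MFCC computation summarized in the abstract (power spectrum, triangular Mel filter bank, logarithm, discrete cosine transform) might look roughly like the sketch below for a single windowed frame. It covers only the conventional Fourier-based variant, not the wavelet or empirical-mode-decomposition improvements. Assumptions that do not come from the source: the common 2595·log10(1 + f/700) Mel formula, 20 filters, 12 cepstral coefficients with c0 omitted, and a naive O(N²) DFT chosen to avoid external dependencies. All function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    """Common Mel-scale approximation (assumed; the thesis may use another)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (illustration only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [i * top / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(num_filters)]
    for i in range(1, num_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):       # rising edge of the triangle
            bank[i - 1][k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge, peak at centre
            bank[i - 1][k] = (right - k) / (right - centre)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_ceps=12):
    """Return cepstral coefficients c1..c_num_ceps of one windowed frame."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Log of each filter's output energy; the floor avoids log(0).
    log_energies = [math.log(max(sum(w * p for w, p in zip(filt, spectrum)),
                                 1e-10))
                    for filt in bank]
    # DCT-II over the log filter-bank energies; c0 is skipped by choice here.
    return [sum(log_energies[m] * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m in range(num_filters))
            for c in range(1, num_ceps + 1)]
```

Calling `mfcc(frame, 8000)` on a 256-sample frame returns a 12-element feature vector. In practice an FFT routine would replace the naive DFT, and the per-frame vectors would be gathered into the feature set passed to the SVM classifier.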
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TN912.34
Article ID: 2125880
Article link: https://www.wllwen.com/kejilunwen/wltx/2125880.html