Time-Frequency Speech Presence Probability and Noise Power Spectrum Estimation in Noisy Acoustic Environments

Published: 2018-04-02 11:01

  Topic: speech presence probability  Focus: noise power spectrum estimation  Source: Beijing Institute of Technology, doctoral dissertation, 2016


[Abstract]: Speech presence probability (SPP) and the noise power spectrum are the basic quantities on which speech enhancement relies, and they have a decisive influence on the result of noise suppression. Estimating the SPP and estimating the noise power spectrum are two equivalent problems: the solution to one can be derived from the solution to the other. This dissertation focuses on deriving optimal solutions to both from statistical models.

Conventional statistical modeling methods are heuristic. Parameter updates rely on a large number of empirical rules, and some important parameters are simply set by experience. As a result, the model parameters adapt poorly to the data and an optimal solution is hard to guarantee. Conventional methods are also semi-supervised. They usually assume that the input begins with non-speech; this initial segment is treated as labeled data for supervised modeling, and the model is then updated with a decision-directed unsupervised method, so the procedure as a whole is semi-supervised. In practice, however, the input often begins with speech, so semi-supervised modeling does not meet practical needs.

To address these problems, this dissertation proposes an optimal estimation method based on unsupervised clustering. The parameters of the clustering model are solved under the maximum likelihood criterion, so that the resulting SPP and noise power spectrum estimates are optimal under that criterion. Specifically, a binary (two-component) Gaussian mixture model (GMM) and a binary (two-state) hidden Markov model (HMM) serve as the clustering models, with the speech and non-speech clusters treated as the two components of the model. Clustering is then equivalent to estimating the model parameters: the noise power spectrum is given by a cluster mean, and the SPP is derived from the statistical characteristics of the clusters. Because clustering is unsupervised, it needs no assumption that the input begins with non-speech, and it is therefore closer to practical use than conventional modeling.

The specific contributions and results of the dissertation are as follows:

1. An unsupervised offline modeling method based on a binary GMM is proposed. The log-power spectral envelope in each subband is modeled, and the classical EM algorithm is used to obtain the optimal estimate.
2. An offline modeling method based on a binary HMM is proposed. The advantage of the HMM over the GMM is that it accounts for the temporal correlation of the spectral envelope: the power spectral envelope in a subband is treated as a state sequence that moves dynamically between speech and non-speech states, and EM adapts the temporal correlation to the observed data.
3. Building on the classical EM algorithm, a near-optimal online estimator of the GMM parameters is implemented. The GMM parameter set is updated frame by frame, and detection and estimation results are output frame by frame.
4. An online likelihood function for the HMM is proposed. From this likelihood function, a first-order recursion for the HMM parameter set is derived using Newton's method, giving an optimal frame-by-frame parameter update.
5. Based on the statistical characteristics of the power spectral envelope, a method of constraining the binary GMM/HMM is proposed, so that the model remains stable when speech is absent for long periods.
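The offline binary GMM idea in contribution 1 can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `estimate_noise_and_spp`, the initialization, the variance floor, the choice of the lower-mean component as the non-speech cluster, and the use of the speech component's posterior as the SPP are all assumptions made for illustration. The input is taken to be the log-power envelope of a single subband, one value per frame.

```python
import math
import random
from typing import List, Sequence, Tuple


def _gauss(x: float, mean: float, var: float) -> float:
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def _speech_posterior(x: float, w: Sequence[float], mu: Sequence[float],
                      var: Sequence[float]) -> float:
    """Posterior probability that x was drawn from component 1."""
    p0 = w[0] * _gauss(x, mu[0], var[0])
    p1 = w[1] * _gauss(x, mu[1], var[1])
    total = p0 + p1
    return p1 / total if total > 0.0 else 0.5


def estimate_noise_and_spp(log_power: Sequence[float], n_iter: int = 50,
                           var_floor: float = 1e-3) -> Tuple[float, List[float]]:
    """Fit a two-component 1-D GMM by EM to one subband's log-power envelope.

    Returns (mean log-power of the lower cluster, per-frame posterior of the
    upper cluster). Treating these as the noise estimate and the SPP is an
    assumption of this sketch. The input must be non-empty.
    """
    n = len(log_power)
    lo, hi = min(log_power), max(log_power)
    grand_mean = sum(log_power) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in log_power) / n, var_floor)

    # Assumed initialization: equal weights, means at the quarter points.
    w = [0.5, 0.5]
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    var = [grand_var, grand_var]

    for _ in range(n_iter):
        # E-step: responsibilities of the two components for every frame.
        r1 = [_speech_posterior(x, w, mu, var) for x in log_power]
        resp = ([1.0 - r for r in r1], r1)
        # M-step: maximum-likelihood updates of weight, mean and variance.
        for k in (0, 1):
            n_k = max(sum(resp[k]), 1e-12)
            w[k] = n_k / n
            mu[k] = sum(r * x for r, x in zip(resp[k], log_power)) / n_k
            var[k] = max(
                sum(r * (x - mu[k]) ** 2 for r, x in zip(resp[k], log_power)) / n_k,
                var_floor,
            )

    # Keep the lower-mean component at index 0 (assumed to be non-speech).
    if mu[0] > mu[1]:
        w.reverse()
        mu.reverse()
        var.reverse()

    spp = [_speech_posterior(x, w, mu, var) for x in log_power]
    return mu[0], spp


if __name__ == "__main__":
    # Synthetic subband: 300 noise-only frames followed by 200 speech frames.
    random.seed(0)
    frames = [random.gauss(-2.0, 0.3) for _ in range(300)]
    frames += [random.gauss(3.0, 0.5) for _ in range(200)]
    noise_log_power, spp = estimate_noise_and_spp(frames)
    print(f"noise log-power estimate: {noise_log_power:.2f}")
    print(f"mean SPP over the speech frames: {sum(spp[300:]) / 200:.2f}")
```

The sketch needs no labeled non-speech segment at the start of the input, which is the property the abstract emphasizes. It does not cover the HMM variants, the online recursions, or the constraints for long speech absence described in contributions 2 to 5.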
[Degree-granting institution]: Beijing Institute of Technology
[Degree level]: Doctoral
[Year degree awarded]: 2016
[Classification number]: TN912.3


Article ID: 1700075


Article link: https://www.wllwen.com/shoufeilunwen/xxkjbs/1700075.html

