Mixed Speech Separation Based on Computational Auditory Scene Analysis

Published: 2019-01-27 10:49

[Abstract]: Speech separation based on computational auditory scene analysis (CASA) is widely applied in artificial intelligence, machine perception, and automatic speech separation, and has become an active research topic; separating speech signals in noisy environments is the hardest case. Building on CASA theory, this thesis studies the separation of mixed speech signals in noisy environments, focusing on the shortcomings of an existing mixed speech separation system that uses the interaural time difference (ITD) and interaural intensity difference (IID) as separation cues. First, the thesis proposes a separation algorithm that combines pitch periodicity with the ITD and IID features, and designs a double masking model. The improved algorithm uses two kinds of separation cues, analyzes the mixed speech signal from two different perspectives, and applies two masking stages to separate clean target speech. Second, to address the original system's incomplete masking of interfering sounds, the thesis adds a separation method that uses pitch periodicity as the cue and designs a suitable initial masking model that removes noise and clutter from the mixed speech; combined with the subsequent secondary masking model, this yields more comprehensive masking and more thorough removal of interference. Third, to address the original system's inability to accurately separate a speech stream with a relatively large time delay, the thesis redefines and improves the secondary masking model in the ITD- and IID-based stage, so that the separation target is more clearly defined and any one of the speech streams can be separated accurately. Finally, extensive experiments evaluate the improved system and compare it with the original speech separation system, demonstrating its effectiveness and advantages. The improved system is effective both for separating speech from noise and for separating overlapping speech, and its separation quality is noticeably better.
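The abstract does not give the definitions of the two masks, so the following is only a minimal sketch of the general idea, under stated assumptions: an initial mask keeps units that show a clear pitch period, and a secondary mask keeps units whose ITD and IID match the target. The function names (`periodicity`, `itd_iid`, `double_mask`), the thresholds, and the use of whole frames instead of filterbank channels are all hypothetical and are not taken from the thesis.

```python
import math
import random


def periodicity(frame, min_lag, max_lag):
    """Return (lag, strength) of the largest energy-normalized autocorrelation value."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return min_lag, 0.0
    best_lag, best = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        acf = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if acf / energy > best:
            best_lag, best = lag, acf / energy
    return best_lag, best


def itd_iid(left, right, max_lag):
    """Return (itd_samples, iid_db); a positive ITD means the right channel lags the left."""
    n = len(left)
    best_lag, best = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlate left[i] with right[i + lag] over the valid overlap.
        c = sum(left[i] * right[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        if c > best:
            best_lag, best = lag, c
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    iid_db = 10.0 * math.log10((e_left + 1e-12) / (e_right + 1e-12))
    return best_lag, iid_db


def double_mask(units, target_itd, target_iid_db, pitch_lags=(10, 40),
                strength_thr=0.5, itd_tol=1, iid_tol_db=3.0, max_lag=8):
    """Return one 0/1 decision per (left, right) unit. All thresholds are assumed values."""
    mask = []
    for left, right in units:
        # Initial mask (assumption): keep units with a clear pitch period.
        _, strength = periodicity(left, *pitch_lags)
        first = strength > strength_thr
        # Secondary mask (assumption): keep units whose binaural cues match the target.
        itd, iid_db = itd_iid(left, right, max_lag)
        second = abs(itd - target_itd) <= itd_tol and abs(iid_db - target_iid_db) <= iid_tol_db
        mask.append(1 if (first and second) else 0)
    return mask


# Synthetic demo data (not from the thesis).
N, PERIOD, DELAY = 200, 20, 3


def voiced(i):
    return math.sin(2.0 * math.pi * i / PERIOD)


rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(N + DELAY)]

units = [
    # Voiced target: right channel is a delayed, attenuated copy of the left.
    ([voiced(i) for i in range(N)], [0.5 * voiced(i - DELAY) for i in range(N)]),
    # Voiced interferer from the other side: left channel is delayed and weaker.
    ([0.5 * voiced(i - DELAY) for i in range(N)], [voiced(i) for i in range(N)]),
    # Unvoiced noise arriving from the target direction.
    (noise[DELAY:], [0.5 * x for x in noise[:N]]),
]
mask = double_mask(units, target_itd=DELAY, target_iid_db=6.0)
print(mask)
```

In this toy setup the binaural cues alone cannot reject the noise unit, since it arrives from the target direction, and the pitch cue alone cannot reject the voiced interferer. Requiring both masks to agree is one plausible reading of why the abstract combines the two kinds of cues. A realistic CASA front end would first split each channel into frequency bands (for example with a gammatone filterbank) and apply the masks per time-frequency unit.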
[Degree-granting institution]: Yanshan University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TN912.3

[Similar literature]