Research on Sound Source Localization and Enhancement Methods in Human-Computer Interaction

Published: 2019-03-27 09:40

【Abstract】: Speech is the most natural mode of human-computer interaction: it requires no contact with or wearing of data devices and has no visual blind spots. In speech-based human-computer interaction systems, noise, and in particular interfering speech from other, unrelated speakers in the environment, severely degrades system performance. This thesis studies how to improve the signal-to-noise ratio (SNR) of the speech signal in such systems. Localizing the target sound source is the key step in multi-channel speech enhancement based on microphone arrays, and this thesis adopts a localization method based on time-delay estimation. For the delay-estimation problem, noise is first filtered out with a suitable threshold and the correlation is computed afterwards, yielding a threshold-decision-based method for estimating the time difference of arrival (TDOA). Simulations show that this method outperforms the generalized cross-correlation method and provides more accurate delay parameters for the subsequent spatial localization of the target source. To better model the spatial scene in which a real source is located, a receiving model of parallel uniform linear arrays made up of six microphones is proposed, building on uniform linear microphone arrays and a dual-array approach to three-dimensional localization. Combined with the threshold-decision-based TDOA estimation method, it achieves three-dimensional localization of the target source. Once the target source has been localized, the target speech is enhanced by beamforming, and an improved scheme for setting the per-channel weights in fixed beamforming is proposed to enhance the target speech more effectively. The proposed algorithms were simulated in detail in MATLAB. The results show that when the ambient SNR is above 1.5 dB, the localization accuracy for the target source exceeds 98%, and the SNR improves by about 5 dB. The algorithm also uses few microphones, is simple in principle, and is easy to implement in hardware.
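The abstract does not give the thesis's exact threshold rule or channel weights, so the following is only a minimal sketch of the general idea, under stated assumptions: a simple amplitude gate stands in for the threshold decision, plain time-domain cross-correlation stands in for the correlation step, and equal weights are used in a basic delay-and-sum beamformer. All function names, the synthetic burst signal, the 16 kHz sample rate, and the threshold value are hypothetical choices made for illustration.

```python
import math
import random


def threshold_gate(signal, threshold):
    """Zero samples whose magnitude is below `threshold` (assumed gating rule)."""
    return [x if abs(x) >= threshold else 0.0 for x in signal]


def best_lag(ref, other, max_lag):
    """Lag in [-max_lag, max_lag] maximizing sum_n ref[n] * other[n + lag].

    A positive result means `other` receives the sound later than `ref`.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += x * other[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def estimate_tdoa(ref, other, threshold, sample_rate, max_lag):
    """Gate both channels, then cross-correlate; returns (lag_samples, seconds)."""
    lag = best_lag(threshold_gate(ref, threshold),
                   threshold_gate(other, threshold), max_lag)
    return lag, lag / sample_rate


def delay_and_sum(channels, lags, weights):
    """Advance each channel by its lag (in samples) and form the weighted sum."""
    length = len(channels[0])
    out = [0.0] * length
    for channel, lag, weight in zip(channels, lags, weights):
        for n in range(length):
            m = n + lag
            if 0 <= m < len(channel):
                out[n] += weight * channel[m]
    return out


# Synthetic two-microphone demo (all values are illustrative assumptions).
SAMPLE_RATE = 16000   # Hz
TRUE_DELAY = 7        # samples by which the second microphone lags the first
rng = random.Random(0)

# A short decaying burst stands in for the target speech.
source = [0.0] * 300
for k in range(60):
    source[100 + k] = math.exp(-k / 10) * math.sin(2 * math.pi * 0.2 * k)


def noisy(signal, sigma=0.02):
    """Add independent Gaussian noise to a channel."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


ref = noisy(source)
other = noisy([0.0] * TRUE_DELAY + source[:-TRUE_DELAY])

lag, tdoa = estimate_tdoa(ref, other, threshold=0.1,
                          sample_rate=SAMPLE_RATE, max_lag=20)
enhanced = delay_and_sum([ref, other], [0, lag], [0.5, 0.5])
print(lag, tdoa)
```

In this sketch the gate removes most low-level noise before the correlation is computed, and the estimated lag is then used to align the two channels before averaging. A real system would estimate one delay per microphone pair and feed those delays into the array geometry to obtain a position; that step depends on details the abstract does not provide and is not shown here.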
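【Degree-granting institution】: South China University of Technology
【Degree level】: Master's
【Year degree conferred】: 2014
【Classification number】: TN912.3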

【References】