Research on Machine-Learning-Based Speech Enhancement Algorithms for Dual-Microphone Mobile Phones

Published: 2018-06-15 19:27

Topic: neural network + mobile phone; Source: Nanjing Normal University, 2017 doctoral dissertation

[Abstract]: The mobile phone is the portable communication device with the largest market and the broadest consumer base, and improving its call quality has long attracted wide attention. Because phones are used in so many settings, the background noise they face is highly complex. A noise-reduction algorithm for the mobile phone platform must therefore handle many kinds of noise flexibly, suppress background noise effectively while preserving speech quality, and remain robust in real environments: its performance should not degrade when users hold the phone differently or rotate it during a call. In recent years artificial intelligence has spread to nearly every field. Machine learning, its core, emphasizes improving an algorithm's performance through continual learning from data, which lets machine learning algorithms such as neural networks adapt to complex and changing external environments. Applying machine learning to mobile phone noise reduction should substantially improve performance in real scenarios, yet little related research exists. This thesis applies neural network models to mobile phone noise reduction and improves each part of the noise-reduction algorithm, increasing its flexibility and robustness in real usage scenarios. The main work and contributions are as follows.

(1) Existing dual-channel voice activity detection (VAD) algorithms rely on fixed thresholds and struggle to distinguish speech from noise accurately across different noise environments. Chapter 2 proposes a new neural-network-based dual-channel VAD algorithm. It uses the sub-band energy difference and the normalized cross-channel correlation as two new classes of features and classifies speech and noise with a neural network. Because it does not depend on a fixed threshold, it adapts to complex and changing noise environments and is more accurate than existing VAD algorithms based on the inter-channel energy difference and its variants.

(2) Chapter 3 exploits the fact that the ratio of the noisy-speech powers received by the phone's two microphones differs between noise segments and speech segments, and proposes a new VAD algorithm based on the inter-channel power ratio. The neural network VAD of Chapter 2 is then combined with this power-ratio VAD to obtain a speech and noise activity detection algorithm suited to mobile phone noise reduction, one that detects speech and noise separately and accurately. The detection results control a time-domain speech enhancement algorithm that denoises the noisy speech signal. While removing noise, it markedly reduces damage to the speech signal and improves intelligibility, and it also suppresses directional speech interference well.

(3) To further remove the linearly uncorrelated noise that remains after the time-domain enhancement of Chapter 3, Chapter 4 transforms the enhanced speech signal and the background noise signal output by the time-domain stage into the frequency domain for further noise reduction, and improves the two key components of the algorithm: noise estimation and noise removal. First, single-microphone and dual-microphone noise estimation algorithms are combined to improve the accuracy of the noise estimate. Then pitch detection is combined with noise reduction: the fundamental frequency is estimated in speech frames to identify speech and noise frequency bins, and the Wiener filter parameters are adjusted separately for each, so that noise is removed while speech bins are preserved as far as possible, reducing speech distortion. Experimental results show that, compared with existing dual-microphone noise-reduction algorithms, the improved frequency-domain algorithm more effectively reduces damage to the speech signal and improves call quality.

(4) Differences in how the user holds the phone, or rotation of the phone during a call, affect noise-reduction performance. If the phone's position could be determined in real time and the algorithm's parameters adjusted accordingly, performance would improve. Most existing localization algorithms require arrays of three or more microphones and cannot be used directly on dual-microphone phones. Chapter 5, exploiting this specific application scenario, proposes a new method that locates the phone in three-dimensional space using only two microphones. The method takes two inputs: the inter-channel time delay and a new feature, the sub-band inter-channel power ratio, derived by analyzing the propagation paths of the target speech to the two microphones. A neural network is trained on these inputs to output the phone's spatial position.

(5) When the phone is detected to have deviated from the standard call position, the parameters of the time-domain and frequency-domain algorithms of Chapters 3 and 4 are adjusted promptly according to the neural network localization results of Chapter 5, avoiding the degradation in call performance caused by phone movement. Experimental results show that existing dual-microphone noise-reduction algorithms, which ignore phone rotation, cannot guarantee performance in real scenarios, whereas the algorithm proposed in this thesis is more stable and more practical.

The thesis concludes by summarizing the main work and novel research results and outlining directions for further research.
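The dual-channel features named in items (1) and (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the frame length, the number of sub-bands, the lag range, the use of decibel ratios for the "energy difference", and all function names (`tone`, `power_ratio_db`, `subband_energy_difference_db`, `normalized_cross_correlation`, `vad_features`) are assumptions made for illustration. The neural network classifier that would consume these features is omitted.

```python
import cmath
import math
from typing import List, Sequence

EPS = 1e-12  # assumed floor to avoid log(0) and division by zero


def tone(n: int, cycles: int, amplitude: float) -> List[float]:
    """Hypothetical test signal: `cycles` full sine periods over `n` samples."""
    return [amplitude * math.sin(2.0 * math.pi * cycles * t / n) for t in range(n)]


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)


def power_ratio_db(primary: Sequence[float], secondary: Sequence[float]) -> float:
    """Inter-channel power ratio (primary over secondary microphone) in dB."""
    return 10.0 * math.log10((frame_power(primary) + EPS) / (frame_power(secondary) + EPS))


def band_energies(frame: Sequence[float], n_bands: int) -> List[float]:
    """Energy in `n_bands` equal-width bands, using a naive DFT of the lower half spectrum."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def subband_energy_difference_db(primary: Sequence[float], secondary: Sequence[float],
                                 n_bands: int = 4) -> List[float]:
    """Per-band energy difference between the two channels, expressed in dB (assumed form)."""
    ep = band_energies(primary, n_bands)
    es = band_energies(secondary, n_bands)
    return [10.0 * math.log10((p + EPS) / (s + EPS)) for p, s in zip(ep, es)]


def normalized_cross_correlation(primary: Sequence[float], secondary: Sequence[float],
                                 max_lag: int = 2) -> float:
    """Peak absolute cross-correlation over small lags, normalized by the frame energies."""
    n = len(primary)
    norm = math.sqrt(sum(v * v for v in primary) * sum(v * v for v in secondary)) + EPS
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = sum(primary[i] * secondary[i + lag] for i in range(n) if 0 <= i + lag < n)
        best = max(best, abs(acc) / norm)
    return best


def vad_features(primary: Sequence[float], secondary: Sequence[float],
                 n_bands: int = 4) -> List[float]:
    """Feature vector for a frame: sub-band differences followed by the correlation."""
    return subband_energy_difference_db(primary, secondary, n_bands) + [
        normalized_cross_correlation(primary, secondary)
    ]
```

For a near-field talker the primary microphone typically receives noticeably more power than the secondary one, while diffuse background noise reaches both at similar levels, so the power ratio and the sub-band differences tend to be large in speech frames and near 0 dB in noise frames. A fixed threshold on these values is what the abstract describes as fragile; feeding the feature vector to a trained classifier is the alternative it proposes.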
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: TN912.3;TP181

[References]