Psychoacoustics Research and Its Application to Speech Enhancement

Published: 2018-03-04 10:03

Topic: fluctuation strength　Focus: psychoacoustic masking　Source: Nanjing Normal University, 2017 master's thesis　Document type: degree thesis

[Abstract]: Psychoacoustics studies how the human ear processes sound signals. It relates the physical properties of a sound to the subjective sensations it evokes in hearing, yielding psychoacoustic models that are then applied in scientific research and engineering practice. This thesis studies psychoacoustic masking effects, frequency-band division, and parametric models, and proposes a new parametric model for calculating psychoacoustic fluctuation strength. Building on this, it applies psychoacoustic masking and band division to single-channel speech enhancement and presents two improved algorithms: an improved single-channel speech enhancement algorithm based on psychoacoustic masking, and an improved multi-band spectral subtraction algorithm based on a partition along the psychoacoustic equivalent rectangular bandwidth (ERB) scale.

Fluctuation strength, like roughness, is a basic psychoacoustic auditory sensation. An important difference is that fluctuation strength reflects slower amplitude fluctuations of the sound signal. The thesis proposes a new ERB-based model for calculating fluctuation strength: the signal is divided into 75 filter channels on the ERB scale, and the generalized modulation depth (GMD) of each channel is weighted, filtered, and summed to give the final fluctuation strength value. The new model introduces a new way of converting the GMD in each ERB channel into specific fluctuation strength, and it replaces the Bark scale with the ERB scale. Comparing the ERB-based and Bark-based versions of the new model shows that the ERB-based version reduces the RMSE by up to 73% and raises the correlation coefficient by up to 17%. The thesis also explains why weighting is needed in the final calculation of specific fluctuation strength. Simulation results show that, compared with the existing Zwicker fluctuation strength model [1], the proposed model reduces the RMSE by more than 90% and raises the correlation coefficient by as much as 23%, so its results agree more closely with subjective test results.

On the basis of this theoretical work, the thesis presents two improved single-channel speech enhancement algorithms based on auditory masking and band division. It first implements the single-channel speech enhancement algorithm based on psychoacoustic masking proposed by Virag [2], then improves the frequency-domain masking model of the Virag algorithm and introduces a time-domain masking model to obtain the improved algorithm. In the improved algorithm, the noise masking threshold is computed from the frequency-domain and time-domain masking models and is then used to adjust the spectral subtraction parameters, so as to reach the best perceptual trade-off among noise reduction, speech distortion, and residual musical noise. The thesis also presents an improved multi-band spectral subtraction algorithm based on the ERB scale: the spectrum of the noisy signal is first divided into several sub-bands along the ERB scale, and single-channel speech enhancement is then applied separately in each sub-band. Simulation results show that the improved algorithms outperform the single-channel speech enhancement algorithms used for comparison in both objective evaluation metrics and subjective listening tests.
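The ERB-scale multi-band spectral subtraction idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes the Glasberg-Moore ERB-number formula E(f) = 21.4 log10(1 + 0.00437 f), bands equally spaced on that scale, and a Berouti-style SNR-dependent over-subtraction factor. The function names, the band count, the frequency range, and the spectral floor are all hypothetical choices made for the example; the abstract does not specify them.

```python
import numpy as np

def hz_to_erb_number(f_hz):
    """Glasberg-Moore ERB-number for a frequency in Hz (assumed scale)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_number_to_hz(erb):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) / 0.00437

def erb_band_edges(n_bands, f_lo=50.0, f_hi=8000.0):
    """Edges (Hz) of n_bands sub-bands equally spaced on the ERB-number scale.
    f_lo and f_hi are illustrative defaults, not values from the thesis."""
    e = np.linspace(hz_to_erb_number(f_lo), hz_to_erb_number(f_hi), n_bands + 1)
    return erb_number_to_hz(e)

def oversubtraction(snr_db):
    """Illustrative Berouti-style over-subtraction factor for one band."""
    if snr_db < -5.0:
        return 4.75
    if snr_db > 20.0:
        return 1.0
    return 4.0 - 0.15 * snr_db

def multiband_spectral_subtract(noisy_power, noise_power, freqs, edges, floor=0.002):
    """Subtract the noise power spectrum band by band.
    Returns the enhanced power spectrum; bins outside all bands are unchanged."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    out = noisy_power.copy()
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = (freqs >= lo) & (freqs < hi)
        if not idx.any():
            continue  # narrow low-frequency bands may contain no FFT bin
        band_noisy = max(noisy_power[idx].sum(), 1e-12)
        band_noise = max(noise_power[idx].sum(), 1e-12)
        alpha = oversubtraction(10.0 * np.log10(band_noisy / band_noise))
        cleaned = noisy_power[idx] - alpha * noise_power[idx]
        # Spectral floor keeps the result positive and limits musical noise.
        out[idx] = np.maximum(cleaned, floor * noisy_power[idx])
    return out

# Usage on synthetic data: flat noise plus a narrow "speech" peak near 1 kHz.
fs, n_fft = 16000, 512
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
edges = erb_band_edges(8)
noise_power = np.ones_like(freqs)
speech_power = 50.0 * np.exp(-0.5 * ((freqs - 1000.0) / 100.0) ** 2)
noisy_power = speech_power + noise_power
enhanced = multiband_spectral_subtract(noisy_power, noise_power, freqs, edges)
```

Computing the over-subtraction factor per band rather than once for the whole spectrum lets bands with low SNR be attenuated more aggressively than bands dominated by speech. In the masking-based algorithm of the abstract, the subtraction parameters are instead driven by the noise masking threshold, which this sketch does not model.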
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN912.3

[Similar literature]