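Analysis of the Nonlinear Characteristics of Speech and Its Applications
Published: 2018-07-01 13:31
Topic: speech + nonlinear analysis and processing; Source: Nanjing University, doctoral dissertation, 2014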
[Abstract]: Speech, the acoustic manifestation of language, has long been an object of study. Aerodynamic research shows that speech production is a nonlinear process. Studies of the nonlinear dynamics of speech signals and of their nonlinear processing have given us a basic understanding of "macroscopic" nonlinear features such as the fractal dimension and Lyapunov exponents. Speech, however, is a short-time non-stationary signal. Analyses that assume stationarity and a sufficiently large amount of data cannot describe the nonlinear features of speech accurately or in detail, particularly the microstructure in the time domain or in its subspaces. Nonlinear analysis and nonlinear signal processing of speech are therefore turning toward fine-structure characteristics. Accordingly, this dissertation studies the nonlinear microstructure of speech in the time domain and in decomposition subspaces. This is needed both to understand speech and, given today's highly developed electronics, signal processing, and computer science, to apply speech signal processing more effectively.

The acoustics of the speech signal is the foundation of the study. First, based on the production mechanisms of phonemes, the dissertation discusses three different nonlinear modes: the glottal oscillation mode of voiced sounds, the turbulent sound-source mode of unvoiced sounds, and the interaction mode. It then reviews the known nonlinear characteristics of speech signals. For speech analysis models, it introduces the linear prediction (LP) model, the nonlinear regression model, and the nonlinear oscillator model; derives first- and second-order local approximation models from the dynamical equations of the nonlinear oscillator; and examines how these models relate to the LP and nonlinear regression models. This gives the Local Linear Prediction (LLP) model and the second-order Volterra model, both derived from the nonlinear regression model, an interpretation in terms of speech acoustics.

A waveform that changes with amplitude is characteristic of a nonlinear signal. Speech phonemes contain onset and offset segments whose amplitude varies with time. Recurrence plot analysis is a graphical method suited to short-time non-stationary signals. It is used here to analyze transient segments, such as the onsets and offsets of vowel and nasal signals. This helps improve nonlinear analysis methods that rely on distances between phase points. To analyze the recurrence properties of speech onsets and offsets in finer detail, we propose a recursive method for computing multi-level threshold recurrence plots, whose computational complexity is lower than that of the original recurrence plot algorithm.

By analyzing the course of state evolution, we propose a Partially Adaptive Multi-step Local Linear Prediction (paLLP) algorithm and analyze its accuracy and computational complexity. Comparison with two existing nonlinear recursive prediction algorithms shows that paLLP achieves satisfactory prediction accuracy, and the complexity analysis shows that it requires far less computation than the LLP algorithm. In the experiments, Lorenz chaotic sequences are used to verify the feasibility, accuracy, computational complexity, and noise robustness of the algorithm. Comparative experiments on vowel and nasal signals show that paLLP is an efficient, high-accuracy algorithm for the nonlinear prediction of speech. Compared with LP, paLLP is not only more accurate but also leaves much less periodicity in the prediction residual, which should benefit codebook performance in paLLP-based code-excited coding and decoding. Inspired by LD-CELP, we design an Analysis-by-Synthesis (A-B-S) speech codec based on paLLP and describe its principle of operation.

Empirical Mode Decomposition (EMD), a method for analyzing nonlinear, non-stationary signals, has also been applied to speech signal processing. EMD allows speech to be analyzed in the subspaces of its Intrinsic Mode Functions (IMFs), but many applications choose a subset of IMFs for further processing by intuition alone. To select and use IMFs on a sound basis, this dissertation analyzes their nonlinear characteristics. Because the sifting process of the original EMD algorithm is unstable, the Windowed Average-EMD (WA-EMD) method is used to decompose the speech signal. Given a set of desired frequencies specified in advance, WA-EMD stably decomposes the speech signal into a specified number of IMFs. Estimating the Hurst exponent of each IMF's power spectrum identifies the IMFs that carry most of the important information in the original speech. Higher-order singular spectrum analysis is used to examine the embedding dimension of each IMF. The results show that, apart from a few high-frequency IMFs, the IMFs have lower embedding dimensions than the original speech signal. Finally, the third-order spectrum and normalized third-order spectrum of all IMFs of each vowel are estimated in order to analyze IMF nonlinearity. The experiments show that the IMFs carrying the most information from the original speech are essentially linear, which will simplify speech processing tasks such as estimating the instantaneous pitch frequency.

These results deepen our understanding of the nonlinear characteristics of speech signals and improve the performance of nonlinear speech signal processing.
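As an illustration of the local linear prediction (LLP) idea the abstract refers to, the sketch below shows a basic one-step predictor: embed the signal in delay coordinates, find the nearest past states to the current state, and fit a local affine map to their successors. It is not the dissertation's paLLP algorithm, whose details the abstract does not give. The function names `delay_embed` and `llp_predict` and the parameters `dim`, `tau`, and `k` are assumptions chosen for this example.

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Return delay vectors [x[t], x[t+tau], ..., x[t+(dim-1)*tau]] as rows."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def llp_predict(x, dim=3, tau=1, k=10):
    """Predict the sample following x[-1] with a basic local linear fit.

    Hypothetical helper for illustration only (plain LLP, not paLLP).
    """
    x = np.asarray(x, dtype=float)
    vecs = delay_embed(x, dim, tau)
    query = vecs[-1]                      # current state in delay coordinates
    hist = vecs[:-1]                      # past states with a known successor
    targets = x[(dim - 1) * tau + 1 :]    # successor sample of each past state
    # Pick the k past states closest to the current state.
    dist = np.linalg.norm(hist - query, axis=1)
    idx = np.argsort(dist)[:k]
    # Fit successor ~ a0 + a . state over the neighbours (least squares).
    A = np.column_stack([np.ones(len(idx)), hist[idx]])
    coef, *_ = np.linalg.lstsq(A, targets[idx], rcond=None)
    return coef[0] + query @ coef[1:]

if __name__ == "__main__":
    t = np.arange(501)
    s = np.sin(0.1 * t)                   # toy signal, not speech data
    print(llp_predict(s[:500]), s[500])   # prediction vs. true next sample
```

On a real speech frame, the embedding dimension, delay, and neighbour count would need to be chosen for the signal at hand; the values above are placeholders for a toy sine wave.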
[Degree-granting institution]: Nanjing University
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: TN912.3
Article ID: 2087945
Article link: https://www.wllwen.com/kejilunwen/wltx/2087945.html