Research on Universal Coding Methods for Speech and Audio Signals

Published: 2018-05-22 13:56

Topics: speech coding + audio coding; Source: doctoral dissertation, Beijing University of Technology, 2014

[Abstract]: With the rapid development of network communication, mobile communication, and multimedia technology, the convergence of different networks, systems, and service platforms has become inevitable. As a result, the boundary between communication and entertainment is disappearing: users are no longer satisfied with voice-only communication and increasingly want services that handle both speech and audio. However, because of the limitations of their underlying models, traditional speech codecs and audio codecs cannot achieve good coding quality on speech, audio, and mixed signals at the same time, which has held back the further development of mobile multimedia technology.
Against this background, the Moving Picture Experts Group (MPEG) launched an initiative to build a unified speech and audio coder, which uses a single coding model to encode speech, audio, and mixed signals and thereby overcomes the limitation of traditional speech and audio coders, each of which suits only one type of signal. The initiative quickly became a focus of speech and audio coding research, and many research institutions are now working on universal coding algorithms.
To address this problem, this thesis studies existing speech and audio coding techniques in depth. Starting from the harmonic structure shared by speech and audio signals, it proposes two universal coding frameworks, which ultimately achieve universal coding of wideband speech and audio signals at bit rates of 24 kbps and 32 kbps.
The main contributions of this thesis are as follows:
1. Based on the idea of separating a signal into its characteristic components, this thesis builds a universal coding framework on the harmonic structure shared by speech and audio signals. Rather than relying on the signal-type classification and mode switching used by existing universal coding techniques, the framework analyzes every input signal with a single unified model and achieves universal coding by keeping the probability density distribution of the signal consistent before and after quantization. This overcomes two shortcomings of existing universal coders: excessive dependence on signal-type classification, and an inappropriate choice of quantization scheme for mixed signals.
2. This thesis introduces Empirical Mode Decomposition (EMD) into speech and audio coding. Exploiting the adaptive filtering property of EMD, it proposes a feature-based harmonic separation algorithm that works from the perceptual importance and periodicity of the intrinsic mode functions (IMFs) of the input signal. Extracting the harmonic components of the input in this way improves the accuracy of sinusoidal model parameter estimation.
3. A universal sinusoidal-parameter coding algorithm based on harmonic separation is proposed. It uses hybrid coding, encoding the different characteristic components of the input separately so as to combine the respective strengths of parametric coding and transform coding and thereby optimize the system as a whole. The harmonic components are modeled as sinusoidal parameters with a perceptual-gradient-weighted matching pursuit algorithm and encoded with multi-resolution quantization. For the non-harmonic components, a dithered lattice vector quantization method based on the RE8 lattice is proposed; it makes the quantization noise behave as white Gaussian noise independent of the original signal, which improves the subjective quality of the synthesized signal.
4. To improve the speech quality of the proposed sinusoidal-parameter universal coder, this thesis combines pitch-synchronous analysis with power-spectrum-preserving quantization and proposes a pitch-synchronous speech quantization method. Using the fundamental frequency (pitch) of the input, the algorithm warps the signal into a regularized signal with a fixed period and then applies a sparsifying transform to it. By concentrating energy, this makes the modulation-transform coefficients of voiced speech sparse and improves the coder's compression efficiency for speech.
5. Building on the original pitch-synchronous analysis algorithm, an adaptive analysis-window-length decision method based on energy-weighted normalized cross-correlation is proposed, which enables unified analysis of speech, audio, and mixed signals. Combined with probability-distribution-preserving quantization, it forms a distribution-preserving universal speech and audio coding algorithm. Built on transform-domain coding, this algorithm achieves universal coding of speech and audio by keeping the probability distribution of the signal consistent before and after coding. Final tests show that, for both wideband speech and wideband audio, the proposed algorithm achieves higher coding quality than the AMR-WB and ITU-T G.722.1 coding standards.
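Item 1 above achieves universal coding by keeping the probability distribution of the signal unchanged by quantization. The scalar toy example below illustrates why that matters; it is a sketch under stated assumptions, not the thesis's quantizer. It assumes a zero-mean, unit-variance Gaussian source whose pdf is known to the decoder and a uniform scalar quantizer with a deliberately coarse step `STEP`. A conventional decoder outputs the conditional mean (centroid) of each cell, which minimizes mean squared error but shrinks the output variance; the distribution-preserving decoder instead draws its output from the source distribution restricted to the received cell, so the decoded samples follow the input distribution exactly. All names and the step value are hypothetical.

```python
import math
import random
from statistics import NormalDist, pvariance

SRC = NormalDist(0.0, 1.0)  # assumed source model, known to encoder and decoder
STEP = 2.0                  # coarse step, i.e. a low bit rate (hypothetical value)


def encode(x: float) -> int:
    """Uniform scalar quantizer: index k of the cell [k*STEP, (k+1)*STEP) holding x."""
    return math.floor(x / STEP)


def decode_centroid(k: int) -> float:
    """MMSE decoder: conditional mean of the standard Gaussian within cell k."""
    a, b = k * STEP, (k + 1) * STEP
    return (SRC.pdf(a) - SRC.pdf(b)) / (SRC.cdf(b) - SRC.cdf(a))


def decode_distribution_preserving(k: int, rng: random.Random) -> float:
    """Sample the source pdf restricted to cell k (inverse-CDF sampling), so the
    decoded values have the same distribution as the encoder input."""
    lo, hi = SRC.cdf(k * STEP), SRC.cdf((k + 1) * STEP)
    return SRC.inv_cdf(rng.uniform(lo, hi))


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    idx = [encode(x) for x in xs]
    centroid = [decode_centroid(k) for k in idx]
    preserved = [decode_distribution_preserving(k, rng) for k in idx]
    print(f"input variance:                   {pvariance(xs):.3f}")
    print(f"centroid decoder variance:        {pvariance(centroid):.3f}")
    print(f"distribution-preserving variance: {pvariance(preserved):.3f}")
```

The trade-off is a higher mean squared error: since the decoded value is drawn independently of the input within the same cell, its expected squared error is exactly twice that of the centroid decoder.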
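The EMD step of item 2 above rests on sifting: repeatedly subtracting the mean of the upper and lower envelopes until what remains oscillates around zero, which yields an intrinsic mode function (IMF), highest frequencies first. The numpy sketch below shows only that decomposition, with simplifications that are assumptions of this sketch: piecewise-linear envelopes instead of the usual cubic splines, envelopes held constant beyond the first and last extrema, and a fixed number of sifting iterations instead of a stopping criterion. The perceptual-importance and periodicity tests the thesis uses to decide which IMFs count as harmonic are not reproduced; `sift_imf` and `emd` are hypothetical names.

```python
import numpy as np


def _envelope(h: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Piecewise-linear envelope through h[idx]; np.interp holds it constant
    beyond the first and last extremum (a crude boundary treatment)."""
    return np.interp(np.arange(len(h)), idx, h[idx])


def sift_imf(x: np.ndarray, n_iter: int = 10) -> np.ndarray:
    """Extract one IMF by repeatedly subtracting the mean of the two envelopes."""
    h = x.copy()
    for _ in range(n_iter):
        d = np.diff(h)
        maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
        minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes
        h = h - 0.5 * (_envelope(h, maxima) + _envelope(h, minima))
    return h


def emd(x: np.ndarray, max_imfs: int = 4) -> tuple[list[np.ndarray], np.ndarray]:
    """Split x into IMFs (highest frequency first) plus a residue; sum(imfs) + residue == x."""
    imfs, residue = [], x.astype(float)
    for _ in range(max_imfs):
        imf = sift_imf(residue)
        imfs.append(imf)
        residue = residue - imf
        if np.count_nonzero(np.diff(np.sign(np.diff(residue)))) < 2:
            break  # residue is (nearly) monotonic: stop
    return imfs, residue


if __name__ == "__main__":
    fs = 8000
    t = np.arange(2048) / fs
    high = np.sin(2 * np.pi * 440 * t)      # "harmonic" component
    low = 0.8 * np.sin(2 * np.pi * 40 * t)  # slowly varying component
    imfs, residue = emd(high + low)
    r = np.corrcoef(imfs[0][200:-200], high[200:-200])[0, 1]
    print(f"{len(imfs)} IMFs; correlation of IMF 1 with the 440 Hz tone: {r:.3f}")
```

The correlation is measured away from the frame edges because the constant envelope extrapolation distorts the first and last few periods.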
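For the non-harmonic components, item 3 above uses dithered lattice vector quantization on the RE8 lattice. The numpy sketch below covers only the quantizer core and is not the thesis's implementation: the perceptual-gradient-weighted matching pursuit for the harmonic part, gain scaling, and the indexing of lattice points into bits are all omitted. It uses the standard description RE8 = 2D8 ∪ (2D8 + 1), where D8 contains the integer vectors with an even coordinate sum (RE8 is the E8 lattice scaled by 2), together with Conway and Sloane's fast nearest-point rule for D8. With subtractive dither the decoder removes the same pseudo-random dither the encoder added (assumed here to be shared through a common seed), and the reconstruction error is then uniform over the lattice's Voronoi cell and independent of the input, which is the property item 3 relies on. Function names are hypothetical.

```python
import numpy as np


def nearest_d8(x: np.ndarray) -> np.ndarray:
    """Nearest point of D8 = {z in Z^8 : sum(z) even} (Conway-Sloane rule)."""
    f = np.round(x)
    if int(f.sum()) % 2 != 0:
        # Parity is wrong: re-round the worst-rounded coordinate the other way.
        k = int(np.argmax(np.abs(x - f)))
        f[k] += 1.0 if x[k] > f[k] else -1.0
    return f


def nearest_re8(x: np.ndarray) -> np.ndarray:
    """Nearest point of RE8 = 2*D8 ∪ (2*D8 + 1): check both cosets, keep the closer."""
    c0 = 2.0 * nearest_d8(x / 2.0)
    c1 = 2.0 * nearest_d8((x - 1.0) / 2.0) + 1.0
    return c0 if np.sum((x - c0) ** 2) <= np.sum((x - c1) ** 2) else c1


def make_dither(rng: np.random.Generator) -> np.ndarray:
    """Dither uniform over the Voronoi cell of RE8. Since 4*Z^8 is a sublattice of
    RE8, reducing a point uniform on [0, 4)^8 modulo RE8 is exactly uniform."""
    u = rng.uniform(0.0, 4.0, size=8)
    return u - nearest_re8(u)


def dithered_quantize(x: np.ndarray, dither: np.ndarray) -> np.ndarray:
    """Encoder: the lattice point to transmit (index coding omitted)."""
    return nearest_re8(x + dither)


def dithered_reconstruct(point: np.ndarray, dither: np.ndarray) -> np.ndarray:
    """Decoder: subtract the same dither the encoder added."""
    return point - dither


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 3.0, size=8)  # one 8-dimensional input vector
    d = make_dither(rng)
    p = dithered_quantize(x, d)
    print("lattice point:", p)
    print("reconstruction error:", dithered_reconstruct(p, d) - x)
```

Checking both cosets costs two cheap D8 roundings, which is why RE8-based quantizers avoid an exhaustive codebook search.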
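Item 4 above warps the signal into one with a fixed period before applying a sparsifying transform. Assuming the pitch marks (sample indices at which each pitch period starts) are already available from a pitch tracker, the numpy sketch below resamples every period to a fixed length by linear interpolation and then applies an orthonormal DCT-II across periods as a stand-in for the thesis's modulation transform, which the summary does not specify; the power-spectrum-preserving quantizer is not shown. The 64-sample period length, the toy signal, and all function names are hypothetical.

```python
import numpy as np


def warp_periods(x: np.ndarray, marks: np.ndarray, period_len: int = 64) -> np.ndarray:
    """Resample every pitch period x[marks[i]:marks[i+1]] to period_len samples.
    marks[-1] must be a valid index so the last period can be closed."""
    dst = np.arange(period_len) / period_len         # fixed phase grid on [0, 1)
    rows = []
    for start, end in zip(marks[:-1], marks[1:]):
        seg = x[start:end + 1]                       # one period plus next onset
        src = np.linspace(0.0, 1.0, num=len(seg))    # phase of each sample
        rows.append(np.interp(dst, src, seg))
    return np.array(rows)                            # shape (periods, period_len)


def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (row k is the k-th basis vector)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c


def modulation_transform(rows: np.ndarray) -> np.ndarray:
    """DCT across periods (axis 0): identical periods put all energy in row 0."""
    return dct2_matrix(rows.shape[0]) @ rows


def energy_in_first_row(coeffs: np.ndarray) -> float:
    return float(np.sum(coeffs[0] ** 2) / np.sum(coeffs ** 2))


def synth_voiced(lengths: list[int]) -> tuple[np.ndarray, np.ndarray]:
    """Toy voiced signal: one fixed waveform shape stretched to each period length."""
    def shape(phase: np.ndarray) -> np.ndarray:
        return (np.sin(2 * np.pi * phase) + 0.5 * np.sin(4 * np.pi * phase + 0.3)
                + 0.25 * np.sin(6 * np.pi * phase))
    parts = [shape(np.arange(n) / n) for n in lengths]
    parts.append(shape(np.zeros(1)))  # onset of the next period closes the last one
    marks = np.concatenate(([0], np.cumsum(lengths)))
    return np.concatenate(parts), marks


if __name__ == "__main__":
    x, marks = synth_voiced([70, 78, 86, 94, 90, 80, 74, 88])
    warped = warp_periods(x, marks)
    print("first-coefficient energy, warped:",
          round(energy_in_first_row(modulation_transform(warped)), 4))
    fixed = x[: 8 * 82].reshape(8, 82)  # naive fixed-length framing, no warping
    print("first-coefficient energy, unwarped:",
          round(energy_in_first_row(modulation_transform(fixed)), 4))
```

The unwarped figure is printed only for comparison: without warping, the varying period lengths make consecutive frames drift out of alignment, so the energy is spread over more modulation coefficients.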
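Item 5 above chooses the analysis window length from an energy-weighted normalized cross-correlation. The summary gives neither the weighting nor the thresholds, so the sketch below is only one plausible reading under explicitly hypothetical choices: 16 kHz wideband input, 10 ms (160-sample) sub-blocks, lags covering 50 to 500 Hz, weights equal to sub-block energies, a 0.8 threshold, and 1024- or 256-sample windows. A long window is selected for strongly periodic frames, such as voiced speech or sustained tones, and a short one otherwise.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two equal-length segments."""
    den = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / den) if den > 0 else 0.0


def energy_weighted_ncc(x: np.ndarray, block: int = 160,
                        min_lag: int = 32, max_lag: int = 320) -> float:
    """Energy-weighted mean, over sub-blocks, of each sub-block's best NCC with an
    earlier segment lagged by min_lag..max_lag samples (50-500 Hz at 16 kHz)."""
    scores, energies = [], []
    for s in range(max_lag, len(x) - block + 1, block):
        cur = x[s:s + block]
        best = max(ncc(cur, x[s - lag:s - lag + block])
                   for lag in range(min_lag, max_lag + 1))
        scores.append(best)
        energies.append(float(np.dot(cur, cur)))
    total = sum(energies)
    return float(np.dot(energies, scores) / total) if total > 0 else 0.0


def choose_window_length(x: np.ndarray, threshold: float = 0.8,
                         long_len: int = 1024, short_len: int = 256) -> int:
    """Long window for strongly periodic frames, short window otherwise."""
    return long_len if energy_weighted_ncc(x) >= threshold else short_len


if __name__ == "__main__":
    fs = 16000
    t = np.arange(2048) / fs
    voiced = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))
    noise = np.random.default_rng(0).standard_normal(2048)
    for name, frame in (("voiced", voiced), ("noise", noise)):
        print(name, round(energy_weighted_ncc(frame), 3), choose_window_length(frame))
```

Weighting by energy keeps low-level sub-blocks, whose correlation estimates are the least reliable, from dominating the decision.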
[Degree-granting institution]: Beijing University of Technology
[Degree level]: Doctoral
[Year of award]: 2014
[Classification number]: TN912.3
