Research on Speech Compressed Sensing Based on an MFCC Dictionary and the SL0 Algorithm

Published: 2018-05-08 04:15

Topic: speech signal + compressed sensing; Reference: master's thesis, Nanjing University of Posts and Telecommunications, 2017

[Abstract]: In the compressed sensing framework, a signal can be sampled at a rate below that required by the traditional Nyquist sampling theorem. Compression and sampling are performed simultaneously, and a high-quality reconstruction is obtained from fewer measurements. Speech signals are sparse in the frequency domain, the discrete cosine transform (DCT) domain, and other domains, which satisfies the prior condition of compressed sensing, so speech signals can be processed within this framework. Applying compressed sensing to speech simplifies signal sampling, storage, and transmission, and exploring new speech processing methods on this basis has both theoretical significance and practical value. The goal of this thesis is to design and optimize the sparse decomposition basis and the reconstruction algorithm for speech compressed sensing, so that speech signals are reconstructed with higher quality and in less time, laying a theoretical foundation for practical applications of speech compressed sensing. The thesis focuses on sparse representation and reconstruction algorithms: it proposes an overcomplete dictionary based on speech MFCC parameters and a speech compression and reconstruction model based on the smoothed L0 (SL0) algorithm. The main contents and contributions are as follows.
(1) The differences and connections between compressed sensing and traditional Nyquist sampling are introduced, and the theoretical framework of compressed sensing is analyzed. Applications of compressed sensing in speech processing are described in detail, including the sparse bases, measurement matrices, and reconstruction algorithms commonly used for speech. Experiments verify the sparsity of speech signals under the DCT basis, the overcomplete DCT dictionary, and the K-SVD dictionary, and compare reconstruction results for different sparse bases, measurement matrices, and reconstruction algorithms. The results show that the choice of sparse basis, measurement matrix, and reconstruction algorithm, as well as the speech frame length and the compression ratio, all affect the reconstruction of speech signals.
(2) A method for constructing an overcomplete dictionary from speech MFCC parameters is proposed. The extraction of MFCC parameters from speech and the implementation of speech compressed sensing based on the overcomplete MFCC dictionary are described. Experiments show that speech signals are sparse over the overcomplete MFCC dictionary. For the same number of training utterances and the same dictionary size, the overcomplete MFCC dictionary takes far less time to train than the traditional K-SVD dictionary, which makes dictionary training easier to carry out. The advantage is more pronounced when the training corpus is large. Applying the overcomplete MFCC dictionary to speech compressed sensing is therefore feasible and worthwhile.
(3) A speech compression and reconstruction model based on the SL0 algorithm is proposed. The SL0 algorithm approximates the L0 norm with a smooth function. It does not need to know the sparsity of the signal in advance, and it offers low computational cost and high reconstruction quality. In addition, a new smoothing function is proposed, and the advantages of the SL0 algorithm for speech reconstruction are verified with both the Gaussian function and the new smoothing function. The experimental results show that, with either smoothing function, the SL0 algorithm reconstructs speech signals better than commonly used traditional algorithms such as OMP and BP. Moreover, when the compression ratio is above 0.4, the SL0 reconstruction model based on the new smoothing function achieves higher speech reconstruction quality than the model using the standard Gaussian function.
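The SL0 idea summarized in the abstract (approximate the L0 norm with a smooth function and shrink its width step by step, without knowing the sparsity in advance) can be illustrated with a minimal sketch. It is not the thesis's implementation. It uses only the standard Gaussian smoothing function, because the abstract does not specify the new smoothing function. The signal is synthetic and exactly sparse in an orthonormal DCT basis rather than real speech, and the names `dct_matrix`, `sl0`, `demo` and all parameter values (`sigma_min`, `sigma_decay`, `mu`, `inner_iters`, the sizes `n`, `m`, `k`) are assumptions chosen for illustration.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n); C @ x gives the DCT coefficients of x."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    t = np.arange(n)[None, :]   # time index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)     # rescale the DC row so that C @ C.T = I
    return c


def sl0(A, y, sigma_min=1e-4, sigma_decay=0.5, mu=2.0, inner_iters=3):
    """Smoothed-L0 reconstruction with a Gaussian smoothing function.

    Looks for a sparse s with A @ s = y. All default parameter values are
    illustrative assumptions, not values taken from the thesis.
    """
    A_pinv = np.linalg.pinv(A)
    s = A_pinv @ y                      # minimum-L2-norm starting point
    sigma = 2.0 * np.max(np.abs(s))     # start with a wide, very smooth surrogate
    while sigma > sigma_min:
        for _ in range(inner_iters):
            # Gradient-type step that pulls small entries toward zero.
            delta = s * np.exp(-s**2 / (2.0 * sigma**2))
            s = s - mu * delta
            # Project back onto the constraint set {s : A @ s = y}.
            s = s - A_pinv @ (A @ s - y)
        sigma *= sigma_decay            # sharpen the approximation to the L0 norm
    return s


def demo(n=256, m=128, k=10, seed=0):
    """Recover a synthetic DCT-sparse frame from m < n random measurements."""
    rng = np.random.default_rng(seed)
    psi = dct_matrix(n).T               # synthesis basis: x = psi @ theta
    theta = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    signs = rng.choice([-1.0, 1.0], size=k)
    theta[support] = signs * rng.uniform(1.0, 2.0, size=k)
    x = psi @ theta                     # time-domain frame
    phi = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian measurement matrix
    y = phi @ x                         # compressed measurements (ratio m / n)
    theta_hat = sl0(phi @ psi, y)
    x_hat = psi @ theta_hat
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)


if __name__ == "__main__":
    print("relative reconstruction error:", demo())
```

Note that `sl0` never takes the sparsity `k` as an input, in contrast to greedy methods such as OMP that typically use it as a stopping criterion. Real speech frames are only approximately sparse, so the reconstruction error on speech would be larger than in this noiseless synthetic case.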
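[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TN912.3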
