Research on Key Technologies of Compressed Sensing for Speech Signals

Published: 2018-05-28 23:54

Topic: speech signal + compressed sensing; Source: doctoral dissertation, Nanjing University of Posts and Telecommunications, 2014

[Abstract]: Signal sparsity is the prerequisite for applying compressed sensing (CS). CS samples a signal in compressed form with as few observations as possible, which reduces the signal's dimensionality, saves sampling and transmission costs, and has brought a revolution in signal sampling technology. Because speech signals are approximately sparse, CS theory can be combined with speech signal processing, breaking away from the classical paradigm of speech processing built on Nyquist sampling. Replacing conventional Nyquist speech samples with CS observation sequences fundamentally changes the characteristics of the signal and therefore affects every application area of speech signal processing. Building on an in-depth study of CS theory, this dissertation investigates sparse domains for speech CS and a speech endpoint detection algorithm based on the observation sequence, proposes an observation matrix suited to speech, studies models of the observation sequence obtained under projection by that matrix, and, for speech CS, proposes a reconstruction algorithm that combines codebook mapping with l1 reconstruction. The main work and contributions of the dissertation are as follows:
(1) CS reconstruction of speech observation sequences in different sparse domains is studied, and the sparsity of speech under the DCT, DFT, DWT and K-L transform is compared. Although speech coefficients are sparsest under the K-L transform, reconstruction then requires the autocorrelation matrix of the original signal, which makes it hard to use in practice; of the other three domains, the DCT gives the best sparsity. The principles and performance of basis pursuit (BP) and orthogonal matching pursuit (OMP) reconstruction under random Gaussian matrix projection are studied. Experiments show that, for speech and the same number of observations, BP reconstructs better than OMP but at a much higher computational cost. CS of speech observations is also studied with an overcomplete cosine dictionary and a K-SVD dictionary: because both increase coefficient sparsity, each reconstructs better than the DCT basis, and the K-SVD dictionary outperforms the overcomplete cosine dictionary. Since the spectral magnitudes of the CS observation sequences of speech and non-speech frames are widely spread and differ markedly between the two, a speech endpoint detection algorithm based on the cepstral distance of the CS observation sequence is proposed; it decides whether the original input is speech directly from the properties of the observation sequence. In endpoint detection simulations on noisy speech at various signal-to-noise ratios (SNRs), its performance matches that of conventional cepstral endpoint detection under Nyquist sampling while requiring less computation.
(2) With the DCT sparse basis, when speech is projected by a random Gaussian observation matrix, CS reconstruction locates zero (near-zero) coefficients poorly, which causes large errors in the coefficient values that dominate reconstruction quality. To address this, a row-echelon observation matrix suited to compressive sampling of speech is proposed, and the compressed observation sequence is reconstructed with a dual affine scaling interior-point algorithm. Simulation results show that, used as the observation matrix, the row-echelon matrix locates the zero (near-zero) coefficients of speech well and yields reconstruction performance clearly better than speech CS with a Gaussian observation matrix; compared with the random Gaussian observation matrix, it also greatly reduces the amount of data and computation. The author therefore regards the row-echelon observation matrix as a near-ideal projection matrix for compressive sampling of speech.
(3) Because the speech observation sequences obtained under row-echelon projection remain strongly correlated, they are modeled with a second-order Volterra series. The effect of the input-sequence dimension and the model order on predicting the row-echelon observation sequence of speech is analyzed, and a Wiener filter is used jointly to improve prediction accuracy. The result is CS reconstruction from a partial CS observation sequence using the Volterra model and the Wiener filter.
(4) Finally, to address the heavy computation of CS reconstruction, a codebook-mapping reconstruction method based on the relationship between the observation sequence and the original sequence is proposed. Compared with l1 reconstruction, it estimates the positions of the sparse coefficients more accurately and needs no optimization algorithm: the reconstructed coefficients are taken directly from a trained codebook, so it requires clearly less computation than BP and OMP. Because its estimates of coefficient magnitudes are not accurate enough, however, codebook mapping is combined with l1 reconstruction as a trade-off between reconstruction performance and computation. In the training phase the algorithm builds a speech codebook and an observation codebook. In the test phase it first estimates the SNR of the test speech, then selects an energy threshold according to the SNR and the CS compression ratio: frames whose observation-sequence energy exceeds the threshold are reconstructed with l1, and frames below it are reconstructed from the codebook. Experiments show that at medium and low SNR the combined algorithm outperforms l1 reconstruction for suitable energy thresholds, and at high SNR and in noise-free conditions it matches l1 reconstruction when roughly 3/10 of all frames are reconstructed from the codebook. Because the codebook part of the combined algorithm avoids computationally expensive nonlinear optimization, it saves the corresponding computation.
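As background to the abstract, the generic CS model can be written in standard textbook notation (the symbols are conventional, not taken from the dissertation): a frame x of N samples is sparse in a basis or dictionary Ψ (for example the DCT or a K-SVD dictionary), M < N observations are taken through the observation matrix Φ, and BP recovers the coefficients by l1 minimization.

```latex
\begin{aligned}
\mathbf{x} &= \boldsymbol{\Psi}\boldsymbol{\theta}, \qquad \mathbf{x}\in\mathbb{R}^{N},\ \ \|\boldsymbol{\theta}\|_0 \le K \ \ (\text{or approximately sparse}),\\
\mathbf{y} &= \boldsymbol{\Phi}\mathbf{x} = \boldsymbol{\Phi}\boldsymbol{\Psi}\boldsymbol{\theta}, \qquad \boldsymbol{\Phi}\in\mathbb{R}^{M\times N},\ \ M < N,\\
\hat{\boldsymbol{\theta}} &= \arg\min_{\boldsymbol{\theta}} \|\boldsymbol{\theta}\|_1 \ \ \text{s.t.}\ \ \boldsymbol{\Phi}\boldsymbol{\Psi}\boldsymbol{\theta} = \mathbf{y}, \qquad \hat{\mathbf{x}} = \boldsymbol{\Psi}\hat{\boldsymbol{\theta}}.
\end{aligned}
```

The ratio M/N is the compression ratio; OMP, by contrast, builds the support of θ greedily instead of solving the l1 program.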
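For item (1), a minimal pure-Python sketch of OMP reconstruction under a random Gaussian observation matrix with an orthonormal DCT-II basis. Assumptions: a synthetic, exactly 4-sparse DCT frame stands in for real speech, the sizes N = 64, M = 40, K = 4 are illustrative, and the helper names (`dct_basis`, `omp`, `solve`) are hypothetical. The least-squares step uses normal equations for brevity rather than a numerically preferable QR update. BP would replace `omp` with an l1 solver (a linear program) and is not shown.

```python
import math
import random


def dct_basis(n):
    """Orthonormal DCT-II synthesis matrix Psi (n x n), so that x = Psi @ theta."""
    psi = [[0.0] * n for _ in range(n)]
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        for i in range(n):
            psi[i][k] = scale * math.cos(math.pi * (i + 0.5) * k / n)
    return psi


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve the square system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][j] * z[j] for j in range(r + 1, n))) / m[r][r]
    return z


def omp(a, y, k):
    """Orthogonal matching pursuit: greedily pick k columns of a (M x N) to fit y."""
    cols = [list(c) for c in zip(*a)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    support, coef, residual = [], [], list(y)
    for _ in range(k):
        # Atom whose normalized correlation with the current residual is largest.
        j = max((j for j in range(len(cols)) if j not in support),
                key=lambda j: abs(sum(c * r for c, r in zip(cols[j], residual))) / norms[j])
        support.append(j)
        # Least-squares fit of y on all selected atoms (normal equations).
        sel = [cols[s] for s in support]
        gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in sel] for ci in sel]
        coef = solve(gram, [sum(u * v for u, v in zip(ci, y)) for ci in sel])
        fit = [sum(c * col[i] for c, col in zip(coef, sel)) for i in range(len(y))]
        residual = [yi - fi for yi, fi in zip(y, fit)]
    theta = [0.0] * len(cols)
    for s, c in zip(support, coef):
        theta[s] = c
    return theta


# Synthetic K-sparse frame in the DCT domain (not real speech).
random.seed(0)
N, M, K = 64, 40, 4
psi = dct_basis(N)
phi = [[random.gauss(0.0, 1.0 / math.sqrt(M)) for _ in range(N)] for _ in range(M)]
theta_true = [0.0] * N
for idx, val in zip((3, 10, 17, 40), (1.0, -0.8, 0.5, 0.3)):
    theta_true[idx] = val
x = matvec(psi, theta_true)
y = matvec(phi, x)                       # M observations instead of N samples
theta_hat = omp(matmul(phi, psi), y, K)  # sensing matrix A = Phi @ Psi
x_hat = matvec(psi, theta_hat)
print("max sample error:", max(abs(a - b) for a, b in zip(x, x_hat)))
```

With noiseless, exactly sparse data like this the support is typically recovered exactly; real speech frames are only approximately sparse, so K and M trade reconstruction error against cost.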
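For item (2), the abstract does not define the row-echelon observation matrix, so the sketch below assumes a simple deterministic 0/1 staircase purely to illustrate why a structured matrix cuts storage and arithmetic compared with a dense Gaussian matrix: row i is assumed to hold ones over a window of `WIDTH` consecutive columns starting at i·(N/M). This is a hypothetical construction, not the dissertation's definition; the function names and sizes are made up, and the dual affine scaling interior-point reconstruction is not shown.

```python
import random


def staircase_observe(frame, m, width):
    """Observe a frame with a hypothetical 0/1 staircase matrix, without building it.

    Row i is assumed to hold ones in columns [i*step, i*step + width), step = N // m,
    so each observation is just a sum of consecutive samples (slices clip at the end).
    """
    step = len(frame) // m
    return [sum(frame[i * step:i * step + width]) for i in range(m)]


def staircase_matrix(n, m, width):
    """Explicit matrix of the same hypothetical staircase, used only for checking."""
    step = n // m
    return [[1.0 if i * step <= j < i * step + width else 0.0 for j in range(n)]
            for i in range(m)]


def dense_observe(frame, phi):
    """Generic y = Phi @ x, as a random Gaussian Phi would require."""
    return [sum(p * v for p, v in zip(row, frame)) for row in phi]


random.seed(1)
N, M, WIDTH = 160, 40, 8  # e.g. a 20 ms frame at 8 kHz, compression ratio M/N = 0.25
frame = [random.uniform(-1.0, 1.0) for _ in range(N)]

y_fast = staircase_observe(frame, M, WIDTH)
y_check = dense_observe(frame, staircase_matrix(N, M, WIDTH))

# Rough per-frame cost: a dense Gaussian matrix stores M*N values and needs M*N
# multiplies, while the staircase needs no stored values and only additions.
step = N // M
gaussian_cost = {"stored_values": M * N, "multiplies": M * N}
staircase_cost = {
    "stored_values": 0,
    "multiplies": 0,
    "additions": sum(len(frame[i * step:i * step + WIDTH]) - 1 for i in range(M)),
}
print(gaussian_cost, staircase_cost)
```

Any fixed 0/1 pattern gets the same storage benefit; whether it also locates near-zero DCT coefficients well, as the dissertation reports for its matrix, depends on the actual construction.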
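For item (3), a sketch of a second-order Volterra one-step predictor fitted by least squares. Assumptions: a synthetic nonlinear autoregressive sequence stands in for the row-echelon observation sequence, `P` plays the role of the input-sequence dimension, and all function names are hypothetical. A linear-only fit is included as a baseline; it is the finite-sample counterpart of a linear Wiener predictor, but the dissertation's joint Volterra and Wiener scheme is not reproduced.

```python
import random


def volterra_features(past):
    """Second-order Volterra regressors: bias, linear terms, all products past[i]*past[j], i <= j."""
    feats = [1.0] + list(past)
    for i in range(len(past)):
        for j in range(i, len(past)):
            feats.append(past[i] * past[j])
    return feats


def linear_features(past):
    """First-order (linear) regressors only, used as a baseline."""
    return [1.0] + list(past)


def solve(a, b):
    """Solve the square system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][j] * z[j] for j in range(r + 1, n))) / m[r][r]
    return z


def fit_predictor(seq, p, featurize):
    """Least-squares one-step predictor of seq[t] from seq[t-p:t]; returns (weights, in-sample MSE)."""
    rows = [featurize(seq[t - p:t]) for t in range(p, len(seq))]
    targets = seq[p:]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    w = solve(gram, rhs)
    errors = [t - sum(wi * fi for wi, fi in zip(w, r)) for r, t in zip(rows, targets)]
    return w, sum(e * e for e in errors) / len(errors)


# Synthetic stand-in for a correlated observation sequence (not real speech data).
random.seed(2)
seq = [0.0, 0.0]
for _ in range(3000):
    y1, y2 = seq[-1], seq[-2]
    seq.append(0.4 * y1 - 0.2 * y2 + 0.05 * y1 * y2 + random.uniform(-0.5, 0.5))

P = 2  # input-sequence dimension: number of past observations used
_, mse_volterra = fit_predictor(seq, P, volterra_features)
_, mse_linear = fit_predictor(seq, P, linear_features)
targets = seq[P:]
mean = sum(targets) / len(targets)
var_targets = sum((t - mean) ** 2 for t in targets) / len(targets)
print(f"variance {var_targets:.4f}  linear MSE {mse_linear:.4f}  Volterra MSE {mse_volterra:.4f}")
```

Because the Volterra regressors contain the linear ones, its in-sample MSE cannot exceed the linear fit's; whether the extra quadratic terms pay off on unseen frames depends on the data and on `P`.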
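For item (4), a sketch of the energy-gated decision in the combined codebook-mapping and l1 algorithm. Assumptions: the codebooks are paired lists in which `obs_codebook[i]` maps to `speech_codebook[i]`, mapping is nearest-neighbour in Euclidean distance, frames whose energy equals the threshold go to the codebook, and `stub_l1_solver` is a placeholder for a real BP or OMP solver. The threshold is passed in directly; in the dissertation it is chosen from the estimated SNR and the CS compression ratio, and that lookup is not reproduced. All names and numbers are made up.

```python
def frame_energy(obs):
    """Energy of one frame of the observation sequence."""
    return sum(v * v for v in obs)


def nearest_codeword(obs, obs_codebook):
    """Index of the observation codeword closest to obs in Euclidean distance."""
    return min(range(len(obs_codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(obs, obs_codebook[i])))


def hybrid_reconstruct(frames, obs_codebook, speech_codebook, threshold, l1_solver):
    """Energy-gated reconstruction: l1 above the threshold, codebook mapping at or below it.

    Returns the reconstructed frames and how many of them came from the codebook.
    """
    recon, n_codebook = [], 0
    for obs in frames:
        if frame_energy(obs) > threshold:
            recon.append(l1_solver(obs))  # expensive optimization path
        else:
            idx = nearest_codeword(obs, obs_codebook)
            recon.append(list(speech_codebook[idx]))  # table lookup, no optimization
            n_codebook += 1
    return recon, n_codebook


# Toy example with made-up codebooks and a stub in place of a real BP/OMP solver.
l1_calls = []


def stub_l1_solver(obs):
    l1_calls.append(obs)
    return [0.0, 0.0, 0.0, 0.0]


obs_cb = [[0.1, 0.0], [0.0, 0.1]]
speech_cb = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
frames = [[0.09, 0.01], [2.0, 1.0], [0.0, 0.12]]
recon, n_cb = hybrid_reconstruct(frames, obs_cb, speech_cb, threshold=0.5,
                                 l1_solver=stub_l1_solver)
print(recon, f"codebook share = {n_cb}/{len(frames)}")
```

Raising the threshold sends more frames to the cheap codebook path; the abstract reports that at high SNR a codebook share of about 3/10 of the frames keeps quality on par with pure l1 reconstruction.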
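[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Doctorate
[Year of degree]: 2014
[Classification number]: TN912.3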
