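Research on Key Technologies of Compressed Sensing for Speech Signals
Topics: speech signal + compressed sensing; Source: Nanjing University of Posts and Telecommunications (南京邮电大学), 2014 doctoral dissertation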
[Abstract]: Signal sparsity is the prerequisite for applying compressed sensing (CS) theory. Compressed sensing samples a signal in compressed form using as few measurements (observations) as possible, which reduces the signal's dimensionality, saves sampling and transmission cost, and has brought a revolution to signal sampling technology. Because speech signals are approximately sparse, CS theory can be combined with speech signal processing, breaking with the classical model of speech processing built on Nyquist sampling. Replacing conventional Nyquist-rate speech samples with CS measurement sequences fundamentally changes the characteristics of the signal and therefore affects every application area of speech signal processing. Building on an in-depth study of CS theory, this dissertation investigates sparse domains for compressed sensing of speech and a speech endpoint detection algorithm based on the measurement sequence, proposes a measurement matrix suited to speech, studies a model of the measurement sequence obtained by projection with that matrix, and proposes a joint codebook-mapping and l1 reconstruction algorithm for speech compressed sensing. The main work and contributions are as follows:
(1) Compressed sensing reconstruction of speech measurement sequences is studied in different sparse domains, comparing the sparsity of speech signals under the DCT, DFT, DWT and K-L transform. The results show that although speech coefficients are sparsest under the K-L transform, reconstruction requires the autocorrelation matrix of the original signal, which makes practical use difficult; among the other three domains, the DCT gives the best sparsity. The principles and performance of basis pursuit (BP) and orthogonal matching pursuit (OMP) reconstruction under random Gaussian matrix projection are studied. Experiments show that, for speech signals and the same number of measurements, BP reconstructs better than OMP but at higher computational complexity. Compressed sensing of speech is also studied with an overcomplete cosine dictionary and a K-SVD dictionary: because the coefficients become sparser, both give better reconstruction than the DCT basis, and the K-SVD dictionary outperforms the overcomplete cosine dictionary. Based on the finding that the spectral magnitude distributions of the CS measurement sequences of speech frames and non-speech frames are dispersed and differ considerably, a speech endpoint detection algorithm based on the cepstral distance of the CS measurement sequence is proposed, which determines the nature of the original input speech directly from the characteristics of the measurement sequence. In endpoint detection simulations on noisy speech at different signal-to-noise ratios, its performance is comparable to cepstral endpoint detection under conventional Nyquist sampling, while requiring less computation.
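The reconstruction setting in item (1) can be illustrated with a minimal sketch: a frame that is sparse in an orthonormal DCT basis is projected with a random Gaussian matrix and recovered greedily with OMP. The frame length, number of measurements, sparsity level, seed, and all function names below are assumptions chosen for illustration; none of them come from the dissertation, and the least-squares step uses plain normal equations rather than whatever solver the author used.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def dct_basis(n):
    """n x n matrix whose columns are orthonormal inverse-DCT atoms (x = Psi s)."""
    psi = [[0.0] * n for _ in range(n)]
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        for t in range(n):
            psi[t][k] = scale * math.cos(math.pi * (t + 0.5) * k / n)
    return psi

def solve(g, b):
    """Solve the small square system g z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(g)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def omp(a, y, k):
    """Greedy estimate of a k-sparse s from y = a s; returns (s_hat, residual)."""
    rows, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(rows)] for j in range(n)]
    support, coef, residual = [], [], y[:]
    for _ in range(k):
        # Pick the not-yet-selected atom most correlated with the residual.
        scores = [-1.0 if j in support
                  else abs(dot(cols[j], residual)) / norm(cols[j])
                  for j in range(n)]
        support.append(max(range(n), key=scores.__getitem__))
        # Least-squares fit of y on the selected atoms (normal equations).
        sel = [cols[j] for j in support]
        gram = [[dot(u, v) for v in sel] for u in sel]
        coef = solve(gram, [dot(u, y) for u in sel])
        residual = [y[i] - sum(c * u[i] for c, u in zip(coef, sel))
                    for i in range(rows)]
    s_hat = [0.0] * n
    for j, c in zip(support, coef):
        s_hat[j] = c
    return s_hat, residual

random.seed(0)
n, m, k = 32, 16, 3                       # assumed toy sizes
psi = dct_basis(n)
s_true = [0.0] * n
for j, v in zip((2, 5, 9), (1.0, -0.7, 0.5)):
    s_true[j] = v                         # synthetic DCT-sparse frame
x = [dot(row, s_true) for row in psi]     # time-domain frame x = Psi s
phi = [[random.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)]
       for _ in range(m)]                 # random Gaussian measurement matrix
y = [dot(row, x) for row in phi]          # measurement sequence y = Phi x
a = [[dot(phi[i], [psi[t][j] for t in range(n)]) for j in range(n)]
     for i in range(m)]                   # sensing matrix A = Phi Psi
s_hat, residual = omp(a, y, k)
```

BP would replace the greedy loop with an l1-minimization solve over the same `a` and `y`, which is where the higher computational cost reported in item (1) arises.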
(2) With a DCT sparse basis and a random Gaussian measurement matrix, compressed sensing reconstruction of speech locates zero (near-zero) coefficients poorly, which causes large errors in the coefficient values that dominate reconstruction quality. To address this, a row echelon measurement matrix suited to compressed sampling of speech is proposed, and the compressed measurement sequence is reconstructed with a dual affine scaling interior point algorithm. Simulations show that with the row echelon matrix as the measurement matrix, the zero (near-zero) coefficients of speech are located well, giving reconstruction performance clearly better than speech compressed sensing with a Gaussian measurement matrix; in addition, compared with a random Gaussian measurement matrix, the row echelon matrix greatly reduces both data volume and computation. The author therefore considers the row echelon measurement matrix a fairly ideal projection matrix for compressed sensing sampling of speech.
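For orientation, the l1 (basis pursuit) problem that interior point methods of this kind address can be written as a linear program together with its dual; dual affine scaling methods iterate on the dual variable. The symbols below are notational assumptions rather than the dissertation's own: A is the product of the measurement matrix and the sparse basis, y the measurement sequence, s the coefficient vector, u and v its nonnegative parts, and λ the dual variable.

```latex
\begin{aligned}
&\text{(BP)}   & \min_{s}\ \|s\|_1 \quad &\text{s.t. } A s = y, \\
&\text{(LP)}   & \min_{u,\,v \ge 0}\ \mathbf{1}^{T}(u + v) \quad &\text{s.t. } A(u - v) = y, \qquad s = u - v, \\
&\text{(dual)} & \max_{\lambda}\ y^{T}\lambda \quad &\text{s.t. } \|A^{T}\lambda\|_{\infty} \le 1 .
\end{aligned}
```

The dual constraint follows from applying the standard-form LP dual to the stacked matrix [A, -A] with unit costs, which gives both A^T λ ≤ 1 and -A^T λ ≤ 1.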
(3) Because the speech measurement sequence obtained under row echelon matrix projection still exhibits strong correlation, second-order (quadratic) Volterra series modeling of the measurement sequence is proposed. The effect of the input sequence dimension and the model order on prediction of the row echelon measurement sequence is analyzed, and a Wiener filter is used jointly to improve prediction accuracy, yielding a CS reconstruction based on a partial CS measurement sequence, the Volterra model and the Wiener filter.
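A second-order Volterra predictor of the kind named in item (3) forms a constant term, linear terms, and all pairwise products of the most recent samples, then fits the weights. The sketch below fits them by ordinary least squares and checks the idea on a logistic-map sequence, which is exactly quadratic in its previous sample. The test sequence, the least-squares fit, and every name here are illustrative assumptions; the dissertation's measurement data, fitting procedure, and Wiener filter stage are not reproduced.

```python
def solve(g, b):
    """Solve the small square system g z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(g)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def volterra_features(window):
    """[1, linear terms, products window[i] * window[j] for i <= j]."""
    p = len(window)
    quad = [window[i] * window[j] for i in range(p) for j in range(i, p)]
    return [1.0] + list(window) + quad

def fit_volterra(seq, memory):
    """Least-squares weights predicting seq[n] from the previous `memory` samples."""
    rows = [volterra_features(seq[n - memory:n]) for n in range(memory, len(seq))]
    targets = seq[memory:]
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve(gram, rhs)

def predict(h, window):
    return sum(c * f for c, f in zip(h, volterra_features(window)))

# Stand-in sequence (assumption): logistic map x[n] = r x[n-1] (1 - x[n-1]).
r = 3.9
seq = [0.3]
for _ in range(199):
    seq.append(r * seq[-1] * (1.0 - seq[-1]))

h = fit_volterra(seq, memory=1)           # weights for [1, x, x^2]
err = max(abs(predict(h, seq[n - 1:n]) - seq[n]) for n in range(1, len(seq)))
```

With a longer memory the feature count grows as 1 + p + p(p+1)/2, which is one reason the input dimension and model order need to be analyzed together; the normal-equation solve used here also assumes the features are not linearly dependent.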
(4) Finally, to address the heavy computation of CS reconstruction algorithms, a codebook-mapping reconstruction method based on the relationship between the measurement sequence and the original sequence is proposed. Compared with l1 reconstruction, this method estimates the positions of the sparse coefficients fairly accurately and needs no optimization algorithm: the reconstructed coefficients are read directly from a trained codebook, so reconstruction requires markedly less computation than BP or OMP. However, because the coefficient magnitudes are not estimated accurately enough, codebook mapping is combined with l1 reconstruction to balance reconstruction performance against computation. In the training stage the algorithm builds a speech codebook and a measurement codebook. In the test stage it first estimates the SNR of the test speech and then selects an energy threshold according to the SNR and the CS compression ratio; measurement frames whose energy exceeds the threshold are reconstructed with l1, and frames below it are reconstructed from the codebook. Experiments show that at low and medium SNR the joint algorithm outperforms l1 reconstruction for suitable energy thresholds, and that at high SNR and in noise-free conditions it achieves performance comparable to l1 reconstruction when codebook-reconstructed frames make up about 3/10 of all frames. Because the codebook part of the joint algorithm needs no computationally expensive nonlinear optimization, it saves the corresponding computation.
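The test-stage switching rule in item (4) can be sketched as follows: each measurement frame is routed either to an l1 solver or to a nearest-codeword lookup, depending on its energy. The paired toy codebooks, the fixed threshold, the nearest-neighbour matching rule, and the placeholder `l1_solver` are all hypothetical; the dissertation's codebook training and its SNR-dependent threshold selection are not shown here.

```python
def frame_energy(y):
    """Energy of one measurement frame."""
    return sum(v * v for v in y)

def nearest(codebook, y):
    """Index of the codeword closest to y in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], y)))

def joint_reconstruct(frames, obs_codebook, speech_codebook, threshold, l1_solver):
    """Route each frame: high energy -> l1 solver, low energy -> codebook lookup.

    obs_codebook[i] and speech_codebook[i] are assumed to be a trained pair:
    a measurement-domain codeword and the coefficient vector it maps to.
    """
    out = []
    for y in frames:
        if frame_energy(y) > threshold:
            out.append(("l1", l1_solver(y)))
        else:
            out.append(("codebook", speech_codebook[nearest(obs_codebook, y)]))
    return out

# Toy data (assumptions): two paired codewords and two measurement frames.
obs_codebook = [[0.1, 0.0], [0.0, 0.1]]
speech_codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
frames = [[0.09, 0.01], [2.0, 1.0]]

def placeholder_l1(y):
    # Stand-in for a real l1 (BP) solver; returns zeros of the right length.
    return [0.0, 0.0, 0.0, 0.0]

result = joint_reconstruct(frames, obs_codebook, speech_codebook,
                           threshold=1.0, l1_solver=placeholder_l1)
```

Only the frames routed to `l1_solver` incur the cost of an optimization, so the fraction of codebook-reconstructed frames directly controls the computation saved.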
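[Degree-granting institution]: Nanjing University of Posts and Telecommunications (南京邮电大学)
[Degree level]: Doctoral
[Year degree awarded]: 2014
[Classification number]: TN912.3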
Article ID: 1948685
Article link: https://www.wllwen.com/kejilunwen/wltx/1948685.html