Research on Speech Signal Processing and Optimization Based on Compressed Sensing
Published: 2018-07-20 21:17
[Abstract]: Traditional Nyquist sampling theory requires a sampling frequency of at least twice the signal bandwidth. Only a small part of the acquired information is retained and most of it is discarded, which places high demands on hardware and wastes resources. With the rapid development of information technology, acquisition based on the Nyquist sampling theorem is increasingly unable to meet the demand for efficient signal processing. Compressed sensing (CS) theory addresses this problem by performing sampling and compression simultaneously. The theory states that, provided the signal is sparse, the measurement sequence is obtained as the product of the signal and the measurement (observation) matrix, and the high-dimensional original signal can be accurately recovered from a small number of measurements. Speech signals are highly compressible, so compressed sensing can be used to compress and reconstruct them. Unlike in the image field, research applying CS to speech signal processing is still scarce, both in China and abroad, and remains at an early stage. This thesis studies compressed sensing theory and applies it to speech signal processing, describes how speech compression and reconstruction are implemented, introduces methods for evaluating reconstructed speech, and proposes several optimizations. The specific work is as follows.
First, the thesis presents compressed sensing theory in terms of its three main aspects: sparse representation of the signal, construction of the measurement matrix, and reconstruction algorithms. Using male and female read speech from a professional speech corpus as experimental material, it compares two common reconstruction algorithms, BP and OMP, on speech compression and reconstruction, and studies how compression ratio and frame length affect reconstruction quality. The experimental results show that, under the same reconstruction algorithm, male speech is reconstructed with better quality than female speech, and that, for the same test speech, BP reconstructs better than OMP.
Second, the thesis analyzes and compares the performance of several measurement matrices commonly used in compressed sensing for speech compression and reconstruction, and makes recommendations on choosing a measurement matrix under different experimental conditions. The experiments show that compression ratio and frame length are the key factors in this choice: different compression ratios and frame lengths call for different measurement matrices to achieve the best reconstruction.
Third, starting from the sparse representation of the signal, the thesis introduces the tight frame algorithm from redundant dictionaries, which yields a sparser representation of the signal, and compares the resulting reconstructed speech quality with that obtained using the Gaussian random matrix commonly used in compressed sensing. The experimental results show that the tight frame matrix achieves better speech reconstruction than the conventional Gaussian random matrix.
Fourth, the thesis incorporates the absolute threshold of hearing from the psychoacoustic model to filter out components that the human ear cannot hear and that are therefore useless, reducing the number of non-zero values and increasing the sparsity of the signal in order to improve reconstructed speech quality. The experiments show that adding the absolute threshold of hearing to conventional compressed sensing of speech yields better reconstructed speech.
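The measurement and reconstruction steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a DCT basis as the sparsifying transform, a synthetic frame that is exactly sparse in that basis rather than real speech, a Gaussian random measurement matrix, and a plain OMP loop. The function names (`dct_basis`, `omp`, `cs_frame_demo`) and the frame length, measurement count, and sparsity values are hypothetical choices made for illustration only.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II matrix C (n x n): coefficients = C @ x, x = C.T @ coefficients."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def omp(a, y, sparsity):
    """Orthogonal matching pursuit: greedy estimate of s in y = a @ s
    using at most `sparsity` non-zero entries."""
    s = np.zeros(a.shape[1])
    residual = y.copy()
    support = []
    for _ in range(sparsity):
        # Select the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(a.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Least-squares fit of y on all columns selected so far.
        coef, *_ = np.linalg.lstsq(a[:, support], y, rcond=None)
        s = np.zeros(a.shape[1])
        s[support] = coef
        residual = y - a[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    return s


def cs_frame_demo(n=256, m=128, k=8, seed=0):
    """Compress one synthetic frame of length n to m measurements and reconstruct it.

    Assumption: the frame is exactly k-sparse in the DCT domain, which real
    speech frames only approximate.
    """
    rng = np.random.default_rng(seed)
    psi = dct_basis(n)
    # Synthetic k-sparse coefficient vector with magnitudes in [1, 2].
    s_true = np.zeros(n)
    idx = rng.choice(n, size=k, replace=False)
    s_true[idx] = rng.uniform(1.0, 2.0, size=k) * rng.choice([-1.0, 1.0], size=k)
    x = psi.T @ s_true                               # time-domain frame
    phi = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian measurement matrix
    y = phi @ x                                      # measurements, length m < n
    s_hat = omp(phi @ psi.T, y, k)                   # recover DCT coefficients
    x_hat = psi.T @ s_hat                            # reconstructed frame
    return x, x_hat


if __name__ == "__main__":
    x, x_hat = cs_frame_demo()
    rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
    print(f"compression ratio m/n = {128 / 256:.2f}, relative error = {rel_err:.2e}")
```

In this sketch the compression ratio is the number of measurements divided by the frame length, so varying `m` and `n` mimics the compression-ratio and frame-length experiments the abstract mentions. Replacing `omp` with an l1-minimization solver would give a BP-style reconstruction for comparison.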
[Degree-granting institution]: Xihua University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TN912.3
Article ID: 2134806
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/2134806.html