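Speech Recognition of Two-Character Chinese Words via the Fourier Transform of Spectrograms
Published: 2018-06-21 06:34
Topics: speech recognition + spectrogram; Source: Northeast Normal University, master's thesis, 2017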
[Abstract]: This thesis applies a Fourier transform to wide-band and narrow-band spectrograms, extracts band-wise projection features from the resulting frequency-domain images, and fuses the wide-band and narrow-band features into a speech recognition algorithm for two-character Chinese words. Unlike earlier algorithms that recognize the speech signal frame by frame, it recognizes each word as a whole from the overall properties of its spectrogram, which brings out the overall time-frequency characteristics of the speech signal. The method treats the spectrogram as a visual image and applies image recognition techniques to speech recognition. A spectrogram expresses speech characteristics through its texture, and image texture is more readily described in the image's frequency domain; the wide-band and narrow-band spectrograms are therefore Fourier transformed a second time, moving each spectrogram image from the spatial domain to its frequency domain, and two-character Chinese words are recognized from the result. The algorithm was studied, programmed, simulated, and implemented mainly in MATLAB R2013a. The recorded speech samples are first preprocessed, quantized, and normalized with CoolEditPro2.0. MATLAB R2013a is then used to construct the wide-band and narrow-band spectrograms by Fourier time-frequency analysis and to Fourier transform them again; the resulting frequency-domain image is divided into bands whose widths double dyadically, row and column projections are taken over those bands, and a support vector machine (SVM) performs the recognition of two-character Chinese words.
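The feature-extraction steps described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the thesis's MATLAB code: the window lengths (32 and 256 samples), the half-window hop, the Hamming window, the exact band edges, the use of summed magnitudes as the "projection", and all function names are assumptions made for illustration. The SVM classification stage is omitted.

```python
import numpy as np

def spectrogram(signal, win_len):
    """Log-magnitude STFT image of shape (freq_bins, frames).

    A short window gives a wide-band spectrogram, a long window a
    narrow-band one. The hop of half a window is an assumption.
    """
    hop = win_len // 2
    window = np.hamming(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop: i * hop + win_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20.0 * np.log10(magnitude + 1e-10)

def dyadic_edges(n):
    """Band edges 0, 1, 3, 7, ... so band widths double: 1, 2, 4, ...

    The last band is truncated at n.
    """
    edges, width = [0], 1
    while edges[-1] < n:
        edges.append(min(edges[-1] + width, n))
        width *= 2
    return edges

def band_projection_features(image):
    """2-D FFT of the image, then row and column projections summed per dyadic band."""
    magnitude = np.abs(np.fft.fft2(image))
    # For a real image the projections are symmetric, so the first
    # half of each (plus the centre bin) carries all the information.
    row_proj = magnitude.sum(axis=1)[: image.shape[0] // 2 + 1]
    col_proj = magnitude.sum(axis=0)[: image.shape[1] // 2 + 1]
    feats = []
    for proj in (row_proj, col_proj):
        edges = dyadic_edges(len(proj))
        feats.extend(proj[a:b].sum() for a, b in zip(edges[:-1], edges[1:]))
    feats = np.asarray(feats)
    return feats / feats.sum()  # normalise so overall level does not matter

def word_features(signal, wide_win=32, narrow_win=256):
    """Fuse wide-band and narrow-band features by concatenation (an assumption)."""
    wide = band_projection_features(spectrogram(signal, wide_win))
    narrow = band_projection_features(spectrogram(signal, narrow_win))
    return np.concatenate([wide, narrow])

# Synthetic one-second test signal at 8 kHz standing in for a recorded word.
fs = 8000
t = np.arange(fs) / fs
demo = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
features = word_features(demo)
```

In a full pipeline of this kind, one such vector per recorded word would be passed to a classifier; the abstract names a support vector machine for that step.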
Simulation experiments show that the algorithm reaches a recognition rate of up to 96.8% on speaker-dependent two-character Chinese word speech and up to 98.8% on speaker-independent two-character Chinese word speech, offering a new line of research for whole-word recognition of two-character Chinese words. Because the wavelet transform is a time-frequency analysis method in which both the time window and the frequency window can vary, the thesis also attempts to construct wavelet spectrograms for recognizing two-character Chinese words. Because recording a large number of samples is laborious, it further attempts speaker-independent recognition from a single template. In practice, however, various problems arose and the experimental results were not satisfactory, so further study and discussion are needed.
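To illustrate what a wavelet spectrogram with a frequency-dependent window looks like, here is a small NumPy sketch. The abstract does not say which wavelet or parameters the thesis used; the complex Morlet wavelet, the `cycles` parameter, the normalisation, and the function name `morlet_scalogram` are all assumptions for illustration only.

```python
import numpy as np

def morlet_scalogram(signal, fs, freqs, cycles=6.0):
    """Magnitude of a Morlet wavelet transform.

    Rows follow `freqs`, columns follow time. The Gaussian envelope
    gets shorter as the centre frequency rises, so the time window and
    the frequency window both change from row to row.
    """
    rows = []
    for f in freqs:
        sigma_t = cycles / (2.0 * np.pi * f)      # envelope width in seconds
        half = int(np.ceil(4.0 * sigma_t * fs))   # keep +/- 4 sigma, in samples
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2.0 * sigma_t ** 2))
        wavelet /= np.abs(wavelet).sum()          # comparable gain across rows
        # mode="same" keeps the signal length as long as the wavelet is shorter
        rows.append(np.abs(np.convolve(signal, wavelet, mode="same")))
    return np.vstack(rows)

# A 400 Hz tone should respond most strongly in the 400 Hz row.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 400 * t)
scalogram = morlet_scalogram(tone, fs, freqs=[200.0, 400.0, 800.0])
strongest_row = int(np.argmax(scalogram.mean(axis=1)))
```

The resulting array can be treated as an image in the same way as a Fourier spectrogram, which is the sense in which the abstract speaks of a wavelet spectrogram.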
[Degree-granting institution]: Northeast Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN912.34
Article ID: 2047602
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/2047602.html