
Research on Breath-Based Identity Recognition

Published: 2017-12-27 08:24

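  Keywords: Research on Breath-Based Identity Recognition  Source: University of Electronic Science and Technology of China, 2015 master's thesis  Type: degree thesis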


  More related articles: identity recognition, text-independent, breath, MFCC


[Abstract]: Speaker recognition is the task of identifying a person from their voice, and it is widely used in areas such as automated telephone services and forensic audio analysis. Previous research on speaker recognition has focused mainly on text-dependent systems, which constrain what the speaker says in order to perform recognition; the range of applications for such systems is quite limited. This thesis instead studies a more broadly applicable kind of speaker recognition system, the text-independent system. A text-independent system places no restriction on what the speaker says, which makes the problem harder. One of its greatest challenges is extracting the required features from highly variable speech. Earlier work has tried features at progressively higher levels, from spectral features to voice-source features to prosodic and lexical features, but recognition accuracy and system usability still need improvement. Moreover, as higher-level features are extracted, the speech recognition methods needed to obtain them become increasingly complex and the computational demands grow. This thesis proposes an efficient text-independent speaker recognition scheme that, for the first time, uses breath sounds to identify the speaker, so that the recognition system is entirely unaffected by the variability of the speech signal. Based on the energy and spectral characteristics of breath sounds, it proposes an efficient breath extraction scheme and a breath-based speaker recognition scheme.
Breath extraction mainly uses MFCCs (mel-frequency cepstral coefficients), the zero-crossing rate, and energy parameters, and a two-step detection scheme (preliminary detection followed by false-alarm detection) improves the accuracy of the extracted breaths. In the feature extraction stage, various speech parameters of the breath sounds are extracted. Finally, a lightweight Gaussian model and Bayesian theory are used for modeling and decision-making; to keep system complexity low, only lightweight models and methods are used throughout. The thesis first briefly introduces the relevant speech recognition techniques, then presents the detailed design of the breath-based identity recognition system, and finally reports simulations and tests run in MATLAB on self-collected speech data comprising 340 recordings from 34 speakers. The experimental results show that, using the MFCCs of the breath sounds as features, both the FAR (False Accept Rate) and the FRR (False Rejection Rate) are below 10%. Accuracy improves further with longer test utterances: for speech data longer than one minute, FAR and FRR can fall below 5%.
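
The modeling and decision stage described above, a lightweight Gaussian model with a decision evaluated by FAR and FRR, can be illustrated with a minimal Python sketch. The thesis reports MATLAB simulations and the abstract gives no code, so everything here is an assumption made for illustration: the diagonal covariance, the threshold test on the average log-likelihood (a simplified stand-in for the Bayesian decision rule), the function names, and the toy two-dimensional vectors standing in for breath MFCC features.

```python
import math


def fit_diag_gaussian(vectors, var_floor=1e-6):
    """Fit a diagonal-covariance Gaussian: per-dimension mean and variance."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # Floor the variance so a constant dimension cannot cause division by zero.
    var = [
        max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, var_floor)
        for d in range(dim)
    ]
    return mean, var


def avg_log_likelihood(vectors, model):
    """Average log-likelihood per feature vector under the Gaussian model."""
    mean, var = model
    total = 0.0
    for v in vectors:
        for x, m, s2 in zip(v, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * s2) + (x - m) ** 2 / s2)
    return total / len(vectors)


def accept(vectors, model, threshold):
    """Accept the identity claim if the score reaches the (hypothetical) threshold."""
    return avg_log_likelihood(vectors, model) >= threshold


def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: fraction of impostor trials accepted. FRR: fraction of genuine trials rejected."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (
        false_accepts / len(impostor_scores),
        false_rejects / len(genuine_scores),
    )


if __name__ == "__main__":
    # Toy enrollment data for one speaker (illustrative values, not thesis data).
    enroll = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
    model = fit_diag_gaussian(enroll)

    genuine = [avg_log_likelihood([t], model) for t in ([1.1, 1.9], [0.9, 2.1])]
    impostor = [avg_log_likelihood([t], model) for t in ([3.0, 0.0], [2.5, 0.5])]

    far, frr = far_frr(genuine, impostor, threshold=-5.0)
    print("FAR:", far, "FRR:", frr)
```

Raising `threshold` lowers FAR at the cost of a higher FRR, and lowering it does the opposite; the abstract does not state how the operating point behind its reported figures was chosen.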
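[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2015
[Classification number]: TN912.34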






Link to this article: https://www.wllwen.com/kejilunwen/wltx/1340926.html

