Text-Independent Speaker Age Recognition Based on Feature Subspace Quantization

Published: 2018-05-04 09:10

  Topics: feature subspace quantization + MFCC; Source: Soochow University (苏州大学), 2016 master's thesis


[Abstract]: Speaker age recognition identifies a speaker's age characteristics by analyzing the speech signal. As research on human-computer speech interaction deepens, its applications are becoming more widespread, and the demands on the naturalness of the interaction are rising accordingly. Speaker age recognition lets a system correctly determine the speaker's age characteristics during human-computer interaction and thus adaptively provide a suitable mode of interaction, for example an appropriate volume, speaking rate, intonation, and tone. The technique can be widely applied in automatic voice information query, health care, entertainment, and other fields. This thesis proposes a Feature Subspace Quantization (FSSQ) scheme for text-independent speaker age recognition. The main idea is to partition the acoustic feature space of speech from speakers in the same age group into subspaces using clustering, and then to quantize each subspace, so as to reduce the distribution scatter of the pattern classes and improve overall recognition accuracy. Mel-frequency cepstral coefficients (MFCC) are first extracted from the speech signals of speakers in the same age group. The K-Means algorithm then clusters the feature vectors to complete the feature subspace partition, and the LBG algorithm quantizes each subspace to form a quantization codebook, so that the speech of each age group is ultimately represented as a set of quantization codebooks. Age recognition is decided by the minimum average codebook distance. Experimental results show that the proposed feature subspace quantization method outperforms typical methods such as vector quantization (VQ) and Gaussian mixture models (GMM), with overall in-set and out-of-set recognition rates of 89.8% and 58.6%, respectively.
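The pipeline described in the abstract (K-Means subspace partition, one LBG codebook per subspace, decision by minimum average codebook distance) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the MFCC step is replaced by synthetic 12-dimensional vectors, the function names (`train_fssq`, `avg_distance`, `classify`), the number of subspaces, and the codebook size are all hypothetical, and "average codebook distance" is interpreted here as the mean squared distance from each frame to its nearest codeword across all of an age group's codebooks.

```python
import numpy as np

def nearest(x, centers):
    """For each row of x, return the index of and squared distance to the nearest center."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), d.min(axis=1)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns a cluster label for every feature vector."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels, _ = nearest(x, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    labels, _ = nearest(x, centers)
    return labels

def lbg(x, size, eps=0.01, iters=10):
    """LBG codebook design by binary splitting; size is assumed to be a power of two."""
    codebook = x.mean(axis=0, keepdims=True)
    delta = eps * x.std(axis=0)  # additive perturbation used to split each codeword
    while len(codebook) < size:
        codebook = np.vstack([codebook + delta, codebook - delta])
        for _ in range(iters):
            labels, _ = nearest(x, codebook)
            for j in range(len(codebook)):
                if np.any(labels == j):
                    codebook[j] = x[labels == j].mean(axis=0)
    return codebook

def train_fssq(features, n_subspaces=2, codebook_size=4):
    """Partition one age group's features into subspaces and quantize each one."""
    labels = kmeans(features, n_subspaces)
    return [lbg(features[labels == j], codebook_size)
            for j in range(n_subspaces) if np.any(labels == j)]

def avg_distance(features, codebooks):
    """Mean squared distance from each frame to its nearest codeword (assumed metric)."""
    _, d = nearest(features, np.vstack(codebooks))
    return d.mean()

def classify(features, models):
    """Pick the age group whose codebooks give the smallest average distance."""
    return min(models, key=lambda age: avg_distance(features, models[age]))

# Synthetic stand-ins for MFCC frames of two hypothetical age groups.
rng = np.random.default_rng(42)
young = np.vstack([rng.normal(0, 1, (200, 12)), rng.normal(3, 1, (200, 12))])
senior = np.vstack([rng.normal(10, 1, (200, 12)), rng.normal(13, 1, (200, 12))])
models = {"young": train_fssq(young), "senior": train_fssq(senior)}

test_utterance = rng.normal(0, 1, (50, 12))
print(classify(test_utterance, models))
```

In a real system the synthetic arrays would be replaced by MFCC frames extracted from recorded speech, and the number of subspaces and the codebook size would need to be tuned; the abstract does not state the values used in the thesis.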

[Degree-granting institution]: Soochow University (苏州大学)
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TN912.34
