Research on Automatic Age Estimation Based on Gender Pre-classification

Published: 2018-01-05 12:00

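Keywords: Research on Automatic Age Estimation Based on Gender Pre-classification; Source: Jiangsu Normal University, 2014 master's thesis; Type: degree thesis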

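【Abstract】: Age estimation uses a computer as an auxiliary tool: a pre-designed age estimation system automatically determines, from a speaker's voice, which age group the speaker belongs to. The technique has important application value and broad prospects in information retrieval, human-computer communication, criminal investigation, and other fields. Current research on speech-based age estimation mainly relies on systems built from a single speech feature or a single classifier to handle a multi-class age-group task. Although researchers have done a great deal of productive work on feature extraction and classification algorithms, there are still no good solutions to problems such as unstable features and the low accuracy of single-layer classification systems, and there is no standard evaluation platform, i.e. a widely accepted age-annotated speech database. To address these problems, this thesis systematically studies the whole pipeline, from building an age speech database through feature extraction to classification, with the following results.

1. Building an age speech database. Following international design standards for speech corpora, speakers were selected with their age and gender distribution in mind. The resulting database covers three age groups with a fairly even balance of male and female speakers. Each recording is also annotated with speaker information such as age, gender, and recording time. This extends the database's usefulness: the age labels support age estimation, and the gender labels allow the database to be used for gender classification.

2. Building an age estimation system that incorporates gender pre-classification. Most current age estimation systems classify with a single feature and a single classifier, and their accuracy is generally low. This thesis first performs gender pre-classification: following a hierarchical classification scheme, it first decides whether the speaker is a child, and then, within a given gender, estimates whether the speaker is a young/middle-aged adult or elderly. Features and classifiers are chosen to suit each subtask, to improve the final classification performance of the system.

3. An adult gender classification method based on an improved Citation-kNN algorithm. Citation-kNN has mostly been used in image processing; this thesis improves it and introduces it, for the first time, to adult gender classification. A GMM-based method for generating multi-instance bags from speech is proposed, and the distance measure used by Citation-kNN for pattern classification is improved, which simplifies system training. Experiments show that applying the improved Citation-kNN algorithm to speaker gender classification is feasible, and the system's average classification accuracy is slightly higher than that of conventional algorithms.

4. An age subcategory estimation system based on band-weighted MFCC. The frequency bands obtained from the discrete Fourier transform of the speech signal contribute differently to age estimation. Using band energy as the parameter, a discrimination function designed according to the F-ratio criterion computes the overall contribution of each band. When computing MFCC features, the band energy output by each Mel filter is weighted according to its contribution, strengthening or weakening the corresponding band. Gender-dependent age subcategory estimation experiments show that the improved MFCC features capture the age information in speech better than conventional MFCC.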
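The cascade described in item 2 can be pictured as a chain of pluggable classifiers. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function name `make_hierarchical_estimator`, the output labels, and the toy threshold rules on a hypothetical `(mean_f0_hz, jitter)` feature pair are all illustrative inventions. In the thesis each stage uses its own features and classifier, which the abstract does not specify.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]          # hypothetical per-utterance feature vector
Classifier = Callable[[Features], str]


def make_hierarchical_estimator(
    is_child: Callable[[Features], bool],
    adult_gender: Classifier,
    age_by_gender: Dict[str, Classifier],
) -> Classifier:
    """Chain the stages: child check, then adult gender, then gender-specific age."""

    def estimate(x: Features) -> str:
        # Stage 1: split off children before any gender decision.
        if is_child(x):
            return "child"
        # Stage 2: adult gender pre-classification.
        gender = adult_gender(x)
        # Stage 3: young/middle-aged vs. elderly, using the classifier for that gender.
        age = age_by_gender[gender](x)
        return f"{gender}/{age}"

    return estimate


# Toy stand-in classifiers on a hypothetical (mean_f0_hz, jitter) pair.
# The thresholds are illustrative only and do not come from the thesis.
estimator = make_hierarchical_estimator(
    is_child=lambda x: x[0] > 250.0,
    adult_gender=lambda x: "female" if x[0] > 165.0 else "male",
    age_by_gender={
        "male": lambda x: "elderly" if x[1] > 0.012 else "young",
        "female": lambda x: "elderly" if x[1] > 0.010 else "young",
    },
)

print(estimator((300.0, 0.005)))  # child
print(estimator((120.0, 0.015)))  # male/elderly
print(estimator((210.0, 0.004)))  # female/young
```

Keeping each stage as a separate callable mirrors the idea of choosing a different feature set and classifier per subtask: any stage can be swapped out without touching the others.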
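Citation-kNN treats each sample as a bag of instance vectors and classifies a query bag by voting over its `R` nearest training bags (references) and over the training bags that would count the query among their `C` nearest neighbours (citers). The abstract does not say how the thesis modified the distance measure or built its GMM-based bags, so this sketch assumes the standard minimal Hausdorff bag distance and takes the bags as given (for example, hypothetically, one bag of GMM component means per utterance). The vote is extended from the original positive/negative setting to arbitrary labels such as male/female, with ties broken in favour of the closest voting bag; `R`, `C`, and the toy data are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence

Instance = Sequence[float]
Bag = List[Instance]


def euclidean(a: Instance, b: Instance) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_hausdorff(a: Bag, b: Bag) -> float:
    """Minimal Hausdorff distance: distance of the closest instance pair."""
    return min(euclidean(p, q) for p in a for q in b)


def citation_knn(train_bags: List[Bag], train_labels: List[str],
                 query: Bag, R: int = 2, C: int = 3) -> str:
    n = len(train_bags)
    d_query = [min_hausdorff(query, b) for b in train_bags]
    d_train = [[min_hausdorff(train_bags[i], train_bags[j]) for j in range(n)]
               for i in range(n)]

    # References: the R training bags closest to the query.
    refs = sorted(range(n), key=lambda i: d_query[i])[:R]

    # Citers: bag i cites the query if fewer than C other training bags are
    # strictly closer to i than the query is, i.e. the query would be among
    # i's C nearest neighbours.
    citers = [i for i in range(n)
              if sum(1 for j in range(n)
                     if j != i and d_train[i][j] < d_query[i]) < C]

    # A bag that is both a reference and a citer votes twice, as in the
    # original algorithm, which counts the two groups separately.
    voters = refs + citers
    votes = Counter(train_labels[i] for i in voters)
    best = max(votes.values())
    tied = {label for label, count in votes.items() if count == best}
    # Break ties in favour of the voting bag closest to the query.
    return next(train_labels[i]
                for i in sorted(voters, key=lambda i: d_query[i])
                if train_labels[i] in tied)


male = [[(0.0, 0.0), (0.5, 0.2)], [(0.3, -0.1), (0.1, 0.4)], [(-0.2, 0.1), (0.4, 0.3)]]
female = [[(5.0, 5.0), (5.3, 4.8)], [(4.8, 5.2), (5.1, 5.1)], [(5.2, 4.9), (4.9, 5.3)]]
bags = male + female
labels = ["male"] * 3 + ["female"] * 3
print(citation_knn(bags, labels, [(0.2, 0.1), (0.1, 0.0)]))  # male
print(citation_knn(bags, labels, [(5.1, 5.0)]))              # female
```

No model is fitted here beyond storing the labelled bags, which is consistent with the abstract's remark that the approach simplifies training. For repeated queries, the training distance matrix could be computed once outside the function.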

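Item 4 weights each Mel filter's output energy by a per-band contribution derived from the F-ratio before the log and DCT steps of MFCC extraction. The exact discrimination function is not given in the abstract, so this sketch assumes: the F-ratio of a band is the variance of the per-class mean energies divided by the average within-class variance, computed directly on Mel filter energies rather than on DFT bands; the weights are a hypothetical rescaling of the F-ratios to average 1; and the cepstrum is a DCT-II over the log weighted energies. The function names are illustrative.

```python
import math
from statistics import mean, pvariance
from typing import Dict, List, Sequence

Frame = Sequence[float]  # Mel filter-bank energies of one frame


def band_f_ratios(frames_by_class: Dict[str, List[Frame]]) -> List[float]:
    """F-ratio per band: variance of class means / mean within-class variance."""
    classes = list(frames_by_class.values())
    n_bands = len(classes[0][0])
    ratios = []
    for k in range(n_bands):
        values = [[frame[k] for frame in frames] for frames in classes]
        between = pvariance([mean(v) for v in values])
        within = mean([pvariance(v) for v in values])
        ratios.append(between / within if within > 0 else 0.0)
    return ratios


def band_weights(ratios: List[float]) -> List[float]:
    # Hypothetical discrimination function: rescale so the weights average 1,
    # boosting bands with above-average F-ratio and damping the others.
    avg = mean(ratios)
    return [r / avg for r in ratios] if avg > 0 else [1.0] * len(ratios)


def weighted_mfcc(mel_energies: Frame, weights: List[float],
                  n_ceps: int = 13) -> List[float]:
    """Weight each Mel band energy, take logs, then apply a DCT-II."""
    m_count = len(mel_energies)
    # Floor avoids log(0) when a band's weight or energy is zero.
    log_e = [math.log(max(w * e, 1e-10)) for w, e in zip(weights, mel_energies)]
    return [sum(log_e[m] * math.cos(math.pi * n * (m + 0.5) / m_count)
                for m in range(m_count))
            for n in range(n_ceps)]


frames = {
    "young":   [(1.0, 10.0, 5.0), (3.0, 12.0, 7.0)],
    "elderly": [(1.0, 2.0, 5.0), (3.0, 4.0, 9.0)],
}
ratios = band_f_ratios(frames)  # [0.0, 16.0, 0.1]: band 1 separates the classes
weights = band_weights(ratios)
print(weighted_mfcc([2.0, 8.0, 6.0], weights, n_ceps=3))
```

Because log(w·E) = log w + log E, the weighting shifts each band's log energy by a fixed amount before the DCT. A band with weight 0 is effectively suppressed, which a real system would probably soften with a minimum weight.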
【Degree-granting institution】: Jiangsu Normal University
【Degree level】: Master's
【Year conferred】: 2014
【Classification number】: TN912.3
