Research on the IMFE Feature Extraction Algorithm and the Fusion KELM Recognition Algorithm for Speech Emotion Recognition

Published: 2018-01-08 07:28

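Keywords: Research on the IMFE Feature Extraction Algorithm and the Fusion KELM Recognition Algorithm for Speech Emotion Recognition　Source: Taiyuan University of Technology, 2017 master's thesis　Document type: degree thesis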

[Abstract]: Speech is a complex signal that carries both linguistic content and emotional state, and it is an effective way for people to communicate and express emotion. Speech emotion recognition is an information processing technique in which a computer extracts and analyzes the feature parameters of emotional speech to determine the emotion category; it matters for making human-computer interaction more intelligent. Against this background, this thesis reviews commonly used speech corpora, emotional features, and recognition networks; applies the Ensemble Empirical Mode Decomposition (EEMD) algorithm to speech emotion feature extraction, yielding the intrinsic mode function energy feature (IMFE) and the marginal spectrum amplitude feature (MSA); selects three emotional features (IMFE, prosodic features, and MFCC) for feature-level fusion; and proposes a decision-level fusion method based on adaptively fused Extreme Learning Machines with Kernel (KELM) for speech emotion recognition. The main work is as follows:
(1) EEMD is chosen so that emotional speech features are extracted with a method designed for nonlinear, non-stationary signals. Traditional emotional feature extraction methods all assume that speech is a short-time stationary signal. To address this limitation, the thesis extracts the MSA feature from the EEMD decomposition of the speech signal and uses KELM as the recognition network. Simulation experiments on the Berlin speech database recognize four emotions (happiness, sadness, anger, and neutral), and comparison with the recognition results of prosodic and MFCC features verifies the effectiveness of the MSA feature.
(2) A feature extraction method based on EEMD is proposed and applied to speech emotion recognition. The emotional speech signal is decomposed by EEMD into a set of intrinsic mode functions (IMFs), the effective IMF components are selected using the Spearman rank correlation coefficient, and an energy computation over them gives a new speech emotion feature, IMFE. Recognition experiments on the Berlin speech database, compared with prosodic and MFCC features, show that IMFE recognizes emotion effectively and performs best on negative emotions.
(3) Feature-level data fusion is applied to speech emotion recognition. To address the poor recognition performance of any single speech emotion feature, IMFE, prosodic, and MFCC features are fused. Different combinations of the three features are fed to the classifier, and the results on the Berlin speech database are compared with those for single features. Feature fusion improves recognition performance to some extent, which shows that the three features are complementary; however, because the feature dimensions are simply added together, the fused features give a lower recognition rate than a single feature for some emotions.
(4) A new speech emotion recognition method based on fusion KELM is proposed. To address the limited performance of a single feature with a single classifier, decision-level data fusion is applied to speech emotion recognition. First, three speech emotion features are extracted, a separate classifier is trained on each, and the numerical outputs of the classifiers are converted to a common probabilistic output. Next, adaptive weights for the test set are obtained from a decision strategy defined on the probability matrix. Finally, the output probabilities of the individual classifiers are linearly weighted to produce the final decision. On the Berlin speech database, fusion KELM achieves the best recognition rate both for individual emotions and overall, outperforming single features, feature-level fusion, and commonly used decision strategies, which makes it an effective method for speech emotion recognition.
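The decision-level fusion outlined in the abstract (one KELM per feature set, outputs converted to probabilities, then a weighted sum) could be sketched roughly as follows. This is an illustrative sketch and not the thesis implementation. The KELM solution uses the standard closed form beta = (I/C + K)^-1 T, but several pieces are assumptions made here for illustration: the RBF kernel, the values of `C` and `gamma`, the softmax conversion to probabilities, and in particular the weighting rule in `fuse`. The abstract only says the weights are derived from the probability matrix, so the peak-probability rule below is a hypothetical stand-in. The random arrays in the demo are placeholders for IMFE, prosodic, and MFCC features.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class KELM:
    """Kernel extreme learning machine with output weights (I/C + K)^-1 T."""

    def __init__(self, C=10.0, gamma=0.5):
        self.C = C          # regularization constant (assumed value)
        self.gamma = gamma  # RBF kernel width (assumed value)

    def fit(self, X, y, n_classes):
        self.X_train = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=int)
        # Targets are coded +1 for the true class and -1 elsewhere.
        T = -np.ones((len(y), n_classes))
        T[np.arange(len(y)), y] = 1.0
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        self.beta = np.linalg.solve(np.eye(len(y)) / self.C + K, T)
        return self

    def decision_function(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K @ self.beta


def softmax(scores):
    """Map raw classifier scores to per-sample probabilities (an assumption)."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def fuse(prob_list):
    """Linearly weight per-classifier probability matrices.

    Hypothetical rule: for each test sample, a classifier's weight is its
    peak probability, normalized so the weights sum to 1 across classifiers.
    """
    P = np.stack(prob_list)                     # (n_clf, n_samples, n_classes)
    conf = P.max(axis=2)                        # (n_clf, n_samples)
    W = conf / conf.sum(axis=0, keepdims=True)  # per-sample adaptive weights
    fused = (W[:, :, None] * P).sum(axis=0)     # (n_samples, n_classes)
    return fused, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 4, size=40)  # four emotion labels
    # Placeholder feature views; the dimensions 5, 8, 13 are arbitrary.
    views = [rng.normal(size=(40, d)) + y[:, None] for d in (5, 8, 13)]
    probs = [
        softmax(KELM().fit(V[:30], y[:30], 4).decision_function(V[30:]))
        for V in views
    ]
    fused, W = fuse(probs)
    print("predicted labels:", fused.argmax(axis=1))
    print("true labels:     ", y[30:])
```

Because the weights are computed per test sample, a classifier that is confident on one utterance and unsure on another contributes differently to each decision, which is the general idea behind adaptive rather than fixed fusion weights.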
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TN912.34

[Similar literature]