Effective Feature Analysis of Speech Data and Its Application to Depression Level Assessment
Published: 2018-11-19 11:19
[Abstract]: Depression is a common mental disorder. It has high incidence, relapse, and suicide rates together with low awareness and treatment rates, and it seriously harms individuals' physical and mental health. In recent years, as social pressure has grown, the incidence of depression has risen year by year; about 300 million people worldwide currently suffer from it. Diagnosis today relies heavily on the physician's clinical experience and the patient's self-report, so it is strongly affected by subjective factors. An objective, effective, and convenient assessment method is therefore needed to assist diagnosis. Because speech is non-invasive and low-cost, it has become a powerful objective indicator for detecting depression.

By research design, studies that use speech to detect depression are either longitudinal (tracking) studies or cross-sectional studies. Longitudinal studies collect speech from depressed patients at regular intervals during treatment and follow how speech features change with depression level. However, they capture only within-individual change, so their conclusions do not necessarily carry over to classifying populations. Cross-sectional studies collect speech within a relatively short time window and focus on differences in speech features between healthy and depressed people. They rarely address classification by depression level, and individual differences have produced inconsistent conclusions. Moreover, unlike the contrast between healthy and depressed people, the psychological and physiological differences between people at different depression levels are subtle, which makes differences in speech features hard to detect. To date, no study has explicitly identified features that effectively distinguish people at different depression levels.
To address this problem while accounting for individual differences, this thesis makes the following contributions:

(1) A speech dataset was constructed, and new features not discussed in related research were introduced. 132 subjects (72 women and 60 men) were recruited and divided by depression-scale score into three groups: normal, mild depression, and severe depression. Age, education, occupation, and other factors were matched across groups to reduce confounding. Speech was elicited with paradigms and emotional stimuli commonly used in this field, yielding a three-class speech dataset. The dataset contains 14 types of features, including both the classical features listed in related studies and new features not previously discussed.

(2) Effective features were screened using several statistical-analysis and dimensionality-reduction methods. This identified multiple feature sets that effectively distinguish different depression levels. Every one of them combines prosodic and spectral features, for example prosodic features such as sound intensity and spectral features such as Mel-frequency and LPC coefficients. Five feature sets were selected for the male data and four for the female data, and they gave good results on the three-class problem.

(3) These feature sets were used to build a multi-feature-set comprehensive decision classification system. Applied to the speech data, it improved depression-level assessment from speech. The system uses Gaussian mixture models (GMMs): a model is trained separately on each feature set, and their predictions are then combined by decision fusion. This yielded classification accuracies of 70% on the male data and 75% on the female data, an improvement over related studies.
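The abstract says only that GMMs are trained separately on several feature sets and that their predictions are fused. It does not give the GMM configuration or the fusion rule. The sketch below illustrates the idea under explicitly assumed choices:

- one diagonal-covariance GMM is fitted per depression class per feature set, using a small EM loop written in NumPy;
- each sample is assigned to the class with the highest log-likelihood, which assumes equal class priors;
- simple majority voting is used as the fusion rule.

`DiagonalGMM`, `fit_feature_set`, `predict_feature_set`, and `fuse_majority` are hypothetical names. Feature sets are represented as lists of column indices.

```python
from collections import Counter

import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)


class DiagonalGMM:
    """Hypothetical minimal Gaussian mixture with diagonal covariances, fitted by EM."""

    def __init__(self, n_components=2, n_iter=50, reg=1e-3, seed=0):
        self.k, self.n_iter, self.reg, self.seed = n_components, n_iter, reg, seed

    def _component_log_probs(self, X):
        # log(w_j) + log N(x | mu_j, diag(var_j)) for every (sample, component) pair: shape (n, k)
        diff = X[:, None, :] - self.means[None, :, :]
        maha = np.sum(diff ** 2 / self.vars[None, :, :], axis=2)
        log_det = np.sum(np.log(self.vars), axis=1)
        d = X.shape[1]
        return np.log(self.weights)[None, :] - 0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + maha)

    def fit(self, X):
        n = X.shape[0]
        rng = np.random.default_rng(self.seed)
        # Initialise means at random training points and variances at the overall variance
        self.means = X[rng.choice(n, self.k, replace=False)].astype(float)
        self.vars = np.tile(X.var(axis=0) + self.reg, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each sample
            log_p = self._component_log_probs(X)
            resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
            # M-step: re-estimate weights, means and regularised variances
            nk = resp.sum(axis=0) + 1e-12
            self.weights = nk / n
            self.means = resp.T @ X / nk[:, None]
            second_moment = resp.T @ (X ** 2) / nk[:, None]
            self.vars = np.maximum(second_moment - self.means ** 2, 0.0) + self.reg
        return self

    def score_samples(self, X):
        """Per-sample log-likelihood under the mixture."""
        return logsumexp(self._component_log_probs(X), axis=1)


def fit_feature_set(X, y, columns, n_components=2):
    """Train one GMM per class on the given feature columns (one feature set)."""
    Xs = X[:, columns]
    return {c: DiagonalGMM(n_components).fit(Xs[y == c]) for c in np.unique(y)}


def predict_feature_set(models, X, columns):
    """Label each sample with the class whose GMM gives the highest log-likelihood."""
    classes = list(models)
    scores = np.stack([models[c].score_samples(X[:, columns]) for c in classes], axis=1)
    return [classes[i] for i in np.argmax(scores, axis=1)]


def fuse_majority(per_set_predictions):
    """Majority vote across feature sets; ties go to the label voted by the earliest feature set."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_set_predictions)]
```

Usage follows three steps. First, call `fit_feature_set` once per feature set. Next, collect each set's labels with `predict_feature_set`. Finally, pass the list of label lists to `fuse_majority`, which returns one fused label per sample.

Majority voting is only one plausible fusion rule. Summing per-class log-likelihoods across feature sets is another, and the abstract does not say which the thesis uses.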
In summary, this thesis constructs a three-class speech dataset labelled by depression level. On this dataset, several effective feature sets were found with a variety of statistical-analysis and dimensionality-reduction methods, and they perform well on multi-class classification of speech data. A multi-feature-set comprehensive decision classification system built on these sets improves the accuracy of depression-level assessment compared with related research. These results provide a basis for using speech data to assess depression levels.
[Degree-granting institution]: Lanzhou University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TN912.3
Article ID: 2342147
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/2342147.html