Construction of a Chinese Emotion Expression Commonsense Knowledge Base and Its Application in Emotion Analysis
Published: 2018-07-22 11:01
[Abstract]: As human-computer interaction becomes widely known and applied, computers are expected to handle emotion and affect as people do. In recent years, the rise of social media has put large volumes of user-generated text online, especially microblogs, blogs, and comments that carry personal emotion. This web text has driven research on analyzing and tracking the emotions of large numbers of real individuals, work with significant research value and broad application prospects in social, political, economic, and other domains. This thesis studies the construction of basic Chinese emotion resources and their application in text emotion analysis, from three angles: the emotion taxonomy model, the construction of basic emotion-word resources, and automatic multi-label text emotion classification.
The thesis comprises four pieces of work. First, to address the scarcity of Chinese emotion lexicon resources, it starts from the English emotion lexicon WordNet-Affect and, through machine translation, noise filtering, and synonym expansion, automatically builds a Chinese emotion word list with relatively high quality and coverage, providing a reliable basic resource for text emotion analysis. Second, existing Chinese emotion lexicons generally fall short in completeness and precision; in earlier work, the entry for an emotion word usually records only a simple emotion category and an intensity value. This thesis holds that the emotion of a word falls into two types, expression and cognition, and focuses on mining the deeper information carried by the expression side. It also uses HowNet concept definitions to distinguish word senses, and on that basis proposes a new annotation scheme and builds a fine-grained Chinese emotion expression commonsense knowledge base. Third, because web text and vocabulary keep growing, a rule-based new-word discovery method extends the knowledge base automatically. Because sentences are short and carry little information, and because emotion expressed through non-emotion words is hard to recognize, the sense concepts of words are used to expand sentences automatically. Fourth, the emotion-word resources are applied in multi-label text emotion classification algorithms based on semantic rules and on machine learning. Comparative experiments show that the Chinese emotion word list and the emotion expression knowledge base built in this thesis outperform traditional emotion-word resources in classification, and that feature representations incorporating knowledge-base information effectively improve the performance of machine-learning-based classification.
The contributions of this thesis are as follows. First, it builds a high-quality Chinese emotion word list and the most fine-grained Chinese emotion expression commonsense knowledge base known to date. Second, rule-based discovery of new emotion words can enlarge the knowledge base, and expanding sentences with word concepts helps improve text emotion analysis results. Third, compared with traditional Chinese emotion lexicons and existing feature representation methods in multi-label text emotion classification, applying the new lexicon and the new fine-grained knowledge base improves classification performance, which shows their advantages and their effectiveness in text emotion computing applications.
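The abstract names three steps for building the emotion word list (machine translation, noise filtering, synonym expansion) but does not say how each is carried out. The following is only a minimal sketch of one way such a pipeline could be wired together. Everything in it is an assumption made for illustration: the toy dictionaries stand in for WordNet-Affect, a translation service, and a Chinese synonym resource, and the vote-based filter (keep a Chinese candidate only if at least two English words of a single emotion category translate to it) is a stand-in rule, not the filtering method used in the thesis.

```python
from collections import defaultdict

# Hypothetical toy data; a real run would load WordNet-Affect, a machine
# translation output, and a Chinese synonym resource instead.
EN_LEXICON = {"joy": ["happy", "glad"], "anger": ["angry", "furious"]}
EN_TO_ZH = {
    "happy": ["高兴", "快乐"],
    "glad": ["高兴", "乐意"],
    "angry": ["生气", "愤怒"],
    "furious": ["愤怒", "狂怒"],
}
ZH_SYNONYMS = {"高兴": ["开心"], "愤怒": ["恼怒"]}


def translate(en_lexicon, en_to_zh):
    """Count, per Chinese candidate, how many English words of each emotion map to it."""
    votes = defaultdict(lambda: defaultdict(int))
    for emotion, words in en_lexicon.items():
        for word in words:
            for zh in en_to_zh.get(word, []):
                votes[zh][emotion] += 1
    return votes


def filter_noise(votes, min_votes=2):
    """Assumed filter: keep candidates tied to exactly one emotion with enough votes."""
    kept = {}
    for zh, by_emotion in votes.items():
        if len(by_emotion) != 1:
            continue  # ambiguous across emotion categories, treated as noise
        (emotion, count), = by_emotion.items()
        if count >= min_votes:
            kept[zh] = emotion
    return kept


def expand(lexicon, synonyms):
    """Give each synonym of a kept word the same emotion, without overwriting entries."""
    expanded = dict(lexicon)
    for zh, emotion in lexicon.items():
        for syn in synonyms.get(zh, []):
            expanded.setdefault(syn, emotion)
    return expanded


def build_lexicon(en_lexicon, en_to_zh, synonyms, min_votes=2):
    """Run the three steps in order: translate, filter, expand."""
    return expand(filter_noise(translate(en_lexicon, en_to_zh), min_votes), synonyms)


if __name__ == "__main__":
    for word, emotion in build_lexicon(EN_LEXICON, EN_TO_ZH, ZH_SYNONYMS).items():
        print(word, emotion)
```

In this sketch the filter trades coverage for precision: raising `min_votes` drops more candidates before expansion, and the synonym step then recovers some coverage from the words that survive.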
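[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP391.1
Article ID: 2137228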
Article link: https://www.wllwen.com/jingjilunwen/zhengzhijingjixuelunwen/2137228.html