Research on Weibo Sentiment Object Recognition Based on Conditional Random Fields
Published: 2019-04-12 13:20
【Abstract】: Social networks have developed rapidly in recent years, and more and more people use Weibo to exchange and share information. Weibo is popular because its posts are short and concise, easy to write, and spread quickly. Users like to share their opinions and experiences on Weibo, so it contains a large volume of user comments that carry sentiment. As these comments multiply, manual methods can no longer keep up with processing and analyzing them. How to use computing technology to process, analyze, and mine Weibo comment data effectively has therefore become an active research problem, and sentiment object recognition is a highly effective way to address it.
This thesis studies sentiment object recognition in Chinese Weibo text. Recognizing sentiment objects in unstructured text is difficult in itself, and existing research has several shortcomings. First, Weibo differs from traditional text: posts are short, loosely written, and often do not follow standard Chinese usage, so existing general-purpose Chinese text processing tools handle them poorly, which makes sentiment object recognition harder. To address this, the thesis normalizes Weibo text and builds several dictionaries, including an internet slang dictionary, an emoticon dictionary, a sentiment dictionary, and a negation word dictionary. This not only improves word segmentation and syntactic dependency parsing of Weibo text with existing tools, but also lets feature extraction exploit context information more effectively. Second, some existing methods can already identify sentiment objects that appear explicitly in the text, but they struggle with implicit sentiment objects.
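The normalization and dictionary step can be illustrated with a minimal Python sketch. Everything concrete in it is hypothetical: the tiny dictionary contents, the placeholder tags `POS_EMO`, `NEG_EMO` and `EMO`, the function names `normalize` and `token_features`, and the ±2-token context window. It also assumes that emoticons appear as bracketed codes such as `[哈哈]` and that tokens have already been produced by a Chinese word segmenter; the abstract does not say how the dictionaries are stored or which exact features are derived from them.

```python
import re

# Hypothetical toy dictionaries standing in for the thesis's much larger ones.
SLANG = {"神马": "什么", "给力": "很好"}               # internet slang -> standard wording
EMOTICONS = {"[哈哈]": "POS_EMO", "[泪]": "NEG_EMO"}  # bracketed emoticon code -> placeholder tag
SENTIMENT = {"好": 1, "很好": 1, "喜欢": 1, "差": -1, "失望": -1}  # word -> polarity
NEGATIONS = {"不", "没", "没有"}

EMOTICON_RE = re.compile(r"\[[^\[\]]+\]")
EMOTICON_TAGS = set(EMOTICONS.values()) | {"EMO"}


def normalize(text):
    """Rewrite slang as standard words and turn emoticon codes into space-separated tags."""
    for slang, standard in SLANG.items():
        text = text.replace(slang, standard)
    # Codes missing from the emoticon dictionary fall back to a generic "EMO" tag.
    return EMOTICON_RE.sub(lambda m: " " + EMOTICONS.get(m.group(0), "EMO") + " ", text)


def token_features(tokens, i, window=2):
    """Dictionary features for tokens[i], plus context features from +-window neighbours."""
    tok = tokens[i]
    feats = {
        "word": tok,
        "is_sentiment": tok in SENTIMENT,
        "polarity": SENTIMENT.get(tok, 0),
        "is_negation": tok in NEGATIONS,
        "is_emoticon": tok in EMOTICON_TAGS,
    }
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    context = [tokens[j] for j in range(lo, hi) if j != i]
    # A sentiment word or an emoticon tag in the window counts as nearby sentiment.
    feats["sentiment_nearby"] = any(t in SENTIMENT or t in EMOTICON_TAGS for t in context)
    feats["negation_nearby"] = any(t in NEGATIONS for t in context)
    return feats


if __name__ == "__main__":
    print(normalize("这手机真给力[哈哈]"))
    # Tokens are assumed to come from a Chinese word segmenter run on normalized text.
    tokens = ["电池", "不", "好", "POS_EMO"]
    for i in range(len(tokens)):
        print(tokens[i], token_features(tokens, i))
```

Replacing emoticon codes with whitespace-delimited tags keeps them from being merged into neighbouring words by a segmenter, and the context features are one simple way for a token to "see" nearby sentiment and negation words.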
For sentiment objects that appear directly in the text, this thesis therefore combines a conditional random field (CRF) model with a classification model; for sentiment objects that do not appear in the text, it abstracts the implied object and proposes an improved CRF model with hidden nodes to recognize it. The core idea of the work is to treat sentiment object recognition as a sequence labeling problem: a CRF model labels sentiment objects in sentence-level Weibo text, drawing on multiple kinds of features to improve recognition accuracy. Experiments on two datasets, a public evaluation dataset and a self-built dataset, show that the model recognizes explicit sentiment objects in Weibo well and can also recognize hidden sentiment objects.
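The sequence labeling formulation can be sketched as Viterbi decoding in a linear-chain CRF over a BIO tag set. This is an illustration under stated assumptions, not the thesis's model: the tag names `O`, `B-OBJ` and `I-OBJ`, the feature names, and all weights are hypothetical and hand-set, whereas a trained CRF learns its weights from annotated sentences. The combination with a classification model and the hidden-node extension are not shown.

```python
# Hypothetical BIO tag set: outside, beginning of a sentiment object, inside one.
LABELS = ["O", "B-OBJ", "I-OBJ"]

# Hand-set (feature, label) weights; a real CRF learns these from labeled data.
EMISSION_W = {
    ("bias", "O"): 1.0,
    ("is_noun", "B-OBJ"): 2.0,
    ("is_noun", "I-OBJ"): 2.0,
    ("is_sentiment", "O"): 2.0,
}
# Hand-set (previous label, label) weights.
TRANSITION_W = {
    ("O", "I-OBJ"): -5.0,      # I-OBJ should only continue an object
    ("B-OBJ", "I-OBJ"): 1.0,
    ("I-OBJ", "I-OBJ"): 0.5,
    ("B-OBJ", "B-OBJ"): -1.0,  # discourage two adjacent one-token objects
}
START_W = {"I-OBJ": -5.0}      # a sentence should not start inside an object


def emission(feats, label):
    """Sum the weights of the active features of one token for a given label."""
    return sum(EMISSION_W.get((name, label), 0.0) for name, on in feats.items() if on)


def viterbi(feature_seq):
    """Return the highest-scoring tag sequence under emission + transition scores."""
    if not feature_seq:
        return []
    score = [{y: START_W.get(y, 0.0) + emission(feature_seq[0], y) for y in LABELS}]
    back = [{}]
    for t in range(1, len(feature_seq)):
        score.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda p: score[t - 1][p] + TRANSITION_W.get((p, y), 0.0))
            score[t][y] = (score[t - 1][prev] + TRANSITION_W.get((prev, y), 0.0)
                           + emission(feature_seq[t], y))
            back[t][y] = prev
    path = [max(LABELS, key=lambda y: score[-1][y])]
    for t in range(len(feature_seq) - 1, 0, -1):
        path.append(back[t][path[-1]])  # follow back-pointers to the start
    return path[::-1]


def extract_objects(tokens, tags):
    """Join consecutive B-OBJ/I-OBJ tokens into sentiment object strings."""
    spans, cur = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-OBJ":
            if cur:
                spans.append("".join(cur))
            cur = [tok]
        elif tag == "I-OBJ" and cur:
            cur.append(tok)
        else:
            if cur:
                spans.append("".join(cur))
            cur = []
    if cur:
        spans.append("".join(cur))
    return spans


if __name__ == "__main__":
    tokens = ["手机", "电池", "很", "好"]
    feats = [{"bias": True, "is_noun": True}, {"bias": True, "is_noun": True},
             {"bias": True}, {"bias": True, "is_sentiment": True}]
    tags = viterbi(feats)
    print(tags, extract_objects(tokens, tags))
```

Decoding does not need the CRF's partition function, because normalizing by it does not change which tag sequence scores highest; the partition function matters during training, when weights are fitted by maximizing conditional log-likelihood.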
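【Degree-granting institution】: Guangdong University of Technology
【Degree level】: Master's
【Year conferred】: 2014
【Classification number】: TP393.092; TP391.1
Article ID: 2457053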
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2457053.html