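Research on Key Technologies for Sentiment Element Mining in Information Extraction
Published: 2018-05-11 02:44
Topic of this paper: sentiment element extraction + ensemble learning; Source: Beijing University of Posts and Telecommunications (北京邮电大学), master's thesis, 2015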
[Abstract]: In recent years, as the Internet has grown, information has come to show enormous power. The rapid development of social networks has given rise to self-media, and large volumes of information carrying subjective sentiment have emerged. How people can find useful information in this sea of data has become a difficult problem. Over the past few decades, as Natural Language Processing (NLP) technology has matured, Information Extraction methods have made it possible to locate the key information of interest within massive data. Within information extraction, sentiment element mining is an important direction. It focuses on information related to users' sentiment, such as the source of a sentiment, its recipient, and its positive or negative polarity. Because this information is subjective, it is often of greater value; in the Internet era in particular, the advertising and recommendation systems of many large companies depend on it.
Against this background, this thesis studies recent directions in sentiment element mining, designs and implements a sentiment element mining system based on classical information extraction methods, proposes an ensemble learning hybrid model that combines conditional random fields with an expert system, and uses external semantic information to improve the performance of a machine translation model in sentiment element mining. The main work of the thesis is as follows:
1. Based on the idea of Ensemble Learning, model combination is implemented: rule-based methods are incorporated into semi-supervised learning, providing a feasible way to build human prior knowledge into a machine learning model, and the result is better than that of a single model used alone.
2. A Machine Translation model is used to partially replace syntactic parsing tools for sentiment element extraction in specific scenarios, and an improvement method that fuses external semantic information with new word discovery is proposed.
3. A sentiment element mining system based on classical information extraction methods is designed and implemented; for the preprocessing of Chinese corpora, an extensible new word discovery method for Chinese is proposed.
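The abstract does not say how the rule-based component and the statistical model are combined. As a rough illustration only, the sketch below assumes a simple weighted vote: per-token label probabilities (hard-coded here as a stand-in for the output of a sequence model such as a CRF) are blended with the label asserted by a hand-written lexicon rule. The label set, the lexicon, the `rule_weight` parameter, and all function names are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Optional

# Hypothetical label set: T = sentiment target, O = opinion word, N = neither.
LABELS = ("T", "O", "N")

# Hypothetical rule lexicon standing in for the expert-system component.
RULE_LEXICON: Dict[str, str] = {"great": "O", "terrible": "O"}


def rule_label(token: str) -> Optional[str]:
    """Return the label asserted by the rules, or None when no rule fires."""
    return RULE_LEXICON.get(token.lower())


def combine(
    tokens: List[str],
    model_probs: List[Dict[str, float]],
    rule_weight: float = 0.6,
) -> List[str]:
    """Blend model probabilities with rule votes and pick one label per token."""
    result = []
    for token, probs in zip(tokens, model_probs):
        voted = rule_label(token)
        if voted is None:
            # No rule fired: rely on the statistical model alone.
            scores = {label: probs.get(label, 0.0) for label in LABELS}
        else:
            # A rule fired: down-weight the model and add the rule's vote.
            scores = {
                label: (1.0 - rule_weight) * probs.get(label, 0.0)
                for label in LABELS
            }
            scores[voted] += rule_weight
        result.append(max(LABELS, key=lambda label: scores[label]))
    return result


# Illustrative input; the probabilities are made up, not model output.
tokens = ["The", "battery", "is", "great"]
model_probs = [
    {"T": 0.1, "O": 0.1, "N": 0.8},
    {"T": 0.7, "O": 0.1, "N": 0.2},
    {"T": 0.1, "O": 0.1, "N": 0.8},
    {"T": 0.2, "O": 0.35, "N": 0.45},
]
labels = combine(tokens, model_probs)
```

In this toy input the model alone would label "great" as N, while the blended score labels it O, which is the kind of correction that encoding prior knowledge as rules is meant to supply. A real system would need to tune the weight on held-out data.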
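[Degree-granting institution]: Beijing University of Posts and Telecommunications (北京邮电大学)
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP391.1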
Article ID: 1872088
Article link: https://www.wllwen.com/wenyilunwen/guanggaoshejilunwen/1872088.html