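Extraction of Sentiment Elements from Chinese Microblogs Based on CRF and Noun Phrase Recognition
Published: 2018-03-20 21:51
Topic: sentiment elements  Focus: conditional random fields  Source: Dalian University of Technology, 2014 master's thesis  Type: degree thesis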
[Abstract]: With the development of information technology, information is published and spread ever faster, and extracting valuable information from massive data is becoming increasingly important. Microblogs have grown rapidly in recent years as a new kind of social platform and have a very large user base. Besides posting on their own initiative, users can join discussions organized around topics. Topics fall into many categories, and the discussion under many valuable topics carries the authors' subjective views. This thesis studies how to identify the sentiment elements of such topic microblogs. Sentiment element extraction consists of extracting sentiment targets and judging sentiment orientation.
For sentiment orientation, a Chinese microblog can carry a large amount of information, and a single post may contain several sentiment targets, so a machine-learning classifier of sentiment orientation has difficulty drawing boundaries between them. This thesis instead builds lexicons to judge the orientation of each sentiment target: lexicon matches form sentiment units, and the sentiment values of those units determine the orientation of the target.
For sentiment target extraction, this thesis uses a conditional random field (CRF) model. It combines features such as word form, part of speech, whether the word is a sentiment word, and dependency information to extract sentiment targets automatically. The method performs well in the closed test but poorly in the open test. A large part of the reason is that the training corpus for the CRF is too small, while manual annotation is too costly for the corpus to be enlarged easily.
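The feature set named for the CRF (word form, part of speech, sentiment-word flag, dependency information) can be illustrated with a small sketch. This is not the thesis's implementation: the `Token` structure, the one-token context window, the `<PAD>`/`<ROOT>` placeholders, the dependency labels, and the BIO label names are all assumptions made for illustration. The output is one feature dictionary per token, a shape that common CRF toolkits can consume.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str    # surface form after word segmentation
    pos: str     # part-of-speech tag
    head: int    # index of the dependency head, -1 for the root
    deprel: str  # dependency relation to the head


def token_features(sent: list[Token], i: int, sentiment_words: set[str]) -> dict[str, str]:
    """Build the feature dict for token i of one segmented, parsed sentence."""
    tok = sent[i]
    feats = {
        "word": tok.word,
        "pos": tok.pos,
        "is_sentiment": str(tok.word in sentiment_words),
        "deprel": tok.deprel,
        "head_word": sent[tok.head].word if tok.head >= 0 else "<ROOT>",
        "head_pos": sent[tok.head].pos if tok.head >= 0 else "<ROOT>",
    }
    # Context window of one token on each side (the window size is an assumption).
    for offset, name in ((-1, "prev"), (1, "next")):
        j = i + offset
        inside = 0 <= j < len(sent)
        feats[f"{name}_word"] = sent[j].word if inside else "<PAD>"
        feats[f"{name}_pos"] = sent[j].pos if inside else "<PAD>"
    return feats


def sentence_features(sent: list[Token], sentiment_words: set[str]) -> list[dict[str, str]]:
    """Feature dicts for every token in the sentence, in order."""
    return [token_features(sent, i, sentiment_words) for i in range(len(sent))]


# Toy sentence: "手机 屏幕 很 清晰" (roughly "the phone screen is very clear").
# Tags and dependency labels are illustrative, not taken from the thesis.
sentence = [
    Token("手机", "n", 1, "ATT"),
    Token("屏幕", "n", 3, "SBV"),
    Token("很", "d", 3, "ADV"),
    Token("清晰", "a", -1, "HED"),
]
features = sentence_features(sentence, sentiment_words={"清晰"})
labels = ["B-TARGET", "I-TARGET", "O", "O"]  # hypothetical BIO labels for training
```

Pairing `features` with `labels` sentence by sentence gives the kind of training data a sequence labeler needs, which also makes clear why a small annotated corpus limits open-test performance: every label has to be assigned by hand.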
Because the CRF method performs poorly on this task, this thesis proposes a method that automatically generates a table of candidate sentiment targets based on noun phrase recognition. The method uses dependency information to filter the candidates, and the resulting table is then used to extract sentiment targets from sentences in which the CRF identified none. Experiments show that the method is fairly effective for sentiment target extraction.
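The candidate-table idea can be sketched as follows. The abstract does not specify the noun phrase recognizer, the dependency filter, or the matching rule, so everything concrete here is an assumption: noun phrases are taken to be maximal runs of noun-tagged tokens, a phrase is kept only if its last token has a dependency arc to or from a sentiment word, and the table is applied by substring matching only when the CRF returned no target. All function names are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    head: int    # index of the dependency head, -1 for the root
    deprel: str


def noun_phrases(sent):
    """Yield (start, end) spans of maximal runs of noun-tagged tokens (assumed NP rule)."""
    start = None
    for i, tok in enumerate(sent):
        if tok.pos.startswith("n"):
            if start is None:
                start = i
        elif start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(sent)


def linked_to_sentiment(sent, span, sentiment_words):
    """True if the phrase's last token has a dependency arc to or from a sentiment word."""
    last = span[1] - 1
    head = sent[last].head
    if head >= 0 and sent[head].word in sentiment_words:
        return True
    return any(t.head == last and t.word in sentiment_words for t in sent)


def build_candidate_table(corpus, sentiment_words, min_count=1):
    """Collect filtered noun phrases seen at least min_count times."""
    counts = Counter()
    for sent in corpus:
        for start, end in noun_phrases(sent):
            if linked_to_sentiment(sent, (start, end), sentiment_words):
                counts["".join(t.word for t in sent[start:end])] += 1
    return {phrase for phrase, n in counts.items() if n >= min_count}


def fallback_targets(text, crf_targets, table):
    """Keep CRF output if any; otherwise return table entries found in the text."""
    if crf_targets:
        return crf_targets
    hits = sorted((p for p in table if p in text), key=lambda p: (-len(p), p))
    kept = []
    for hit in hits:
        # Drop a hit that is contained in a longer hit already kept.
        if not any(hit in longer for longer in kept):
            kept.append(hit)
    return kept


# Toy corpus with illustrative tags and dependency labels.
corpus = [
    [Token("手机", "n", 1, "ATT"), Token("屏幕", "n", 3, "SBV"),
     Token("很", "d", 3, "ADV"), Token("清晰", "a", -1, "HED")],
    [Token("昨天", "nt", 1, "ADV"), Token("下雨", "v", -1, "HED")],
]
table = build_candidate_table(corpus, sentiment_words={"清晰"})
recovered = fallback_targets("新手机屏幕真不错", [], table)
```

In the toy corpus the phrase "手机屏幕" survives the filter because its last token depends on the sentiment word "清晰", while "昨天" is discarded because it has no arc to a sentiment word. Applying the table only to sentences where the CRF found nothing keeps the CRF's output authoritative wherever it exists.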
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1; TP393.092
Article ID: 1640939
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1640939.html