
Sentiment Orientation Analysis of Entity Retrieval Results

Published: 2018-02-12 13:05

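  Keywords: information retrieval, sentiment analysis, entity retrieval, sentence domain recognition, sentence sentiment classification. Source: Harbin Institute of Technology, master's thesis, 2012. Document type: degree thesis.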


[Abstract]: With the rapid growth of online communities such as forums, more and more users take part in building the Internet and contribute data to it. A large share of this data consists of comments on people and events and carries users' opinions and attitudes. Browsing it helps users learn what the public thinks about the things they care about. Sentiment information on the Internet is massive, however, and is hard to collect and organize by hand. Search engines are the main way people obtain information, but they focus on factually relevant documents and ignore the sentiment information those documents contain. This thesis therefore combines sentiment analysis with search technology: when the query submitted to a search engine is an entity, it takes the engine's retrieval results as the object of study and analyzes the sentiment orientation, toward the entity, of the sentences that contain it. The results can support tasks such as sentiment retrieval and information filtering and have considerable practical value. The entities studied are digital products, people, institutions, and policies and regulations.

First, the thesis proposes a method for identifying entity-relevant sentences. Using an SVM classifier with features such as the dependency-syntax path from the entity to an opinion word, the method selects, from the sentences that contain the entity, those that are truly about the entity, that is, sentences whose evaluation target is the entity. It raises the proportion of relevant sentences from 77.5% (without entity-relevant sentence identification) to 85.85%.

Then, the thesis proposes a sentence domain recognition method based on context extension. The method treats a sentence containing the entity, together with the two sentences before it and the two sentences after it, as a single unit, uses that unit to represent the sentence, and classifies it. This enlarges the content of the sentence to be classified and alleviates data sparsity to some extent. Compared with classifying the entity-bearing sentence directly, the method significantly improves classification accuracy, but recognition of policies and regulations and of institutions remains weak.
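The context-extension step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `context_extended_units`, the substring test used to detect the entity, and the choice to join sentences with a space are all assumptions. Only the window of two sentences on each side comes from the abstract.

```python
from typing import List


def context_extended_units(sentences: List[str], entity: str, window: int = 2) -> List[str]:
    """Return one text unit per sentence that mentions `entity`.

    Each unit is the matching sentence joined with up to `window`
    sentences before it and up to `window` sentences after it.
    The window is truncated at the start and end of the document.
    """
    units = []
    for i, sentence in enumerate(sentences):
        # Assumption: a plain substring match stands in for entity detection.
        if entity not in sentence:
            continue
        start = max(0, i - window)
        end = min(len(sentences), i + window + 1)
        units.append(" ".join(sentences[start:end]))
    return units


# Hypothetical example document, already split into sentences.
example_sentences = [
    "I went shopping yesterday.",
    "The store had many phones.",
    "The X100 has a great screen.",
    "Its battery lasts two days.",
    "I would buy it again.",
    "The weather was nice.",
]

example_units = context_extended_units(example_sentences, "X100")
print(example_units)
```

Each returned unit would then be turned into features and classified in place of the bare sentence, which is how the extra context reduces sparsity for short sentences.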
Analysis shows that the feature distributions of policies and regulations and of institutions are extremely similar, which explains the poor recognition performance on these two categories.

Finally, the thesis performs sentiment classification on the sentences containing the entity, dividing them into three classes: positive, negative, and objective. It uses an SVM classifier with two kinds of features, opinion words and unigrams, and applies information gain to select the unigram features. Experiments show that using both feature types together works better than using either one alone. In addition, an analysis of how unigram feature dimensionality affects sentiment classification shows that accuracy saturates quickly as the dimensionality grows, which indicates that feature selection is essential for sentence-level sentiment classification.
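The information-gain selection of unigram features mentioned above can be illustrated with a short sketch. It assumes the common text-categorization formulation, in which the gain of a term is the class entropy minus the entropy conditioned on whether the term is present in a sentence. The abstract does not give the exact formula the thesis used, so the formulation is an assumption. The function names and the toy data are hypothetical, and word segmentation and SVM training are not shown.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs: Sequence[Sequence[str]], labels: Sequence[str]) -> Dict[str, float]:
    """Information gain of each unigram, based on presence or absence per sentence."""
    n = len(docs)
    class_counts = Counter(labels)
    base = entropy(class_counts.values())

    # For every term, count the class labels of the sentences that contain it.
    present: Dict[str, Counter] = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            present.setdefault(term, Counter())[label] += 1

    gains = {}
    for term, with_term in present.items():
        n_with = sum(with_term.values())
        n_without = n - n_with
        without = [class_counts[c] - with_term[c] for c in class_counts]
        conditional = (n_with / n) * entropy(with_term.values()) \
            + (n_without / n) * entropy(without)
        gains[term] = base - conditional
    return gains


def top_k_unigrams(docs: Sequence[Sequence[str]], labels: Sequence[str], k: int) -> List[str]:
    """Keep the k unigrams with the highest gain (ties broken alphabetically)."""
    gains = information_gain(docs, labels)
    return sorted(gains, key=lambda t: (-gains[t], t))[:k]


# Hypothetical, already tokenized sentences with three sentiment classes.
example_docs = [
    ["great", "screen"], ["great", "battery"],
    ["poor", "screen"], ["poor", "battery"],
    ["screen", "size"], ["battery", "size"],
]
example_labels = ["positive", "positive", "negative", "negative", "objective", "objective"]

example_gains = information_gain(example_docs, example_labels)
print(top_k_unigrams(example_docs, example_labels, 3))
```

In this toy data the class-specific words `great`, `poor`, and `size` score higher than `screen` and `battery`, which occur equally often in every class and so have zero gain. Choosing `k` sets the feature dimensionality whose effect on accuracy the thesis examines.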
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3





