Research and Implementation of Product Attribute and Sentiment Orientation Recognition in E-commerce and Weibo Reviews
Published: 2018-07-17 00:13
[Abstract]: With e-commerce and social networks now sweeping the world, users who shop online also post their opinions about the products they buy. These opinions are user-centered: they reflect the user's experience and convey views on a product's features, functions, and performance. As the number of online reviews grows rapidly and their content becomes more complex, however, it is hard to extract useful information from them manually. There is therefore an urgent need for technology that automatically collects user reviews, analyzes which product attributes users evaluate, and identifies the opinions they express. Sentiment mining and analysis of product reviews emerged and developed rapidly against this background.
Mining product features, mining users' main opinions about those features, and determining the sentiment orientation of those opinions are the three core problems of sentiment mining for product reviews, and this thesis studies all three in depth. It also takes into account a characteristic of online reviews: users often ignore grammatical rules, so sentences have incomplete grammatical structure and frequently omit the subject. The thesis therefore focuses on the extraction of implicit subjects, that is, identifying and extracting the real subject of a sentence that has no explicit one.
The work of this thesis covers three aspects:
(1) Identifying evaluation objects and evaluation words. The POSEM algorithm is used to extract (evaluation object, evaluation word) pairs. Because the grammatical structure of online product reviews is relatively free and many sentences lack a complete subject-verb-object structure, an implicit subject extraction method is proposed, which improves the recall and precision of evaluation object and evaluation word extraction.
(2) Determining the reviewer's attitude, that is, finding in the user's review the polarity of the reviewer's opinion about a product attribute. To do so, the evaluation words must first be found; their polarity is then judged by combining adjective and adverb sentiment lexicons with a domain lexicon, because the polarity of a word is often closely tied to its context and to the specific domain. The same word may carry opposite sentiment polarity in different domains.
(3) Designing and implementing a product review analysis tool. Its main functions include automatic extraction of review text under specific tags in a web page's DOM tree, data preprocessing, subjective evaluation sentence extraction, evaluation object and evaluation word extraction, and evaluation word polarity determination.
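The polarity step in aspect (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual lexicons or scoring rule, so everything below is an assumption made for illustration: the toy word lists, the numeric weights, and the names `GENERAL_LEXICON`, `DOMAIN_LEXICON`, `ADVERB_WEIGHTS`, `opinion_polarity`, and `label` are all hypothetical. The sketch only shows the general idea that a domain lexicon can override a general one and that adverbs can negate or scale the result.

```python
# Hypothetical toy lexicons; the thesis's real word lists are not given in the abstract.
GENERAL_LEXICON = {"good": 1.0, "bad": -1.0, "unpredictable": -1.0}

# Domain-specific entries take precedence over the general lexicon (assumed rule).
DOMAIN_LEXICON = {
    "movie": {"unpredictable": 1.0},   # an unpredictable plot is usually praise
    "car": {"unpredictable": -1.0},    # unpredictable steering is a complaint
}

# Adverbs scale or flip the polarity of the evaluation word (assumed weights).
ADVERB_WEIGHTS = {"very": 1.5, "slightly": 0.5, "not": -1.0}


def opinion_polarity(evaluation_word, domain, adverbs=()):
    """Return a signed score: > 0 positive, < 0 negative, 0 unknown or neutral."""
    domain_entries = DOMAIN_LEXICON.get(domain, {})
    if evaluation_word in domain_entries:
        score = domain_entries[evaluation_word]
    else:
        score = GENERAL_LEXICON.get(evaluation_word, 0.0)
    # Apply each modifying adverb; adverbs missing from the lexicon leave the score unchanged.
    for adverb in adverbs:
        score *= ADVERB_WEIGHTS.get(adverb, 1.0)
    return score


def label(score):
    """Map a signed score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(label(opinion_polarity("unpredictable", "movie")))
    print(label(opinion_polarity("unpredictable", "car")))
    print(label(opinion_polarity("good", "car", adverbs=["not"])))
```

Looking up the domain lexicon first is one simple way to let the same word carry opposite polarity in different domains; a real system would need far larger lexicons and a more careful treatment of context than this sketch provides.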
[Degree-granting institution]: Donghua University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.092; TP391.1
Article ID: 2128177
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2128177.html