Research on Product Review Classification and Recommendation Algorithms Based on Chinese Weibo
Published: 2018-10-16 17:05
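[Abstract]: Weibo is a network media platform that has emerged in recent years. Its posts are short, spread quickly, and reach a very large number of users. Sentiment analysis of Weibo text has become one of the active topics in data mining, with considerable significance and value. When shopping online or doing similar activities, users want to find evaluations of the products they care about on Weibo. Mining product reviews from Chinese Weibo is hampered by features of the text itself (non-standard formatting, heavy use of Internet slang, and omitted sentence components) and by classification problems (scarce labeled data and costly manual annotation). To address these problems, this thesis carries out the following work.
First, to handle the characteristics of Chinese Weibo text, a method for constructing sentiment evaluation units is proposed. The method builds separate dictionaries of sentiment evaluation words, adverbs, and evaluation targets, and defines corresponding rules for supplementing omitted components and for assembling units. This keeps the extracted information complete and accurate, and also attempts to streamline the word set and improve efficiency. Experiments show that the method is more accurate than related methods based on syntactic paths.
Second, for the classification of Weibo text, a classification algorithm based on graph-based semi-supervised learning, LP-SVM, is proposed. The algorithm combines the label propagation process with a support vector machine (SVM). It can classify data when only a few labeled samples are available, and it avoids a limitation of graph-based semi-supervised learning: such methods do not produce a classifier, so new data can only be handled by retraining. The algorithm is used for feature extraction and semi-supervised classification of the sentiment evaluation units in Weibo product posts. Experiments show that it outperforms both the traditional SVM and the transductive SVM.
Third, with practical applications in mind, a Weibo product recommendation algorithm based on review classification is proposed. Building on the results of product review classification and on textual features of Weibo posts, the algorithm defines Weibo product recommendation indicators and a method for computing them. The recommendation scheme obtained in the experiments is broadly consistent with user ratings on related websites, which the thesis takes as validation of the algorithm's accuracy.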
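The abstract does not list the dictionaries or the supplementation and construction rules themselves. As a rough illustration only, the sketch below assumes the post has already been word-segmented and uses tiny hypothetical lexicons of sentiment words, adverbs (degree and negation words with made-up multipliers) and evaluation targets. It attaches the run of adverbs directly preceding each sentiment word and, when a clause omits its target, supplements it with the most recent earlier target or a default product name. The lexicon contents, weights, and names such as `build_units` and `EvalUnit` are illustrative assumptions, not the thesis's rules.

```python
from dataclasses import dataclass

# Hypothetical toy lexicons (the thesis's actual dictionaries are not given).
SENTIMENT_WORDS = {"好": 1.0, "流畅": 1.0, "差": -1.0, "卡": -1.0}  # good, smooth, poor, laggy
ADVERBS = {"很": 1.5, "非常": 2.0, "不": -1.0}                     # very, extremely, not
TARGETS = {"屏幕", "电池", "系统"}                                   # screen, battery, system
CLAUSE_BREAKS = {",", "。", "!", "?", ",", ".", "!", "?"}


@dataclass
class EvalUnit:
    target: str       # evaluation target (possibly supplemented)
    adverbs: list     # adverbs modifying the sentiment word, in text order
    sentiment: str    # sentiment evaluation word
    score: float      # lexicon polarity scaled by the adverb multipliers


def build_units(tokens, default_target="产品"):
    """Build sentiment evaluation units from an already word-segmented post."""
    units = []
    last_target = None    # most recent target seen anywhere earlier in the post
    clause_target = None  # target mentioned in the current clause, if any
    for i, tok in enumerate(tokens):
        if tok in CLAUSE_BREAKS:
            clause_target = None
        elif tok in TARGETS:
            clause_target = last_target = tok
        elif tok in SENTIMENT_WORDS:
            # Collect the contiguous run of adverbs directly before the sentiment word.
            advs, j = [], i - 1
            while j >= 0 and tokens[j] in ADVERBS:
                advs.insert(0, tokens[j])
                j -= 1
            score = SENTIMENT_WORDS[tok]
            for a in advs:
                score *= ADVERBS[a]
            # Supplement an omitted target: current clause, then most recent target, then default.
            target = clause_target or last_target or default_target
            units.append(EvalUnit(target, advs, tok, score))
    return units


# "屏幕很好,但是非常卡。电池不好" (the screen is very good, but it lags badly; the battery is not good)
tokens = ["屏幕", "很", "好", ",", "但是", "非常", "卡", "。", "电池", "不", "好"]
units = build_units(tokens)
for u in units:
    print(u.target, u.adverbs, u.sentiment, u.score)
```

The second unit shows the supplementation rule at work: its clause names no target, so 屏幕 is carried over from the previous clause.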
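The abstract describes LP-SVM only at a high level: label propagation over a graph supplies labels for the unlabeled samples, and an SVM trained on them yields an inductive classifier for new data. The sketch below illustrates that general two-stage idea and is not the thesis's implementation. It assumes numeric feature vectors (the feature extraction from sentiment evaluation units is not reproduced), a fully connected Gaussian (RBF) graph with a hypothetical bandwidth `sigma`, the classic iterative propagation with clamped labels, and a linear SVM trained by Pegasos-style stochastic subgradient descent with a constant feature standing in for the bias. All function names and parameter values are illustrative choices.

```python
import math
import random


def rbf(a, b, sigma):
    """Gaussian (RBF) edge weight between two feature vectors."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def label_propagation(X, y, sigma=1.0, iters=200):
    """Iterative label propagation; y[i] is +1/-1 if labeled, None if unlabeled."""
    n = len(X)
    W = [[rbf(X[i], X[j], sigma) if i != j else 0.0 for j in range(n)] for i in range(n)]
    f = [float(v) if v is not None else 0.0 for v in y]
    for _ in range(iters):
        new_f = []
        for i in range(n):
            if y[i] is not None:
                new_f.append(float(y[i]))  # clamp the known labels
            else:
                s = sum(W[i])
                # Weighted average of the neighbours' current scores.
                new_f.append(sum(W[i][j] * f[j] for j in range(n)) / s if s > 0 else 0.0)
        f = new_f
    return [1 if v >= 0 else -1 for v in f]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    idx = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wk * xk for wk, xk in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wk for wk in w]  # regularization shrink step
            if margin < 1.0:                          # hinge loss active: step toward y*x
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, X[i])]
    return w


def lp_svm(X, y, sigma=1.0):
    """Stage 1: propagate labels. Stage 2: fit an SVM on all samples with those labels."""
    labels = label_propagation(X, y, sigma)
    Xa = [list(x) + [1.0] for x in X]  # constant feature acts as the bias term
    return labels, train_linear_svm(Xa, labels)


def predict(w, x):
    """Classify a new sample with the trained weights; no re-propagation needed."""
    score = sum(wk * xk for wk, xk in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy data: two labeled samples, six unlabeled ones.
X = [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5), (3.0, 2.0),
     (-2.0, -2.0), (-2.5, -1.5), (-1.5, -2.5), (-3.0, -2.0)]
y = [1, None, None, None, -1, None, None, None]
labels, w = lp_svm(X, y)
print(labels)                  # → [1, 1, 1, 1, -1, -1, -1, -1]
print(predict(w, (2.0, 3.0)))  # → 1
```

Because the second stage returns an explicit weight vector, a new post's feature vector can be scored with `predict` without rerunning the propagation, which is the limitation of purely graph-based methods that the abstract says LP-SVM avoids.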
[Degree-granting institution]: Dalian University of Technology
[Degree]: Master's
[Year awarded]: 2014
[Classification number (CLC)]: TP393.092
Article ID: 2275057
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2275057.html