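Research on a Vertical Search Engine for Mobile Phone Product Information
Published: 2018-04-14 17:31
Topics: topic relevance + web crawler; Source: Hunan University of Technology, master's thesis, 2013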
[Abstract]: With the rapid advance of Internet technology, the boom in e-commerce, and the rise of forums and blogs, more and more people post reviews on the various attributes of products, expressing their attitudes toward, opinions of, and experiences with a particular product. As a result, a massive volume of product reviews has appeared online. Reading these reviews helps prospective buyers understand a product's characteristics and decide whether to purchase it. Merchants, in turn, can mine the reviews to learn promptly about supply and demand and about a product's popularity, which substantially informs their sales decisions. Browsing and collecting this information manually, however, is time-consuming and laborious, and the information gathered is not sufficiently comprehensive, timely, or useful, so people increasingly rely on search engines to find information. For a specific domain, though, the shortcomings of general-purpose search engines are evident, which makes it necessary to build a vertical search engine targeted at a specific product domain.
Building on an analysis of the state of research, in China and abroad, on vertical search engines and sentiment classification, this thesis is organized around the construction of a vertical search engine for mobile phone product information. Its main contributions are as follows:
(1) A focused (topic-specific) crawler framework for the mobile phone product domain is designed. For the crawl search strategy, the traditional content-based and link-based strategies are studied in depth, and an improved strategy that combines the two is proposed. This substantially increases the topical relevance of the pages the crawler fetches and eases the subsequent steps of building the vertical search engine. Experiments comparing the proposed algorithm against the HITS, breadth-first, and PageRank algorithms show its advantages.
(2) After mobile phone product attributes and sentiment words are extracted, a method is proposed for identifying collocations of attribute words and sentiment words. A classifier trained with an SVM yields a sentiment-orientation score for a given product attribute in a review, and the scores from all reviews of a given phone model are then combined into an overall satisfaction rating. Comparative experiments verify the effectiveness of the collocation method.
(3) A vertical search engine for mobile phone product information is designed and implemented. The design framework is presented, the implementation of each module is described, and the system interface is shown.
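The abstract does not give the formula behind the combined content-based and link-based crawl strategy, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes a cosine-similarity content score against a small hand-picked topic vocabulary and a simple in-link count as the link signal (a stand-in for a PageRank- or HITS-style analysis). The names `TOPIC_TERMS`, `content_relevance`, `Frontier`, and the weight `alpha` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical topic vocabulary with weights; the thesis's real term list is not given.
TOPIC_TERMS = {"phone": 1.0, "battery": 0.8, "screen": 0.8, "camera": 0.8, "android": 0.6}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_relevance(text, topic=TOPIC_TERMS):
    """Cosine similarity between the text's term counts and the topic vector."""
    counts = Counter(tokenize(text))
    dot = sum(counts[term] * weight for term, weight in topic.items())
    norm_text = math.sqrt(sum(c * c for c in counts.values()))
    norm_topic = math.sqrt(sum(w * w for w in topic.values()))
    if norm_text == 0 or norm_topic == 0:
        return 0.0
    return dot / (norm_text * norm_topic)


class Frontier:
    """Crawl frontier that ranks URLs by a blend of content and link evidence."""

    def __init__(self, alpha=0.7):
        self.alpha = alpha          # assumed weight of the content score
        self.inlinks = Counter()    # url -> number of crawled pages linking to it
        self.best_content = {}      # url -> best content score among its link contexts
        self.visited = set()

    def add_link(self, url, parent_text, anchor_text):
        """Record a link found on a crawled page, with its surrounding text."""
        if url in self.visited:
            return
        self.inlinks[url] += 1
        context_score = content_relevance(parent_text + " " + anchor_text)
        self.best_content[url] = max(self.best_content.get(url, 0.0), context_score)

    def score(self, url):
        """Blend the content score with a normalized in-link score in [0, 1)."""
        n = self.inlinks[url]
        link_score = n / (1.0 + n)
        return self.alpha * self.best_content[url] + (1 - self.alpha) * link_score

    def pop(self):
        """Return the highest-scoring pending URL and mark it visited."""
        # A linear scan keeps the sketch simple, since scores change as links arrive.
        url = max(self.best_content, key=self.score)
        del self.best_content[url]
        self.visited.add(url)
        return url

    def __len__(self):
        return len(self.best_content)


frontier = Frontier(alpha=0.7)
frontier.add_link("http://example.com/a", "phone battery screen review", "camera phone")
frontier.add_link("http://example.com/b", "cooking recipes pasta", "dinner ideas")
frontier.add_link("http://example.com/b", "gardening tips", "more")
first = frontier.pop()
second = frontier.pop()
```

In this sketch, `alpha` controls how much the crawler trusts page content relative to link structure. With `alpha=1.0` it degenerates to a purely content-based strategy, and with `alpha=0.0` to a purely link-based one.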
[Degree-granting institution]: Hunan University of Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3
Article ID: 1750349
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1750349.html