An Improved Method for Web Page Ranking in Nutch
Published: 2018-12-17 12:08
[Abstract]: Nutch is an open-source search engine implemented in Java. Nutch currently segments Chinese text into single characters and does not implement PageRank computation. To address these shortcomings, the PageRank algorithm is improved, a MapReduce-based PageRank computation method is designed and implemented, and Nutch's Chinese word segmentation is improved by adding the JE Chinese word segmenter. Experimental results show that the improved Nutch achieves higher query-result accuracy and better ranking of Chinese web pages.
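The abstract does not give the details of the improved algorithm, so the following is only a minimal sketch of how standard (unimproved) PageRank can be phrased as map and reduce steps. It simulates the map, shuffle, and reduce phases in memory rather than running on Hadoop. The function names, the damping factor of 0.85, and the toy link graph are illustrative assumptions, not taken from the paper. The sketch also assumes every page has at least one outlink and that all outlinks point to pages in the graph.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; the source does not state one


def map_page(url, rank, outlinks):
    """Map step: emit an equal share of this page's rank to each outlink."""
    # Emit 0.0 for the page itself so pages without inlinks still reach the reducer.
    yield url, 0.0
    for target in outlinks:
        yield target, rank / len(outlinks)


def reduce_page(contributions, num_pages):
    """Reduce step: combine the incoming shares into a new rank."""
    return (1.0 - DAMPING) / num_pages + DAMPING * sum(contributions)


def pagerank(graph, iterations=20):
    """Run repeated map/shuffle/reduce rounds over an in-memory link graph."""
    num_pages = len(graph)
    ranks = {url: 1.0 / num_pages for url in graph}
    for _ in range(iterations):
        shuffled = defaultdict(list)  # stands in for the MapReduce shuffle phase
        for url, outlinks in graph.items():
            for target, share in map_page(url, ranks[url], outlinks):
                shuffled[target].append(share)
        ranks = {url: reduce_page(shuffled[url], num_pages) for url in graph}
    return ranks


# Hypothetical three-page link graph used only for illustration.
example_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
example_ranks = pagerank(example_graph)
```

In a real MapReduce job each iteration would be a separate job whose output feeds the next, and the link structure would also have to be passed through the reducer so it is available in the following round.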
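[Author affiliation]: School of Computer, Electronics and Information, Guangxi University
[Funding]: Supported by the Guangxi Natural Science Foundation (Grant No. 桂科自0832059)
[CLC number]: TP391.3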
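[Similar literature]
Related journal articles (top 10)
1 Pan Tao; Liang Zhengyou. An Improved Method for Web Page Ranking in Nutch [J]. Computer Engineering, 2010, Issue 13
2 Zhan Hengfei; Yang Yuexiang; Fang Hong. Research and Optimization of the Nutch Distributed Web Crawler [J]. Journal of Frontiers of Computer Science and Technology, 2011, Issue 01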
Article ID: 2384178
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2384178.html