1000-9825/2005/16(11)1958 ©2005 Journal of Software (软件学报) Vol.16, No.11
Web Key Resource Page Judgment Based on Improved Decision Tree Algorithm
LIU Yi-Qun, ZHANG Min, MA Shao-Ping
State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China
+ Corresponding author: Phn: +86-10-62777699, E-mail: liuyiqun03@
Received 2004-07-26; Accepted 2005-06-02
Liu YQ, Zhang M, Ma SP. Web key resource page judgment based on improved decision tree algorithm. Journal of Software, 2005,16(11):1958-1966. DOI: 10.1360/jos161958
Abstract: Key resource pages are among the most important search targets for Web search users. Decision tree
learning is one of the most widely used and practical methods for inductive inference in machine learning. Because
Web pages are difficult to sample uniformly, there are not enough negative instances for training a key
resource decision tree. To solve this problem, the original algorithm is partly modified to learn from global instead
of individual instance information. With the same evaluation method as TREC (Text Retrieval Conference) 2003,
large-scale retrieval experiments based on the improved decision tree algorithm achieve more than 40% improvement
over those based on the original algorithm. This not only offers an effective way of selecting Web key resource
pages, but also shows a possible way to improve decision tree learning performance.
Key words: Web information retrieval; key resource page; machine learning; decision tree
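To illustrate why the shortage of negative instances matters, the following is a minimal sketch of the standard entropy-based information gain criterion used by ID3-style decision tree learners. It is not the improved algorithm of this paper, whose details are not given in the abstract. The feature name `many_links` and the labels `key` and `other` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the instances on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical training pages described by one boolean feature.
rows = [
    {"many_links": True},
    {"many_links": True},
    {"many_links": False},
    {"many_links": False},
]

# With both classes present, the feature separates them and has positive gain.
mixed_labels = ["key", "key", "other", "other"]
gain_mixed = information_gain(rows, mixed_labels, "many_links")

# With positive instances only, the label entropy is already zero,
# so no feature can yield any gain and no useful split is learned.
positive_only = ["key", "key", "key", "key"]
gain_positive_only = information_gain(rows, positive_only, "many_links")
```

In this sketch, a training set without negative instances gives every feature a gain of zero, which is one way to see why the unmodified criterion cannot be applied directly when negative samples are unavailable.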
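Abstract (Chinese version): Key resource pages are an important kind of high-quality page in the Web information environment and are the main target of users' Web information retrieval. Decision tree algorithms are among the most widely used inductive inference algorithms in machine learning.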