An Analysis of the Results of the “Chinese Web Page Automatic Classification Competition”
Posted: 2018-06-29 11:15
Topics: Computer Applications + Chinese Information Processing; Source: Journal of Chinese Information Processing, 2003, Issue 05
[Abstract]: At the recently held "National Symposium on Search Engines and Online Information Mining", a "Chinese Web Page Automatic Classification Competition" was organized, with 10 teams from across China taking part. After introducing the rules and procedure of the competition, this paper analyzes its results in detail. The analysis gives a concrete picture of the current state of automatic Chinese web page classification: the existing classifiers show no clear performance gap among themselves, and classifying Chinese web pages is considerably harder than classifying ordinary text. The paper also makes a first attempt at releasing a standard sample set for Chinese web page classification, which, after continued refinement, is intended to become a basic corpus for research on Chinese web page classification.
[Affiliation]: Department of Computer Science and Technology, Peking University
[Funding]: Supported by the National 973 Key Basic Research Program (G1999032706)
[CLC number]: TP393.09
Article ID: 2081934
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2081934.html