XML-Based Web Information Extraction with Automatic Learning

Published: 2018-04-16 02:31

Topics: information extraction + semi-structured; Reference: Computer Science (《计算机科学》), 2008, No. 03

[Abstract]: The Internet offers an enormous volume of information, and the Web's abundant resources contain a great deal of useful knowledge. Yet an explosion of information alongside a scarcity of knowledge is one of the major problems people face today. Search engines alone make it hard to pinpoint the data a user cares about most. Automating Web information extraction can make information acquisition more efficient: extraction analyzes and discovers useful information on the Web, discards redundant data, and extracts knowledge relevant to the user's domain. This paper analyzes XML-based Web information extraction, discusses how the related techniques apply to it, and builds a corresponding extraction model in which the extraction rules are obtained through automatic learning, so that Web information is extracted automatically.
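The abstract does not spell out the learning procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the page has already been converted to well-formed XML, and it "learns" a rule by recording the tag path (with `class` attributes) to one user-labelled example value, then applies that path to a new page. The names `learn_rule`, `extract`, `TRAIN_PAGE`, and `NEW_PAGE` are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical training page: the user labels "30" as a value of interest.
TRAIN_PAGE = """<html><body><ul>
<li><span class="title">XML Basics</span><span class="price">30</span></li>
<li><span class="title">Web Mining</span><span class="price">45</span></li>
</ul></body></html>"""

# Hypothetical unseen page that shares the same layout.
NEW_PAGE = """<html><body><ul>
<li><span class="title">Data Wrappers</span><span class="price">52</span></li>
<li><span class="title">Rule Learning</span><span class="price">18</span></li>
</ul></body></html>"""


def _step(node):
    """One path step: the tag, narrowed by its class attribute if present."""
    cls = node.get("class")
    return f"{node.tag}[@class='{cls}']" if cls else node.tag


def learn_rule(page_xml, example_value):
    """Return an ElementTree path to the first element whose text equals example_value."""
    root = ET.fromstring(page_xml)

    def walk(node, path):
        # Depth-first search for the labelled example.
        if (node.text or "").strip() == example_value:
            return path
        for child in node:
            found = walk(child, path + [_step(child)])
            if found is not None:
                return found
        return None

    path = walk(root, [])
    if path is None:
        raise ValueError("example value not found in training page")
    return "./" + "/".join(path) if path else "."


def extract(page_xml, rule):
    """Apply a learned path rule to a page and return the matched text values."""
    root = ET.fromstring(page_xml)
    return [(el.text or "").strip() for el in root.findall(rule)]


if __name__ == "__main__":
    rule = learn_rule(TRAIN_PAGE, "30")
    print(rule)
    print(extract(NEW_PAGE, rule))
```

A rule learned from a single example like this generalizes only to pages with the same layout. A practical system would presumably learn from several labelled examples and tolerate structural variation.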
[Affiliation]: Department of Computer Science, Sun Yat-sen University
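[Funding]: National Natural Science Foundation of China (60373081, 60673135); Natural Science Foundation of Guangdong Province (04105503, 5003348); Ministry of Education "New Century Excellent Talents Support Program"
[Classification]: TP312.2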

[Similar Literature]