Research on Key Issues of Deep Web Semantic Annotation Based on Ontology Learning

Published: 2018-05-12 04:47

Topic: Deep Web; Source: Soochow University, master's thesis, 2012

[Abstract]: With the rapid growth of the Internet and the steady maturation of Web technologies, the Deep Web has become an important source of information. Deep Web information integration has consequently become a hot research topic, because it lets users obtain the Deep Web resources they need quickly, accurately and conveniently. Semantic annotation of Deep Web query results is an important stage in a Deep Web information integration system, and semantic annotation in turn depends on accurate extraction of Deep Web query interface schemas. This thesis therefore studies query interface schema extraction and semantic annotation in depth and introduces an ontology into the annotation process. On this basis, it designs and builds a prototype Deep Web search engine. The main work is as follows:
(1) It introduces the framework of Deep Web information integration systems and reviews research on Deep Web semantic annotation in China and abroad, analyzing the shortcomings of traditional semantic annotation methods. It also briefly introduces the concept and role of ontologies, together with the construction principles and learning method of the Deep Web domain ontology used in this thesis.
(2) It proposes a Deep Web query interface schema extraction method based on a hierarchical model. The method addresses a gap in existing extraction methods, which ignore the internal structure and semantic relationships of a query interface. It first mines the page layout features of the query interface elements and extracts an interface schema tree using an extended hierarchical clustering method. It then uses the positional and semantic relationships between controls and labels to match a descriptive semantic label to each node of the schema tree.
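The interface schema extraction step described above can be sketched as follows. The thesis's "extended" hierarchical clustering and its label-matching rules are not specified here, so this block makes several substitutions. It uses plain single-linkage agglomerative clustering over element coordinates, hand-picked distance thresholds and a simple nearest-label heuristic. `Element`, `Node`, `build_schema_tree`, `assign_labels` and the toy form are all hypothetical. Each threshold adds one level of the schema tree. Leaves (controls) take the nearest unused label on the same row or above, and group nodes take the nearest remaining label above their top-left corner.

```python
# Sketch only: plain single-linkage clustering stands in for the thesis's
# "extended" hierarchical clustering (assumption); all names are hypothetical.
from dataclasses import dataclass, field
from math import hypot
from typing import List, Optional, Set


@dataclass
class Element:
    """Hypothetical model of a query-interface element (form control or text label)."""
    name: str
    x: float  # left edge in page coordinates
    y: float  # top edge in page coordinates


@dataclass
class Node:
    """Schema-tree node: a leaf wraps one control, an inner node groups child nodes."""
    element: Optional[Element] = None
    children: List["Node"] = field(default_factory=list)
    label: Optional[str] = None

    def controls(self) -> List[Element]:
        if self.element is not None:
            return [self.element]
        return [c for child in self.children for c in child.controls()]


def link_distance(a: Node, b: Node) -> float:
    """Single-linkage distance: the closest pair of controls across two nodes."""
    return min(hypot(p.x - q.x, p.y - q.y) for p in a.controls() for q in b.controls())


def cluster_level(nodes: List[Node], threshold: float) -> List[Node]:
    """Group nodes into connected components of pairs within `threshold` of each other."""
    seen, result = set(), []
    for start in range(len(nodes)):
        if start in seen:
            continue
        seen.add(start)
        component, frontier = [start], [start]
        while frontier:
            i = frontier.pop()
            for j in range(len(nodes)):
                if j not in seen and link_distance(nodes[i], nodes[j]) <= threshold:
                    seen.add(j)
                    component.append(j)
                    frontier.append(j)
        members = [nodes[i] for i in sorted(component)]
        result.append(members[0] if len(members) == 1 else Node(children=members))
    return result


def build_schema_tree(controls: List[Element], thresholds: List[float]) -> Node:
    """Agglomerate bottom-up; each increasing threshold adds one level of grouping."""
    nodes = [Node(element=c) for c in controls]
    for t in sorted(thresholds):
        nodes = cluster_level(nodes, t)
    return nodes[0] if len(nodes) == 1 else Node(children=nodes)


def assign_labels(node: Node, labels: List[Element], used: Set[str], row_tol: float = 10) -> None:
    """Post-order: a leaf takes the nearest unused label on its row (to the left) or above it;
    an inner node takes the nearest unused label above its top-left corner."""
    for child in node.children:
        assign_labels(child, labels, used, row_tol)
    ctrls = node.controls()
    if not ctrls:
        return
    x, y = min(c.x for c in ctrls), min(c.y for c in ctrls)
    if node.element is not None:
        cands = [lab for lab in labels if (lab.x <= x and abs(lab.y - y) <= row_tol) or lab.y < y]
    else:
        cands = [lab for lab in labels if lab.y < y]
    cands = [lab for lab in cands if lab.name not in used]
    if cands:
        best = min(cands, key=lambda lab: hypot(lab.x - x, lab.y - y))
        node.label = best.name
        used.add(best.name)


# Toy book-search form: two free-text fields plus a "Price range" group.
controls = [Element("title_box", 80, 10), Element("author_box", 80, 40),
            Element("min_box", 80, 130), Element("max_box", 80, 160)]
labels = [Element("Title", 10, 10), Element("Author", 10, 40), Element("Price range", 10, 100),
          Element("Min", 10, 130), Element("Max", 10, 160)]
tree = build_schema_tree(controls, thresholds=[35])
assign_labels(tree, labels, used=set())
```

With `thresholds=[35]`, the title and author fields form one group and the two price boxes form another. The leaves receive Title, Author, Min and Max, and the price group receives the "Price range" heading. In practice the thresholds would have to come from the page's layout statistics rather than being hand-picked.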
(3) It proposes an ontology-based Deep Web semantic annotation method that addresses the limited annotation capability and inconsistent annotation results of traditional methods. The method works in four stages:
- Data units are aligned and grouped.
- Several basic annotators are combined to label each group.
- A mapping between the result schema and the ontology is established, producing complete and uniform annotation results.
- Annotations are cross-validated across different data sources in the same domain.
(4) It designs and implements a prototype Deep Web search engine for the book e-commerce domain.
The proposed methods are evaluated experimentally on a dataset provided by UIUC, and analysis of the results shows that they are feasible and effective.
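The annotation pipeline described above can be sketched as follows. The thesis's own basic annotators and combination rule are not given here, so everything in the block is an assumption:
- three hypothetical annotators, based on a shared in-text prefix, the submitted query term and value formats;
- a toy book ontology `ONTOLOGY` of concept synonyms;
- simple majority voting.

Each annotator's label is mapped to an ontology concept before voting. This means that wording which differs between sites, such as "Writer" or "Our price", still counts toward a single concept. Data units are assumed to be already aligned into groups, one list per result column. Cross-source validation is not shown.

```python
# Sketch only: annotators, ontology and voting rule are assumptions, not the thesis's design.
import re
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical book-domain ontology: concept -> lower-case synonyms seen on different sites.
ONTOLOGY: Dict[str, List[str]] = {
    "Title": ["title", "book title"],
    "Author": ["author", "writer", "by"],
    "Price": ["price", "our price", "list price"],
    "ISBN": ["isbn", "isbn-13"],
}

# A basic annotator inspects one aligned group of data units and proposes a label (or None).
Annotator = Callable[[List[str]], Optional[str]]


def prefix_annotator(group: List[str]) -> Optional[str]:
    """Label from a shared in-text prefix, e.g. every value looks like 'Our price: $8.50'."""
    prefixes = {v.split(":", 1)[0].strip().lower() for v in group if ":" in v}
    if group and all(":" in v for v in group) and len(prefixes) == 1:
        return prefixes.pop()
    return None


def query_annotator(query: Dict[str, str]) -> Annotator:
    """Label a group with the interface field whose submitted term appears in its values."""
    def annotate(group: List[str]) -> Optional[str]:
        for field_label, term in query.items():
            if any(term.lower() in v.lower() for v in group):
                return field_label.lower()
        return None
    return annotate


def format_annotator(group: List[str]) -> Optional[str]:
    """Label a group from the format of its values (prices and 13-digit ISBNs only)."""
    if not group:
        return None
    if all(re.fullmatch(r"(?:[A-Za-z ]+:\s*)?\$\d+(?:\.\d{2})?", v) for v in group):
        return "price"
    if all(re.fullmatch(r"97[89]\d{10}", v.replace("-", "")) for v in group):
        return "isbn"
    return None


def to_concept(label: str) -> Optional[str]:
    """Map a site-specific label to a shared ontology concept via its synonym list."""
    for concept, synonyms in ONTOLOGY.items():
        if label.lower() in synonyms:
            return concept
    return None


def annotate_groups(groups: List[List[str]], annotators: List[Annotator]) -> List[Optional[str]]:
    """Run every annotator on every group; the concept with the most votes wins."""
    result = []
    for group in groups:
        votes = Counter()
        for annotate in annotators:
            label = annotate(group)
            concept = to_concept(label) if label else None
            if concept:
                votes[concept] += 1
        result.append(votes.most_common(1)[0][0] if votes else None)
    return result


groups = [
    ["Deep Web Mining", "Data Integration Basics"],  # titles
    ["Writer: J. Smith", "Writer: A. Lee"],          # authors with an in-text prefix
    ["Our price: $12.99", "Our price: $8.50"],       # prices
    ["978-7-111-12345-6", "9787302123456"],          # ISBN-shaped strings
    ["2012", "2009"],                                 # no annotator fires
]
annotators = [prefix_annotator, query_annotator({"Title": "data integration"}), format_annotator]
annotations = annotate_groups(groups, annotators)
```

On the five toy groups this yields `["Title", "Author", "Price", "ISBN", None]`. The year column stays unlabeled because none of the assumed annotators recognizes it.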
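[Degree-granting institution]: Soochow University
[Degree level]: Master's
[Year degree conferred]: 2012
[Chinese Library Classification number]: TP391.1; TP393.4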
