Design and Implementation of a Chinese Knowledge Engineering and Knowledge Service Platform

Published: 2018-12-12 13:27

[Abstract]: The rapid growth of the Internet has brought a sharp increase in the number of users, and with it a sharp increase in demand for online resources. Retrieving the information a user needs from this vast pool of resources is one of the major challenges the Internet currently faces. Today, information retrieval for Internet users is provided mainly by search engines and some question-answering (Q&A) communities, such as Baidu and Baidu Zhidao. These services have made considerable progress in the breadth of data they offer, but their precision is less satisfactory: when a user needs knowledge with a high accuracy requirement, both search engines and Q&A communities fall short. To address this mismatch between rapidly expanding online information and users' demand for knowledge, this thesis proposes building a Chinese knowledge base using Chinese knowledge engineering techniques and establishing a platform that provides Chinese knowledge services, with the aim of offering users high-quality, efficient knowledge sharing. For knowledge base construction, the thesis proposes extracting attribute pairs from the infoboxes of encyclopedia pages and training a classification model on those pairs. Combining this model with automatic word segmentation, part-of-speech tagging, and named entity tagging for modern Chinese, it extracts attribute pairs from encyclopedia pages that have no infobox. The extracted pairs are used to build an attribute-value database, which lets users locate the knowledge they are looking for precisely.
Users retrieving a piece of knowledge are also interested in other knowledge related to it, so the thesis proposes a Wikipedia-based method for computing entity relatedness, which uses the co-occurring link information in Wikipedia pages to compute the relatedness of two named entities. For knowledge services, the thesis ranks retrieval results with the link-analysis-based HITS algorithm, then computes the similarity between each HITS-ranked page and the question to determine the final ranking of answer pages.
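The two-stage ranking described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulas, so everything below is an assumption made for illustration: the function names (`hits`, `cosine`, `rank_answers`), the L2 normalization, the `top_k` cut-off, and the use of bag-of-words cosine similarity as the page-question similarity measure. The HITS part follows the standard textbook formulation (authority scores summed from the hub scores of in-linking pages, hub scores summed from the authority scores of linked pages).

```python
import math
from collections import Counter


def hits(links, iterations=50, tol=1e-9):
    """Standard HITS power iteration.

    links maps each page to the pages it links to.
    Returns (authority, hub) score dictionaries, each L2-normalized.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    outgoing = {p: set(links.get(p, ())) for p in pages}
    incoming = {p: set() for p in pages}
    for src, targets in outgoing.items():
        for dst in targets:
            incoming[dst].add(src)

    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in.
        new_auth = {p: sum(hub[q] for q in incoming[p]) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        new_auth = {p: v / norm for p, v in new_auth.items()}
        # Hub: sum of (updated) authority scores of the pages linked to.
        new_hub = {p: sum(new_auth[q] for q in outgoing[p]) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        new_hub = {p: v / norm for p, v in new_hub.items()}
        delta = sum(abs(new_auth[p] - auth[p]) + abs(new_hub[p] - hub[p])
                    for p in pages)
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return auth, hub


def cosine(tokens_a, tokens_b):
    """Bag-of-words cosine similarity (an assumed similarity measure)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_answers(question_tokens, page_tokens, links, top_k=3):
    """Keep the top_k pages by HITS authority, then order them by
    similarity to the question (hypothetical two-stage combination)."""
    auth, _ = hits(links)
    candidates = sorted(page_tokens, key=lambda p: auth.get(p, 0.0),
                        reverse=True)[:top_k]
    return sorted(candidates,
                  key=lambda p: cosine(question_tokens, page_tokens[p]),
                  reverse=True)


if __name__ == "__main__":
    # Toy link graph and tokenized pages, invented for illustration.
    toy_links = {"a": ["c"], "b": ["c"], "c": ["d"]}
    toy_pages = {"a": ["weather"],
                 "c": ["beijing", "capital"],
                 "d": ["capital", "china", "beijing"]}
    print(rank_answers(["capital", "china"], toy_pages, toy_links, top_k=2))
```

In this sketch the link structure acts only as a filter (which pages are candidates), while the final order depends on textual similarity to the question. Whether the thesis combines the two signals this way or by some weighted score is not stated in the abstract.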
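[Degree-granting institution]: North China University of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP393.09;TP391.3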

[References]