Design and Implementation of a Chinese Knowledge Engineering and Knowledge Service Platform
Published: 2018-12-12 13:27
[Abstract]: The rapid growth of the Internet has brought a sharp rise in the number of users, and with it a sharply growing demand for online resources. Retrieving the information a user needs from this vast pool of resources is a major challenge for the Internet today. Information retrieval services are currently provided mainly by search engines and some question-answering communities, such as Baidu and Baidu Zhidao, which have made considerable progress in the breadth of data they offer. Their precision is less satisfactory, however: when a user needs knowledge that demands high accuracy, both search engines and Q&A communities fall short.
To address the mismatch between the rapid expansion of online information and users' demand for knowledge, this thesis proposes building a Chinese knowledge base with Chinese knowledge engineering techniques and establishing a platform for Chinese knowledge services, aimed at giving users high-quality, efficient access to shared knowledge.
For knowledge base construction, the thesis extracts attribute pairs from the infoboxes of encyclopedia pages and uses them to train a classification model. Combined with automatic word segmentation, part-of-speech tagging, and named entity tagging for modern Chinese, the model extracts attribute pairs from encyclopedia pages that have no infobox. The extracted pairs populate an attribute-value database, which lets users locate the knowledge they are searching for precisely.
Because users retrieving a piece of knowledge also care about other knowledge related to it, the thesis proposes a Wikipedia-based method for computing entity relatedness, which uses the co-occurring link information in Wikipedia pages to compute the relatedness of two named entities. For the knowledge service, the thesis ranks retrieval results with the link-analysis-based HITS algorithm, then computes the similarity between each HITS-ranked page and the question to determine the final order of the answer pages.
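The two-stage ranking described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`hits`, `cosine`, `rank_answers`), the `top_k` cutoff, the use of authority scores for the first stage, and bag-of-words cosine similarity for the second stage are all assumptions made for illustration, since the abstract does not specify the similarity measure. Pages and the question are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def hits(graph, iterations=50):
    """Return (authority, hub) scores for a directed link graph.

    graph maps each page to the list of pages it links to.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub update: sum the authority scores of pages linked to.
        new_hub = {n: sum(auth[d] for d in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return auth, hub


def cosine(a, b):
    """Cosine similarity between two token lists (bag of words)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_answers(graph, page_tokens, question_tokens, top_k=3):
    """Shortlist top_k pages by HITS authority, then reorder by similarity.

    The top_k cutoff is a hypothetical parameter, not taken from the thesis.
    """
    auth, _ = hits(graph)
    shortlist = sorted(
        page_tokens, key=lambda p: auth.get(p, 0.0), reverse=True
    )[:top_k]
    return sorted(
        shortlist,
        key=lambda p: cosine(page_tokens[p], question_tokens),
        reverse=True,
    )
```

Calling `rank_answers(graph, page_tokens, question_tokens)` with a link graph and per-page token lists returns page identifiers in final answer order. Separating the two stages keeps the link-based score as a coarse filter and lets the question-specific similarity decide the final order among well-linked pages.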
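[Degree-granting institution]: North China University of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP393.09; TP391.3
Article ID: 2374655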
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2374655.html