Research and Implementation of a Distributed Knowledge Search System
Published: 2018-08-30 17:58
[Abstract]: The Internet holds a large amount of valuable information, and search engines are currently an important tool for retrieving it. Traditional search engines rely only on keyword matching to find relevant web pages and rank them by some algorithm before presenting them to the user; they do not take the semantic information of the pages into account. As Internet technology develops and demand for precise search grows, traditional search engines can no longer keep up with this change. Knowledge search emerged to address these shortcomings. It analyzes the user's query intent and returns the relevant knowledge, which greatly improves the accuracy and relevance of search results. Because natural language processing is time-consuming, and in view of the storage and security problems brought by a growing knowledge base, this thesis combines knowledge search with a distributed framework and implements a distributed knowledge search system with a flexibly configurable workflow, comprising a workflow framework, a distributed crawler, and a distributed knowledge extraction module. It also compares the efficiency of the single-machine and distributed systems. Comparative experiments on an experimental distributed system of three machines show that distributed knowledge extraction is nearly twice as efficient as the single-machine system, and that efficiency can continue to improve as the cluster is expanded. The distributed system also provides better security.
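The abstract describes a workflow whose stages (crawling, knowledge extraction) can be configured flexibly and whose extraction step is spread across several machines. The following is only a minimal sketch of that idea, not the thesis's implementation: a thread pool inside one process stands in for the cluster, and every name in it (`crawl_stage`, `extract_one`, `make_parallel_stage`, `run_workflow`, `WORKFLOW`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A stage takes a list of items and returns a list of items.
Stage = Callable[[List], List]


def crawl_stage(urls: List[str]) -> List[str]:
    """Stand-in crawler (assumption): pretends each URL yields one sentence."""
    return [f"{url} is a page" for url in urls]


def extract_one(text: str) -> Dict[str, str]:
    """Stand-in extractor (assumption): splits 'X is Y' into subject/object."""
    subject, _, obj = text.partition(" is ")
    return {"subject": subject, "object": obj}


def make_parallel_stage(fn: Callable, workers: int) -> Stage:
    """Wrap a per-item function as a stage that fans items out to workers.

    Threads here only imitate the machines of a cluster; a real deployment
    would dispatch the items to remote nodes instead.
    """
    def stage(items: List) -> List:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map returns results in the same order as the inputs.
            return list(pool.map(fn, items))
    return stage


def run_workflow(stages: List[Stage], items: List) -> List:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        items = stage(items)
    return items


# The workflow is plain data, so stages can be reordered or swapped.
WORKFLOW: List[Stage] = [crawl_stage, make_parallel_stage(extract_one, workers=3)]
```

Calling `run_workflow(WORKFLOW, ["a.example", "b.example"])` returns one subject/object record per input URL. Keeping the workflow as a list of stages is one simple way to get the configurable flow the abstract mentions; the worker count of 3 merely mirrors the three-machine experiment.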
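[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree]: Master's
[Year conferred]: 2013
[Classification number]: TP391.3
Document ID: 2213846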
Link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2213846.html