
Research on Open Information Extraction Technology for Entity Queries

Published: 2018-07-09 16:01

Topics: Wikipedia + entity extraction; Source: North China University of Technology, master's thesis, 2012


[Abstract]: Query recommendation is an important technique in search engine systems: it improves the user's search experience by recommending more suitable queries. Existing methods can find similar queries that are directly linked through some attribute, but they overlook semantically related queries that are only indirectly linked.

To address this problem, this thesis uses the open knowledge base Wikipedia and proposes a new query expansion system built on it. The method extracts part of Wikipedia's structured information together with its natural-language text to build a hierarchical corpus in which entities form the skeleton and entity features and entity relations form the network. A user query recommendation system is built on this corpus, and an auxiliary query system is designed to improve recommendation quality when a user query is not covered by Wikipedia.

The main contributions of this thesis are as follows:

RWM, a query intent recognition algorithm based on a random walk model. It alleviates some data-sparsity problems: the random walk extends the query to concepts that are not directly linked to it, which makes query intent recognition effective.

WWRQ, a rare-query classification algorithm that combines Wikipedia's structured knowledge with web knowledge. It obtains retrieval results from a search engine and lets feature information extracted from Wikipedia vote on them to determine the query's category.

Experimental results show that, compared with traditional query recommendation systems, the random-walk-based query intent recognition algorithm balances precision and recall and significantly improves query accuracy. The rare-query algorithm based on Wikipedia and web knowledge effectively addresses the problem that short queries cannot be located accurately.
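The abstract does not give the details of RWM, so the following is only a minimal sketch of the general idea it describes: a random walk over an entity graph that reaches concepts not directly linked to the query. It uses a standard random walk with restart, which is an assumption and not necessarily the thesis's formulation. The function name, the restart probability, and the toy graph are all hypothetical.

```python
def random_walk_with_restart(graph, seed, restart=0.15, iterations=50):
    """Approximate visiting probabilities of a walk that restarts at `seed`.

    `graph` maps each node to a list of neighbour nodes (assumed structure).
    """
    # Collect every node that appears as a key or as a neighbour.
    nodes = set(graph)
    for neighbours in graph.values():
        nodes.update(neighbours)

    # Start with all probability mass on the seed (the query entity).
    scores = {node: 0.0 for node in nodes}
    scores[seed] = 1.0

    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node, mass in scores.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                # Dangling node: return its walking mass to the seed.
                nxt[seed] += (1 - restart) * mass
                continue
            # Spread the non-restart share evenly over the neighbours.
            share = (1 - restart) * mass / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        # Total mass is 1, so the restart step adds exactly `restart`.
        nxt[seed] += restart
        scores = nxt
    return scores


if __name__ == "__main__":
    # Hypothetical toy graph: the query links to two entity senses,
    # and each sense links to one broader concept.
    toy_graph = {
        "jaguar": ["Jaguar Cars", "Jaguar (animal)"],
        "Jaguar Cars": ["jaguar", "Automobile"],
        "Jaguar (animal)": ["jaguar", "Felidae"],
        "Automobile": ["Jaguar Cars"],
        "Felidae": ["Jaguar (animal)"],
    }
    ranked = sorted(
        random_walk_with_restart(toy_graph, "jaguar").items(),
        key=lambda item: item[1],
        reverse=True,
    )
    for node, score in ranked:
        print(f"{node}: {score:.4f}")
```

In this toy graph, "Automobile" and "Felidae" have no direct edge to the query, yet both receive a non-zero score. That is the kind of indirect association the abstract says direct-attribute methods miss.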
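[Degree-granting institution]: North China University of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3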



