Research on Long-Sentence Query Expansion Based on Concept Semantic Similarity
[Abstract]: Web information retrieval has developed rapidly alongside the Internet. Its main form today is the search engine, the second largest Internet service after email. Current search engines retrieve information mainly from the query keywords a user enters, but a handful of words cannot fully and accurately express the user's real retrieval intent. The ambiguity of the query itself leads the search engine to return many documents irrelevant to the user's needs, resulting in low recall and precision. On the other hand, users sometimes enter long-sentence queries, and the queries submitted to search engines are growing longer, which degrades retrieval results through problems such as query topic drift. To address these problems, researchers have proposed query expansion, which modifies the original query terms to improve retrieval precision and recall, and it has achieved some success. Most of this work, however, targets short queries. In recent years, research abroad on long-sentence query expansion has been increasing, mainly because natural-language long sentences can better express complex and specific information needs and are a likely trend in how users will phrase queries. Moreover, as query expansion moves toward semantic querying, the rich semantic associations contained in long-sentence queries provide a better basis for semantic query expansion and help in understanding users' complex language characteristics and differing grammatical habits.
To address topic drift, low precision, and low recall in long-sentence queries, this thesis proposes a long-sentence query expansion method based on concept semantic similarity. The method first uses the AAlesk word sense disambiguation method to determine the correct sense of each query word, and then adds the related semantic concepts from the WordNet synonym set (synset) for that sense to the original query. The concepts in the query are then clustered by their semantic similarity to obtain a set of query clusters, and the best candidate concept set is chosen according to the overall semantic similarity of each cluster and the semantic relatedness importance of each concept. Finally, the highest-scoring keywords are extracted from that concept set to represent the original query, thereby improving the original long-sentence query. In addition, the KeyGraph keyword extraction method is used to process the original long-sentence queries, and the queries improved by these two methods are run through three different types of mainstream retrieval models in retrieval experiments. The experimental results show that the improved long-sentence queries achieve better retrieval performance; in particular, the expansion method proposed in this thesis better expresses users' real information needs at the semantic level, greatly improves the precision and recall of long-sentence queries, and is better suited to existing mainstream language retrieval models.
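The first two stages described above (sense disambiguation, then synonym expansion) can be illustrated with a minimal sketch. It is not the thesis's implementation: it substitutes a simplified Lesk gloss-overlap rule for the AAlesk method, and a small invented sense inventory (`SENSES`) for WordNet. The names `SENSES`, `STOPWORDS`, `content_words`, `simplified_lesk`, and `expand_query` are all hypothetical, and the clustering and keyword-scoring stages are not shown.

```python
# Hypothetical toy sense inventory standing in for WordNet.
# Each sense is (sense id, gloss, synonym set); all entries are invented.
SENSES = {
    "bank": [
        ("bank.river", "sloping land beside a river or lake", {"riverside", "shore"}),
        ("bank.finance", "institution that accepts deposits and lends money",
         {"depository", "lender"}),
    ],
    "loan": [
        ("loan.finance", "money lent by an institution at interest",
         {"credit", "advance"}),
    ],
    "interest": [
        ("interest.curiosity", "feeling of wanting to know about something",
         {"curiosity"}),
        ("interest.finance", "charge paid for borrowing money from a lender",
         {"yield"}),
    ],
}
STOPWORDS = {"a", "about", "an", "and", "at", "by", "for", "from", "of", "or",
             "that", "the", "to"}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(word, query_words):
    """Return the sense of `word` whose gloss shares the most words with
    the other query words (ties go to the first listed sense)."""
    context = set(query_words) - {word}
    return max(SENSES[word],
               key=lambda sense: len(content_words(sense[1]) & context))


def expand_query(query):
    """Disambiguate each known query word, then append the synonyms of
    the chosen sense to the original query terms."""
    words = query.lower().split()
    chosen = {}
    expanded = list(words)
    for w in words:
        if w in SENSES:
            sense_id, _gloss, synonyms = simplified_lesk(w, words)
            chosen[w] = sense_id
            expanded.extend(sorted(synonyms))
    return chosen, expanded


chosen, expanded = expand_query("bank loan interest money")
print(chosen)
# {'bank': 'bank.finance', 'loan': 'loan.finance', 'interest': 'interest.finance'}
print(expanded)
# ['bank', 'loan', 'interest', 'money', 'depository', 'lender', 'advance', 'credit', 'yield']
```

Disambiguating before expanding is what keeps the added terms on topic: in this toy query the financial sense of "bank" is selected, so "riverside" and "shore" are never added. A real system would draw glosses and synsets from WordNet itself instead of a hand-written table.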
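[Author]: 杨振瑜 (Yang Zhenyu)
[Degree-granting institution]: Shandong University of Technology (山东理工大学)
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3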