
Research on Private Information Retrieval Algorithms

Published: 2019-02-27 08:33
[Abstract]: With the wide application of information technology, publicly accessible databases and search engines have become important resources for users seeking up-to-date information. However, the traditional private information retrieval model has inherent shortcomings that make it difficult to apply to real-world large-scale databases and search engines. It is therefore important to study new, practical private information retrieval models and algorithms.

By analyzing the functional requirements of existing private information retrieval systems and of a private information retrieval system based on word semantic similarity, this thesis proposes a private information retrieval model based on word semantic similarity. It analyzes four aspects of the model: word semantic similarity calculation, the selection strategy for forged keywords, query information hiding, and query result filtering. It also designs the overall architecture of the private information retrieval system, which consists of a word semantic similarity calculation module, a query processing module, and a page crawling and filtering module.

An implementation of word semantic similarity calculation based on WordNet and HowNet is given. Node depth is introduced as an influencing factor into an existing WordNet-based word semantic similarity algorithm. The WordNet-based algorithm is then applied to the calculation of sememe similarity in HowNet. Experiments show that the improved algorithm produces more accurate similarity results that agree better with everyday semantic intuition.

A private information retrieval algorithm based on word semantic similarity is presented. The key point of the algorithm is the selection criterion for forged keywords: the algorithm uses word semantic similarity as that criterion and requires the semantic similarity between each forged keyword and the target keyword to satisfy certain conditions. The time complexity of the algorithm is O(k), where k is the number of forged keywords. Experiments show that, compared with the GooPir model, the model based on word semantic similarity improves the quality of query results to some extent, while information entropy decreases, though only slightly.
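The abstract does not give the similarity formula or the exact conditions that forged keywords must satisfy, so the following is only a minimal sketch under stated assumptions. It uses the well-known Wu-Palmer measure as a stand-in for a depth-aware similarity, a hypothetical toy taxonomy (`PARENT`) in place of WordNet or HowNet, and an assumed similarity band `[low, high]` as the selection condition. The function names and thresholds are illustrative and are not taken from the thesis.

```python
import math
import random
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet/HowNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
    "oak": "tree",
}


def path_to_root(word: str) -> List[str]:
    """Return the chain word, parent, ..., root."""
    path = [word]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(word: str) -> int:
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(word))


def wup_similarity(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node shared with a is the deepest common ancestor.
    lcs = next(node for node in path_to_root(b) if node in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def select_decoys(target: str, candidates: List[str], k: int,
                  low: float, high: float, rng: random.Random) -> List[str]:
    """Pick up to k forged keywords whose similarity to target lies in [low, high].

    The band [low, high] is an assumed condition, not the thesis's criterion.
    """
    eligible = [c for c in candidates
                if c != target and low <= wup_similarity(target, c) <= high]
    return rng.sample(eligible, min(k, len(eligible)))


def shannon_entropy(probs: List[float]) -> float:
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Usage: hide the target "dog" among two forged keywords and shuffle the query.
rng = random.Random(0)
decoys = select_decoys("dog", list(PARENT), k=2, low=0.3, high=0.7, rng=rng)
query = decoys + ["dog"]
rng.shuffle(query)
# If an observer considers every keyword equally likely to be the real one,
# the entropy of the query is log2(len(query)) bits.
print(query, shannon_entropy([1 / len(query)] * len(query)))
```

In this sketch the upper bound excludes very close relatives of the target and the lower bound excludes unrelated words; whether the thesis uses a band, a single threshold, or some other condition is not stated in the abstract. The uniform distribution passed to `shannon_entropy` is likewise an assumption: an observer who can rank keywords by plausibility would see a lower entropy.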
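[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.3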




Article ID: 2431286


Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2431286.html

