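Research and Implementation of Secure Index Technology for Full-Text Retrieval Systems
Published: 2018-07-04 21:27
Topics: full-text retrieval + secure index; Source: Huazhong University of Science and Technology, master's thesis, 2012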
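[Abstract (translated from the Chinese)]: With the rise and rapid growth of the Internet, information retrieval technology has become an indispensable tool in everyday life. In some settings user information must also be protected, which has led to the concept of a secure index: an index that supports information retrieval while guaranteeing that the user's information is not leaked.
To address current problems, two full-text retrieval systems with secure indexes are implemented. The first is a secure index scheme based on an inverted index. It uses the inverted index as its index structure, encrypts the text in both the indexing and query stages, and applies a second round of encryption when writing to the index. It provides a secure index while keeping the speed and accuracy of the inverted-index structure, but it cannot yet effectively defend against attacks such as chosen-plaintext attacks and frequency analysis. To remedy this lack of security, a second scheme is designed and implemented: a secure index based on trapdoor one-way functions. Trapdoor one-way functions are applied to information retrieval in combination with pseudo-random functions; their irreversibility and pseudo-randomness largely compensate for the security weaknesses of the inverted-index approach and resist attacks such as chosen-plaintext attacks, protecting user information more securely. However, its computational cost and its incompatibility with the inverted structure make retrieval slow, and it also occupies considerable space and has a nonzero false-positive probability.
The retrieval principles and security of the two indexing approaches are analyzed, and two index systems, S-Lucene and BF-Index, are implemented. S-Lucene adds security customizations to the open-source search engine Lucene; BF-Index uses a Bloom Filter as its index storage structure. Experiments compare the two systems on several performance measures, including indexing time, query time, and accuracy. The experimental data are used to analyze the strengths and weaknesses of each system and to identify directions for future improvement.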
[Abstract]: With the rise and flourishing of the Internet, information retrieval technology has become an indispensable tool in everyday life. In some cases user information must also be protected, which motivates the concept of a secure index: an index that serves information retrieval while ensuring that the user's information is not leaked.
To address these problems, two full-text retrieval systems with secure indexes are implemented. The first is a secure index scheme based on an inverted index. It uses the inverted index as its index structure, encrypts the text in the indexing and query stages, and applies a second layer of encryption when writing to the index. It thus provides a secure index while inheriting the speed and accuracy of the inverted-index structure, but it cannot effectively defend against attacks such as chosen-plaintext attacks and frequency analysis. To remedy this lack of security, a second scheme is designed and implemented: a secure index based on trapdoor one-way functions. It combines trapdoor one-way functions with pseudo-random functions; their irreversibility and pseudo-randomness largely compensate for the security weaknesses of the inverted-index approach and resist attacks such as chosen-plaintext attacks, protecting user information more securely. However, its computational complexity and its incompatibility with the inverted structure lead to low retrieval efficiency. Its large space footprint and nonzero false-positive probability are further shortcomings.
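The first scheme depends on the server matching encrypted terms exactly. The sketch below is a minimal illustration under stated assumptions, not S-Lucene's actual code: a keyed HMAC-SHA256 token stands in for the thesis's term encryption, and `term_token`, `build_encrypted_index`, and `search` are hypothetical names. Because a term always maps to the same token, lookup remains a fast exact match on the inverted index, but the size of each posting list stays visible to the server, which is the frequency-analysis weakness described above.

```python
import hashlib
import hmac
from collections import defaultdict


def term_token(key: bytes, term: str) -> str:
    """Deterministic keyed token for a term.

    HMAC-SHA256 is an assumed stand-in for the thesis's term encryption.
    The same term always yields the same token, which is what allows
    exact-match lookup in an inverted index.
    """
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


def build_encrypted_index(key: bytes, docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Build an inverted index mapping token -> set of document ids.

    The server storing this index sees only tokens, never plaintext terms.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term_token(key, term)].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], token: str) -> set[str]:
    """Server-side lookup: an exact match on the opaque token."""
    return index.get(token, set())


if __name__ == "__main__":
    key = b"hypothetical-user-secret-key"  # illustrative key, not from the thesis
    docs = {
        "d1": ["secure", "index", "lucene"],
        "d2": ["secure", "bloom", "filter"],
        "d3": ["secure", "index"],
    }
    index = build_encrypted_index(key, docs)
    print(sorted(search(index, term_token(key, "index"))))  # ['d1', 'd3']
    # Frequency leak: posting-list sizes remain visible even though terms are hidden.
    print(sorted(len(ids) for ids in index.values()))  # [1, 1, 1, 2, 3]
```

Determinism is also why such a scheme struggles against chosen-plaintext attacks: an adversary who can get a known word indexed learns its token and can then recognize that word in every posting list.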
This thesis analyzes the retrieval principles of the two indexing methods and their respective security, and implements two index systems, S-Lucene and BF-Index. S-Lucene adds security customizations to the open-source search engine Lucene, while BF-Index uses a Bloom Filter as its index storage structure. The two systems are compared experimentally on indexing time, query time, accuracy, and other measures. The experimental data are used to analyze the advantages and disadvantages of each system and to identify directions for future improvement.
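The second scheme can be illustrated with a per-document Bloom filter in the style of Goh's secure index construction. This is an assumption about how BF-Index might work, not its actual code: HMAC-SHA256 serves as the pseudo-random function, and `prf`, `make_trapdoor`, and `SecureBloomIndex` are hypothetical names. The user derives a trapdoor for a word from secret keys; the server combines that trapdoor with the document identifier to obtain the filter positions, so the same word sets different bits in different documents.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Pseudo-random function, instantiated here with HMAC-SHA256 (an assumption)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def make_trapdoor(master_keys: list[bytes], word: str) -> list[bytes]:
    """Trapdoor T_w = (PRF(k_1, w), ..., PRF(k_r, w)); computing it requires the secret keys."""
    return [prf(k, word.encode("utf-8")) for k in master_keys]


class SecureBloomIndex:
    """Per-document Bloom filter whose entries are PRF codewords bound to the document id."""

    def __init__(self, doc_id: str, m: int = 1024):
        self.doc_id = doc_id.encode("utf-8")
        self.m = m
        self.bits = bytearray(m)  # one byte per bit, chosen for clarity over compactness

    def _positions(self, trapdoor: list[bytes]) -> list[int]:
        # Codeword y_i = PRF(x_i, doc_id), reduced modulo the filter size m.
        return [int.from_bytes(prf(x, self.doc_id), "big") % self.m for x in trapdoor]

    def add(self, trapdoor: list[bytes]) -> None:
        """Index a word by setting the bits for its codewords."""
        for pos in self._positions(trapdoor):
            self.bits[pos] = 1

    def might_contain(self, trapdoor: list[bytes]) -> bool:
        """True may be a false positive; False is always correct."""
        return all(self.bits[pos] for pos in self._positions(trapdoor))


if __name__ == "__main__":
    keys = [bytes([i]) * 32 for i in range(4)]  # hypothetical keys k_1..k_4 (r = 4)
    idx = SecureBloomIndex("doc-42")
    for w in ["secure", "index", "bloom"]:
        idx.add(make_trapdoor(keys, w))
    print(idx.might_contain(make_trapdoor(keys, "bloom")))  # True
    # False unless a Bloom-filter false positive occurs.
    print(idx.might_contain(make_trapdoor(keys, "lucene")))
```

A lookup can return a false positive but never a false negative. For a filter of m bits holding n distinct words with k positions each, the standard false-positive estimate is (1 - e^(-kn/m))^k, so lowering the error rate requires larger filters; this matches the space and misjudgment drawbacks listed in the abstract. Because each document has its own filter, a query must test every document instead of following a single posting list, which is one reason this design cannot reuse the inverted structure.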
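[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification numbers]: TP391.3; TP309.7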
Article ID: 2097494
Article URL: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2097494.html