
Research on Efficient Storage Management of Provenance and Its Applications in Security

Published: 2018-09-10 08:28
[Abstract]: New information is being produced at an explosive rate around the world every day. The capacity demanded of storage systems has grown accordingly, from petabytes (PB) and exabytes (EB) to today's mass storage systems built to hold "Big Data". Although new storage devices keep appearing and new storage architectures keep being proposed, the analysis and understanding of the massive data itself has stagnated. For example, when we obtain important data from the cloud, we may ask where the data came from, whether anyone has used it before, and how reliable and secure it is.
Provenance, a kind of metadata that records the history of a data object, can answer exactly such questions: how a data object was created, what modifications it has undergone, and how the ancestors of two data objects differ. In the systems field, the provenance of a piece of data is the set of all process information and related data that influenced its final state. Because provenance reveals the past of a data object, or the process that produced it, it has broad practical value. Scientists already use provenance to validate important experimental datasets, improve the efficiency of desktop search, and audit important financial accounts, and some research is applying it to data deduplication and distributed security. However, little research so far targets the characteristics of provenance itself. For example, provenance is voluminous, yet few good algorithms can compress it heavily while still supporting efficient queries over it. Likewise, provenance records how data was generated, yet few studies use it to ensure data reliability or to analyze system intrusions based on that generation history.
This thesis proposes a hybrid compression method, based on web graph compression and dictionary encoding, that compresses provenance efficiently. By drawing on the similarity between provenance graphs and web graphs, the method fully exploits the locality and similarity among provenance graph nodes and eliminates repeated strings inherent in provenance information. Compared with previous compression methods, it can further compress the information on the edges of the provenance graph, works at a finer compression granularity, and supports efficient provenance queries. Experiments on a large number of provenance traces show that, among the compression schemes compared, it offers the best trade-off among compression ratio, compression time, and query performance.
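The two ingredients named in this paragraph can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes a provenance graph given as (source, target) name pairs, replaces repeated names with integer codes (dictionary encoding), and stores each sorted successor list as gaps between neighbouring IDs, a basic device from web graph compression. All function names and the sample paths are hypothetical.

```python
def build_dictionary(strings):
    """Map each distinct string to a small integer code, in first-seen order."""
    codes = {}
    for s in strings:
        codes.setdefault(s, len(codes))
    return codes


def gap_encode(sorted_ids):
    """Store a sorted ID list as its first value followed by successive gaps."""
    gaps, prev = [], 0
    for node_id in sorted_ids:
        gaps.append(node_id - prev)
        prev = node_id
    return gaps


def gap_decode(gaps):
    """Invert gap_encode by taking running sums."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def compress(edges):
    """edges: list of (source_name, target_name) dependency pairs (assumed input format)."""
    names = [n for edge in edges for n in edge]
    codes = build_dictionary(names)
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(codes[src], set()).add(codes[dst])
    encoded = {src: gap_encode(sorted(dsts)) for src, dsts in adjacency.items()}
    return codes, encoded


def successors(codes, encoded, name):
    """Query one node's dependencies by decoding only that node's gap list."""
    reverse = {code: n for n, code in codes.items()}
    return [reverse[i] for i in gap_decode(encoded.get(codes[name], []))]
```

Because each node's list is encoded separately, a query decodes one short list rather than the whole graph, which is the sense in which such a layout can stay queryable while compressed. Small gaps would normally be written with a variable-length integer code; that step is omitted here.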
This thesis proposes a provenance-based data reconstruction method that rebuilds individual data objects, rebuilds them in parallel, and lets reconstruction priorities be set. By retracing the process that generated a data file, the method can accurately reconstruct lost or damaged files. Earlier solutions for storage reliability (for example, log files, snapshots, or backups) focus on protecting an entire disk or system; by comparison, the main advantages of this method are that it can reconstruct a single data object, reconstruct multiple data objects in parallel, and reconstruct important data files first. The provenance-based reconstruction system collects a file's provenance while the file is being read normally, and automatically reconstructs the file after it is lost or damaged. During reconstruction it can also recover other affected files. Experimental results show that provenance-based reconstruction significantly outperforms log-based reconstruction. Although factors such as the size of the provenance database affect reconstruction, the experiments show that their impact on provenance-based reconstruction performance is small.
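As a rough illustration of reconstruction from provenance, the sketch below assumes each file's provenance is a record of the inputs it was derived from plus a callable standing in for re-running the recorded process. The `ProvenanceRecord` type, the `priority` field, and both function names are hypothetical and are not taken from the thesis; parallel reconstruction is left out to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ProvenanceRecord:
    inputs: Tuple[str, ...]        # names of the files the process read
    process: Callable[..., bytes]  # stand-in for re-running the recorded process
    priority: int = 0              # larger means rebuild sooner


def reconstruct(name, store, provenance):
    """Rebuild `name` into `store`, first rebuilding any missing ancestors."""
    if name in store:
        return store[name]
    record = provenance[name]  # raises KeyError if no provenance was collected
    args = [reconstruct(dep, store, provenance) for dep in record.inputs]
    store[name] = record.process(*args)
    return store[name]


def reconstruct_all(lost, store, provenance):
    """Rebuild lost files in descending priority and return the order used."""
    order = sorted(lost, key=lambda n: -provenance[n].priority)
    for name in order:
        reconstruct(name, store, provenance)
    return order
```

Rebuilding a high-priority file also restores any missing ancestors it depends on, which mirrors the abstract's remark that other affected files are recovered during reconstruction. The recursion assumes the provenance graph is acyclic.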
This thesis proposes an intrusion detection method that uses provenance information. By collecting provenance for the processes that interact with the system, it determines in detail how an intruding process accessed and modified files, making it quick and convenient to judge whether the system has been intruded and to locate system vulnerabilities. The method overcomes the complexity and inefficiency of manually analyzing traditional system and network logs. Moreover, logs generally record only part of the information in system events, such as HTTP connections or login records, which makes the whole analysis very difficult. The provenance-based method treats network connections that interact with the system as file objects, collects provenance on the dependencies between system processes and file objects, and then constructs a provenance graph from which an administrator can find the intrusion path. Analyzing each event on the intrusion chain identifies the system vulnerability and the source of the attack. Experimental results show that, compared with traditional methods, provenance-based intrusion detection has a lower false positive rate and a higher detection rate, incurs only a small space overhead, and has almost no impact on system performance.
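Finding an intrusion path in a provenance graph can be sketched as a backward search from a tampered file to a network connection. The example assumes the graph is a dictionary mapping each node to the nodes it was derived from, and that connections are labelled with a `socket:` prefix; the function name, the labels, and the sample address are illustrative assumptions, and the thesis may construct and search its graph differently.

```python
from collections import deque


def find_intrusion_path(depends_on, tampered, is_entry_point):
    """Breadth-first search backward over dependency edges.

    depends_on maps a node to the nodes it was derived from.
    Returns the shortest chain [tampered, ..., entry point], or None.
    """
    parent = {tampered: None}
    queue = deque([tampered])
    while queue:
        node = queue.popleft()
        if is_entry_point(node):
            # Walk the parent links back to the tampered object.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for ancestor in depends_on.get(node, ()):
            if ancestor not in parent:
                parent[ancestor] = node
                queue.append(ancestor)
    return None
```

Each hop on the returned chain is one event an administrator could inspect, for instance a shell spawned by a web server process that had read from an external connection.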
This thesis proposes using object-based active storage to significantly improve the performance of provenance processing and of provenance transmission over the network. Because provenance is generated continuously and in large volume, transmitting it becomes a significant network bottleneck. Object-based active storage addresses this problem well. On the one hand, active storage offloads provenance processing from the host to the storage device, greatly reducing the amount of provenance data the storage device sends over the network. On the other hand, object-based storage devices have more processing power than traditional block devices and can process provenance more intelligently and automatically. Inside the object storage device, ordinary data files and provenance database records are treated as user objects, while data processing tasks are treated as function objects that are flexibly scheduled to carry out the system's tasks, such as provenance compression, provenance queries, and data reconstruction. The evaluation shows that object-based active storage significantly improves the performance of provenance-based data reconstruction.
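The split between user objects and function objects can be pictured with a toy in-memory model. The `ObjectStorageDevice` class, its method names, and the `bytes_sent` counter are assumptions made only for illustration; a real object storage device exposes a very different interface.

```python
class ObjectStorageDevice:
    """Toy model of a device that stores user objects and runs function objects."""

    def __init__(self):
        self.user_objects = {}      # object id -> bytes (data files, provenance records)
        self.function_objects = {}  # function name -> callable run on the device
        self.bytes_sent = 0         # bytes that crossed the simulated network

    def put(self, object_id, data):
        self.user_objects[object_id] = data

    def register(self, name, func):
        self.function_objects[name] = func

    def read(self, object_id):
        """Passive path: ship the whole object to the host."""
        data = self.user_objects[object_id]
        self.bytes_sent += len(data)
        return data

    def execute(self, name, object_id):
        """Active path: run the function on the device and ship only its result."""
        result = self.function_objects[name](self.user_objects[object_id])
        self.bytes_sent += len(result)
        return result
```

Counting records with `execute` sends only the few bytes of the answer, whereas `read` sends the whole provenance object to the host first, which is the traffic reduction the paragraph describes.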
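[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctorate
[Year conferred]: 2013
[Classification number]: TP333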




Article ID: 2233914


Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2233914.html

