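Storage and Compression of DNA Sequence Alignment Results
Published: 2018-04-08 17:00
Topic: DNA sequence alignment results. Focus: storage. Source: Fudan University, master's thesis, 2012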
[Abstract]: With advances in bioinformatics, molecular biology, and related fields, and with the completion of the Human Genome Project, a growing number of genes from humans and other model organisms have been sequenced. Sequence alignment is the method used to process sequencing output. It reveals structural, functional, and evolutionary relationships among biological sequences and is a foundation of bioinformatics.

As these sequencing projects proceed, massive volumes of DNA sequence data are produced every day, and aligning those sequences produces alignment result data in turn. Rapid progress in storage devices has eased the growth in data volume to some extent. As alignment research deepens, however, adding hardware alone can no longer keep up with the rapid growth of DNA alignment result data, and the cost of storing and using these data will eventually rise to an unaffordable level.

Next-generation sequencing (NGS) platforms have greatly reduced the cost of sequencing, making it feasible to apply gene sequence analysis in practical medical settings. From the standpoint of both storage and application, therefore, compressing sequence alignment results plays an important role in storing, managing, and transmitting DNA data. Compression of DNA sequence data has attracted wide attention in academia in China and abroad, yet few researchers have studied how to compress alignment results in real medical settings. Storing gene alignment results still faces major challenges.

In this thesis, we start from the needs of medical applications and design a storage structure that meets them, then design two different compression strategies on top of it to reduce storage cost. Experimental data show that, as coverage increases, our compression scheme slightly outperforms standard RAR and standard ZIP compression. Based on these methods, we built a "DNA Sequence Alignment Result Storage and Compression System", which stores massive DNA alignment results and provides a graphical interface.
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP333