Research on Related-Fragment Retrieval for Rejoining Dunhuang Manuscript Fragments and Its System Implementation

Published: 2018-05-26 17:04

Topic: Dunhuang manuscripts + fragment rejoining; Source: Zhejiang University, 2017 master's thesis

[Abstract]: The Dunhuang manuscripts are a body of ancient documents of great research value unearthed from the Mogao Grottoes at Dunhuang. For various reasons, including the little regard paid to cultural relics at the time of their discovery, the manuscripts are now scattered around the world, which hampers scholarly research. The Dunhuang Manuscripts Database, a national key project begun in 2012, lets scholars study the manuscripts online more conveniently. Because of their age, the manuscripts include a large number of fragments and incomplete scrolls, many of which can be rejoined. The collection is so large, however, that rejoining fragments by hand is slow and laborious. With the development of digitization, image retrieval can now assist the rejoining work; this is both a basic need of Dunhuang manuscript research and an important function of the database project. The core problem in digital rejoining is related-fragment retrieval, that is, retrieving fragments that can be joined to one another, and this is the main subject of the thesis.
The main work is as follows. First, based on the requirements of rejoining, the thesis identifies three main features of a manuscript fragment (material, edge and glyph shape) and proposes a fragment image feature model built from these three features. Because the colours in fragment images fall into fairly well-defined classes, a method for selecting primary and secondary colours is designed, and a material feature histogram, based on the idea of the dominant-colour histogram, is used to represent the material feature. Because rejoining mainly concerns how well the left and right edges of fragment images match, a left and right edge extraction algorithm based on the Canny algorithm is proposed, and the left and right edge point sets represent the edge feature. A glyph feature extraction algorithm is designed by studying and combining the SURF algorithm with the max-min clustering algorithm, and the feature point set of each character represents the glyph feature.
Second, the thesis studies how to define a difference measure for each feature, and on that basis proposes a combined definition of fragment image difference and a related-fragment retrieval algorithm based on it. Material difference is defined using the Earth Mover's Distance (EMD). A method for unifying the edge baseline of the images is designed, and edge difference is defined as the Hausdorff distance computed after the baseline is unified. A method for building a histogram of glyph direction vectors is designed, and glyph difference is defined using the EMD. The retrieval algorithm takes a set of fragment images as input, first clusters all images by material feature, then computes the combined difference between all images within each cluster, filters the results by dynasty information, and finally outputs the matching fragments for every fragment in the set.
Finally, to meet the requirements of the second phase of the Dunhuang Manuscripts Database national key project, the thesis designs the main modules of the phase-two system and implements a series of its advanced functions. The related-fragment retrieval algorithm is applied to the database project to complete related-fragment browsing, a main function of phase two.
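The retrieval pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several simplifications are assumed here: histograms are 1-D with unit bin spacing, the "unified edge baseline" is approximated by shifting each edge point set to a common origin, the combined difference is a plain weighted sum with hypothetical weights, the material cluster label is precomputed, and the dynasty filter is a simple equality check. All function names and dictionary keys are hypothetical.

```python
from itertools import accumulate
from math import hypot


def emd_1d(h1, h2):
    """Earth Mover's Distance between two 1-D histograms (unit bin spacing assumed)."""
    s1, s2 = sum(h1), sum(h2)
    p = [v / s1 for v in h1]
    q = [v / s2 for v in h2]
    # For 1-D histograms, EMD equals the L1 distance between the cumulative sums.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def align(points):
    """Shift an edge point set so its minimum x and y become 0 (assumed baseline)."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    return [(x - min_x, y - min_y) for x, y in points]


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty 2-D point sets."""
    def directed(src, dst):
        return max(min(hypot(x1 - x2, y1 - y2) for x2, y2 in dst) for x1, y1 in src)
    return max(directed(a, b), directed(b, a))


def difference(f, g, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of material, edge and glyph differences (weights are hypothetical)."""
    w_mat, w_edge, w_glyph = weights
    material = emd_1d(f["material_hist"], g["material_hist"])
    # Compare the right edge of f with the left edge of g after alignment.
    edge = hausdorff(align(f["right_edge"]), align(g["left_edge"]))
    glyph = emd_1d(f["glyph_hist"], g["glyph_hist"])
    return w_mat * material + w_edge * edge + w_glyph * glyph


def retrieve(fragments, top_k=3):
    """For each fragment, rank same-cluster, same-dynasty candidates by difference."""
    matches = {}
    for f in fragments:
        candidates = [
            g for g in fragments
            if g is not f
            and g["cluster"] == f["cluster"]
            and g["dynasty"] == f["dynasty"]
        ]
        ranked = sorted(candidates, key=lambda g: difference(f, g))
        matches[f["id"]] = [g["id"] for g in ranked[:top_k]]
    return matches
```

Each fragment here is a dictionary with the keys `id`, `cluster`, `dynasty`, `material_hist`, `glyph_hist`, `left_edge` and `right_edge`. Restricting comparisons to one material cluster before computing pairwise differences mirrors the ordering described in the abstract and keeps the number of comparisons manageable on a large collection.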
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41

[References]