Research on Image Annotation Algorithms Based on Image Segmentation and Regional Semantic Correlation

Published: 2018-11-17 08:17

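【Abstract】: With the rapid development of computer, network, and intelligent communication technologies, huge volumes of image data circulate on the Internet and are growing explosively, and managing and using these image resources effectively has become a pressing problem. Although image retrieval research has produced many results, many problems remain. Text-based image retrieval, being inefficient and dependent on subjective human labeling, has long been unable to meet the demands of the big data era; content-based image retrieval has been held back by its inability to bridge the "semantic gap"; and semantics-based automatic image annotation, now the main direction of development in image retrieval, still faces many technical difficulties despite extensive research. In view of the state of the art, the development trends, and the open problems in image retrieval, this thesis proposes a series of improvements, summarized as follows. (1) Semantics-based automatic image annotation relies on an image segmentation algorithm for preprocessing, and accurate, effective segmentation is essential to the subsequent feature extraction and to building the annotation model. This thesis proposes an improved image segmentation algorithm. Its basic idea is as follows. First, the Mean Shift algorithm pre-segments the image. Because Mean Shift is sensitive to image edges, it extracts edge information well, but it also tends to produce many small regions. To address this weakness, the Ncut algorithm then further processes the regions obtained in the previous step. Because Ncut tends to produce larger regions, it resolves the over-segmentation of Mean Shift, and because it operates on already segmented regions rather than on pixels, it greatly reduces the computational cost and improves performance. Ncut has shortcomings of its own, however: minimizing the normalized cut exactly is NP-hard, and the number of regions must be specified before segmentation; if this parameter is poorly chosen, over-segmentation and under-segmentation readily occur. A region merging and splitting algorithm is therefore applied to correct the Ncut result, merging over-segmented regions and splitting under-segmented ones, to make the segmentation as accurate as possible. (2) This thesis proposes an improved image semantic annotation method that combines regional semantic correlation with the Gaussian mixture model (GMM). Conventional GMM-based methods derive the annotation directly from the semantic posterior probabilities: one approach selects the N semantic words with the largest posterior probabilities, and the other selects every semantic word whose posterior probability exceeds a threshold. Annotations obtained this way are not accurate and easily include redundant or incorrect words, which lowers annotation accuracy. Moreover, given the "semantic gap" in the model, the magnitude of a posterior probability cannot fully determine a word's weight, so classification decisions based on posterior probabilities alone may contain large errors. To address these problems, this thesis proposes a GMM image annotation method based on regional semantic correlation, which incorporates the semantic correlation between regions into the GMM for joint decision-making and uses it to calibrate and optimize the model's annotation results, thereby improving annotation accuracy.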
[Abstract]: With the rapid development of computer technology, network technology, and intelligent communication technology, large amounts of image data spread widely across networks and grow explosively. How to manage and use these image resources effectively has become a difficult problem. Although many achievements have been made in image retrieval, many problems remain. Text-based image retrieval, with its low efficiency and human subjectivity, can no longer meet the needs of the big data era, and content-based image retrieval has been hindered by its inability to solve the "semantic gap" problem. Semantics-based automatic image annotation is the main development direction in image retrieval; researchers have explored it extensively but still face many technical difficulties. In view of the current state, development trends, and open problems of image retrieval, this thesis puts forward a series of improvements. The main points are as follows. (1) Semantics-based automatic image annotation uses an image segmentation algorithm to preprocess the image, and segmenting the image accurately and effectively is very important for the subsequent feature extraction and the construction of the annotation model. This thesis proposes an improved image segmentation algorithm whose basic idea is as follows. First, the Mean Shift algorithm pre-segments the image. Because Mean Shift is sensitive to image edges, it extracts edge information well, but it also easily produces many small regions. To address this shortcoming, the Ncut algorithm is used to further process the image regions obtained in the previous step.
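To make the region-level Ncut step concrete, the following minimal sketch evaluates the normalized cut criterion on a tiny graph whose nodes are pre-segmented regions rather than pixels. It is an illustration under stated assumptions, not the thesis's implementation: the region means, the adjacency list, the Gaussian edge weight, and all function names are hypothetical, and the exhaustive search over two-way splits is feasible only because the graph has a handful of regions (practical Ncut implementations solve a relaxed eigenvalue problem instead).

```python
import itertools
import math


def edge_weight(mean_a, mean_b, sigma=0.1):
    """Similarity of two adjacent regions (assumed Gaussian of the mean-intensity gap)."""
    return math.exp(-((mean_a - mean_b) ** 2) / sigma ** 2)


def ncut_value(weights, degree, group_a):
    """Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    cut = sum(w for (i, j), w in weights.items() if (i in group_a) != (j in group_a))
    assoc_a = sum(d for node, d in degree.items() if node in group_a)
    assoc_b = sum(d for node, d in degree.items() if node not in group_a)
    return cut / assoc_a + cut / assoc_b


def best_bipartition(region_means, adjacency, sigma=0.1):
    """Exhaustively search two-way splits of a small, connected region graph."""
    nodes = sorted(region_means)
    weights = {(i, j): edge_weight(region_means[i], region_means[j], sigma)
               for i, j in adjacency}
    degree = {node: 0.0 for node in nodes}
    for (i, j), w in weights.items():
        degree[i] += w
        degree[j] += w
    first, rest = nodes[0], nodes[1:]
    best_value, best_group = None, None
    # Group A always contains the first region and never all of them,
    # so each split is visited exactly once.
    for size in range(len(rest)):
        for extra in itertools.combinations(rest, size):
            group_a = {first, *extra}
            value = ncut_value(weights, degree, group_a)
            if best_value is None or value < best_value:
                best_value, best_group = value, group_a
    return best_value, best_group


# Hypothetical pre-segmentation output: four regions in a row, two dark and two bright.
region_means = {0: 0.10, 1: 0.12, 2: 0.80, 3: 0.82}
adjacency = [(0, 1), (1, 2), (2, 3)]
value, group = best_bipartition(region_means, adjacency)
print(sorted(group), value)
```

With these toy values the cheapest split separates the two dark regions from the two bright ones, because the only edge it cuts joins regions with very different mean intensities.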
Because the Ncut algorithm tends to produce larger image regions, it can solve the over-segmentation problem of Mean Shift, and because Ncut operates on already segmented image regions rather than on pixels, it greatly reduces the amount of computation and improves performance. The Ncut algorithm also has shortcomings, however. Minimizing the normalized cut exactly is NP-hard, and the number of segmentation regions must be specified before segmentation; if this parameter is not set properly, over-segmentation and under-segmentation occur easily. This thesis therefore uses a region merging and splitting algorithm to further correct the Ncut result, merging over-segmented regions and splitting under-segmented ones to improve segmentation accuracy as much as possible. (2) An improved image semantic annotation method combining regional semantic correlation with the Gaussian mixture model (GMM) is proposed. Traditional GMM approaches obtain the annotation directly from the size of the semantic posterior probability: one selects the N semantic words with the largest posterior probabilities as the annotation, and the other selects the semantic words whose posterior probability exceeds a threshold. The annotations obtained this way are not accurate, and redundant or incorrect annotation words are easily produced, which lowers annotation accuracy. Moreover, given the "semantic gap" problem in the model, the magnitude of the posterior probability cannot completely determine a word's weight, and classification decisions based only on the posterior probability may contain large errors.
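The two conventional selection rules described above can be sketched as follows. This is a minimal illustration under assumptions, not the thesis's model: the one-dimensional region feature, the mixture parameters, the uniform priors, and all function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def gmm_likelihood(x, components):
    """p(x | word) for a mixture given as (weight, mean, std) triples."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def word_posteriors(x, models, priors):
    """Bayes rule: P(word | x) is proportional to p(x | word) * P(word)."""
    scores = {word: gmm_likelihood(x, comps) * priors[word]
              for word, comps in models.items()}
    total = sum(scores.values())
    return {word: score / total for word, score in scores.items()}


def ranked(posteriors):
    """Words sorted by decreasing posterior probability."""
    return sorted(posteriors.items(), key=lambda item: item[1], reverse=True)


def top_n_labels(posteriors, n):
    """Rule 1: keep the N words with the largest posteriors."""
    return [word for word, _ in ranked(posteriors)[:n]]


def threshold_labels(posteriors, threshold):
    """Rule 2: keep every word whose posterior exceeds the threshold."""
    return [word for word, p in ranked(posteriors) if p > threshold]


# Hypothetical per-word mixtures over a single feature (for example, mean hue).
models = {
    "sky": [(0.6, 0.60, 0.05), (0.4, 0.70, 0.05)],
    "sea": [(1.0, 0.55, 0.08)],
    "grass": [(1.0, 0.30, 0.05)],
}
priors = {word: 1.0 / len(models) for word in models}
posteriors = word_posteriors(0.62, models, priors)
print(top_n_labels(posteriors, 3))
print(threshold_labels(posteriors, 0.5))
```

With these toy numbers, asking for three words forces the very improbable word "grass" into the annotation, which is the kind of redundant label that fixed-N selection can produce.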
To address these problems, this thesis proposes a GMM image annotation method based on regional semantic correlation, which integrates the semantic correlation between regions into the GMM to make a comprehensive decision. The annotation results of the model are thereby calibrated and optimized, improving annotation accuracy.
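The abstract does not state the fusion rule, so the following sketch only illustrates the general idea of letting label correlation between regions adjust posterior-based decisions. Everything specific here is an assumption rather than the thesis's method: the Jaccard-style co-occurrence correlation, the linear mix controlled by the hypothetical weight `alpha`, the toy training annotations, and all function names.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_correlation(training_annotations):
    """Return corr(a, b) = images containing both words / images containing either."""
    word_count = Counter()
    pair_count = Counter()
    for words in training_annotations:
        unique = set(words)
        word_count.update(unique)
        pair_count.update(frozenset(pair) for pair in combinations(sorted(unique), 2))

    def corr(a, b):
        if a == b:
            return 1.0
        both = pair_count[frozenset((a, b))]
        either = word_count[a] + word_count[b] - both
        return both / either if either else 0.0

    return corr


def rerank(region_posteriors, corr, alpha=0.5):
    """Pick one word per region from a mix of its posterior and its context fit.

    region_posteriors holds one {word: posterior} dict per region. The context
    of a region is the highest-posterior word of every other region.
    """
    top_words = [max(post, key=post.get) for post in region_posteriors]
    labels = []
    for idx, post in enumerate(region_posteriors):
        others = [w for k, w in enumerate(top_words) if k != idx]

        def score(word):
            context = sum(corr(word, o) for o in others) / len(others) if others else 0.0
            return (1 - alpha) * post[word] + alpha * context

        labels.append(max(post, key=score))
    return labels


# Hypothetical training annotations and per-region GMM posteriors.
training = [
    ["sky", "sea", "beach"], ["sky", "sea"], ["sea", "beach"],
    ["tiger", "grass"], ["tiger", "grass", "tree"], ["sky", "grass"],
]
regions = [
    {"sea": 0.8, "grass": 0.2},
    {"sky": 0.8, "tiger": 0.2},
    {"grass": 0.52, "beach": 0.48},
]
corr = cooccurrence_correlation(training)
print(rerank(regions, corr))  # ['sea', 'sky', 'beach']
```

In this toy case the third region's posterior slightly favors "grass", but its neighbors are labeled "sea" and "sky", which co-occur with "beach" far more often in the training annotations, so the combined score selects "beach".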
【Degree-granting institution】: Central China Normal University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP391.41

【References】