
Optimization of the K-Means Clustering Algorithm and Its Application to Image Deduplication

Published: 2018-10-23 10:53
Abstract: With the rapid growth and adoption of cloud storage services, multimedia data such as images and videos has become a primary way to record and share information. Compared with traditional text, images and other multimedia data require far more storage space, so deduplicating images effectively and reducing the space they occupy is a new challenge. Observation of mainstream social and storage platforms (such as Facebook, QQ, and Baidu Cloud) shows that similar images account for a large share of all stored images. Here, similar images are defined as a series of images related by common transformations, such as burst shooting, watermarking, cropping, and scaling. Based on this observation, an image deduplication system is designed.

The system has two parts. The first part clusters the image set by similarity using content-based image retrieval. Each image is preprocessed and its local features are extracted; the K-Means clustering algorithm is run over all extracted features, and the final cluster centers serve as the visual words of a BOW (Bag-of-Words) model. The SIFT feature points of each image are quantized against these visual words, so that each image is represented by a single feature vector. An inverted index is then used to group similar images into clusters. In the second part, because the images within a cluster are highly similar to one another, a video stream compression algorithm is applied to compress them, which greatly reduces the storage they require.

K-Means clustering is the key technique in the similarity clustering stage: its execution speed and the quality of its result directly affect how well similar images can be compressed. In other words, K-Means is a performance bottleneck for the whole system. When a large number of feature points is processed, both the number of data points n and the number of centers k in standard K-Means become very large, and the algorithm becomes inefficient. An optimization scheme for K-Means is designed and implemented that lowers the algorithm's time complexity and improves its execution efficiency for large n and k, and it is applied in the image deduplication system to speed up the system. Experimental results show that the optimized K-Means algorithm delivers a good performance improvement at large data scales.
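The first part of the pipeline (K-Means over local features, BOW quantization, inverted index) can be illustrated with a minimal sketch. It is not the thesis's optimized algorithm: it uses the standard Lloyd iteration, whose per-pass cost of roughly n × k distance computations is the bottleneck the abstract describes. Everything here is an assumption made for illustration: the function names are hypothetical, toy 2-D tuples stand in for 128-dimensional SIFT descriptors, and "shares at least one visual word" is used as the candidate-similarity rule because the abstract does not state the actual criterion.

```python
import random
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, iters=20, seed=0):
    """Standard Lloyd iteration; each pass does about n * k distance computations."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bow_vector(descriptors, centers):
    """Quantize descriptors to visual words and count them: one vector per image."""
    counts = Counter(nearest(d, centers) for d in descriptors)
    return [counts.get(j, 0) for j in range(len(centers))]


def build_inverted_index(bows):
    """Map each visual word to the set of image ids in which it occurs."""
    index = defaultdict(set)
    for image_id, vec in bows.items():
        for word, count in enumerate(vec):
            if count:
                index[word].add(image_id)
    return index


def candidates(image_id, bows, index):
    """Other images that share at least one visual word with image_id."""
    found = set()
    for word, count in enumerate(bows[image_id]):
        if count:
            found |= index[word]
    found.discard(image_id)
    return found


# Toy data: descriptors of "a" and "b" lie near the origin, those of "c" far away.
images = {
    "a": [(0, 0), (1, 1)],
    "b": [(0, 1), (1, 0)],
    "c": [(10, 10), (10, 11), (11, 10), (11, 11)],
}
all_descriptors = [d for descs in images.values() for d in descs]
centers = kmeans(all_descriptors, k=2)
bows = {name: bow_vector(descs, centers) for name, descs in images.items()}
index = build_inverted_index(bows)
print(candidates("a", bows, index))  # images sharing a visual word with "a"
```

In a real system the vocabulary size k would be in the thousands or more and n in the millions, which is why the nested loop inside `kmeans` dominates the running time and motivates an optimized variant.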
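Degree-granting institution: Huazhong University of Science and Technology
Degree level: Master's
Year degree granted: 2016
Classification number: TP391.41; TP311.13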



