基于堆叠乘积量化的最近邻检索

发布时间：2018-07-24 22:03

【摘要】：近年来,大数据、人工智能和物联网技术得到飞速的发展,图像、视频等高维数据正呈现爆炸性增长。在这些海量的高维数据中查找目标数据也随之变得耗时和低效。为了解决上述问题,近似最近邻的概念及各种算法被陆续提出,并成为图像检索、机器学习、数据挖掘等多种应用中的一类基本算法。其中的乘积量化算法(PQ)具备内存消耗低,查询效率高等优点,被证明是解决高维空间近似最近邻查找的最有效算法之一。基于经典乘积量化算法的不足,近年来不少学者对乘积量化算法进行了优化改进,由于乘积量化算法中存在向量子空间中特征点分布不均匀的问题,优化的乘积量化算法(OPQ)被提出来,用于优化空间特征点的重新分配。由于对特征向量进行简单划分,会导致子向量间的相互独立,有学者提出了加法量化算法(AQ)以解决这个问题。为解决特征向量在进行量化时,存在量化误差较大的问题,堆叠量化算法(SQ)通过迭代地对误差进行量化,以进一步降低量化误差。本文中,我们提出了一种新的量化算法来做近似最近邻查找:堆叠乘积量化算法。这种量化算法融合了堆叠量化算法的量化误差低和乘积量化算法内存消耗低的优点。该算法的核心思想为:第一步将高维的特征向量划分成维度相同的低维子特征向量,在每个子向量空间中进行k-means聚类量化;第二步将特征向量与量化后对应的编码单词做差得到对应的误差向量;第三步把误差向量看成特征向量,以此进行划分子向量、子向量分别量化、求误差向量的操作;迭代第三步直到达到终止条件,从而产生一组从粗糙到精细的分层子码本。对分层子码本进行笛卡儿乘积得到整体向量的码本,进而可以得到原始向量的编码表示。本文在经典的SIFT1M、GIST1M和ConvNet1M-128数据集进行实验设计及验证。大量的实验表明,在不少算法性能指标上,比如量化误差、召回率、可扩展性等,本文算法与经典的量化方法相比都有较大优势。而且,对比乘积量化方法,我们的方法能够产生精确度更高的子码本;对比堆叠量化算法,我们的方法消耗内存少且具备较好的扩展性。
[Abstract]:In recent years, big data, artificial intelligence and Internet of things technology have been rapid development, image, video and other high-dimensional data are explosive growth. Finding target data in these massive high-dimensional data becomes time consuming and inefficient. In order to solve the above problems, the concept of approximate nearest neighbor and various algorithms have been proposed one after another, and become a kind of basic algorithms in image retrieval, machine learning, data mining and other applications. The product quantization algorithm (PQ) has the advantages of low memory consumption and high query efficiency. It has been proved to be one of the most effective algorithms for approximate nearest neighbor search in high dimensional space. Due to the shortcomings of the classical product quantization algorithm, many scholars have improved the product quantization algorithm in recent years, because of the problem of uneven distribution of the feature points in the product quantization algorithm. (OPQ), an optimized product quantization algorithm, is proposed to optimize the redistribution of spatial feature points. Because the simple partition of the feature vectors leads to the independence of the subvectors, some scholars have proposed the additive quantization algorithm (AQ) to solve this problem. In order to solve the problem of large quantization error in the quantization of eigenvector, the stacked quantization algorithm (SQ) quantizes the error iteratively to further reduce the quantization error. In this paper, we propose a new quantization algorithm to do approximate nearest neighbor lookup: stack product quantization algorithm. This quantization algorithm combines the advantages of low quantization error of stack quantization algorithm and low memory consumption of product quantization algorithm. The key idea of the algorithm is as follows: firstly, the high-dimensional feature vectors are divided into low-dimensional subvectors with the same dimension, and the k-means clustering quantization is carried out in each subvector space. In the second step, the feature vector and the corresponding coded word after quantization are devided to the corresponding error vector, the third step regards the error vector as the feature vector, and then divides the subvector, quantizes the subvector, and calculates the operation of error vector. Iterate the third step until the termination condition is reached to produce a set of hierarchical subcodebooks from rough to fine. The codebook of the global vector is obtained by Cartesian product of the layered subcodebook, and the coding representation of the original vector can be obtained. In this paper, the classical SIFT1M GIST1M and ConvNet1M-128 data sets are designed and verified. A large number of experiments show that this algorithm has more advantages than classical quantization methods in many performance indexes such as quantization error recall rate and scalability. In addition, our method can produce more accurate subcodebook by contrast product quantization. Compared with stack quantization algorithm, our method consumes less memory and has better expansibility.
【学位授予单位】：大连海事大学
【学位级别】：硕士
【学位授予年份】：2017
【分类号】：TP391.41

【相似文献】