Improvement of the Locally Linear Embedding Algorithm and Its Application in Face Recognition

Published: 2018-03-28 13:32

Topic: manifold learning　Focus: face recognition　Source: Chongqing University of Technology, master's thesis, 2017

[Abstract]: Face recognition is a biometric identification technology. Because face data are easy to collect, facial features are objective, and application scenarios are diverse, it has become a research hotspot in pattern recognition and deep learning. In practical use, however, face recognition runs into a variety of problems, particularly ones that affect feature extraction from face images. The choice of feature extraction method is decisive for the final recognition result. Early approaches generally worked from subjective aspects such as texture, shape, and color, which makes it difficult to extract the intrinsic structural information in face images. The development of manifold learning theory has offered a new approach to feature extraction from high-dimensional data, and related research suggests that face data are more likely to lie on a nonlinear manifold in a high-dimensional space. As a result, nonlinear dimensionality reduction and manifold learning are increasingly applied to image recognition, and to face recognition in particular.

Building on manifold learning, this thesis mainly studies the Locally Linear Embedding (LLE) algorithm and the Supervised Locally Linear Embedding (SLLE) algorithm. To address the shortcoming that sample points deviating from the overall sample distribution may be mapped onto other planes during low-dimensional reconstruction, and drawing on the strengths of the Kmeans++ algorithm, it proposes the clustering-based Cluster-SLLE (CSLLE) algorithm. Then, to address the drawbacks that CSLLE introduces a new parameter and that the linear relationship between within-class and between-class distances is not robust to noise, it improves the distance similarity measure used in the algorithm. Compared with the traditional algorithms, the resulting algorithm achieves a higher recognition rate in tests on the relevant face datasets. The main work of this thesis is as follows:

1. A fairly detailed theoretical account is given of manifold-learning dimensionality reduction methods based on global preservation and local preservation, such as principal component analysis, multidimensional scaling, and Laplacian eigenmaps. The algorithms are compared on relevant datasets, and their respective strengths and weaknesses are examined.

2. On the basis of manifold learning, the LLE algorithm, the SLLE algorithm (which incorporates sample class information), and the choice of parameter values in practical applications are analyzed in detail. SLLE uses the class labels of the samples to measure similarity between data points, but it ignores the effect on the data as a whole of individual samples that differ substantially from their class. The clustering-based Cluster-SLLE algorithm is therefore proposed: it introduces the Kmeans++ clustering algorithm to flag "singular points" and further modifies the distance matrix between data points. Experiments on the Yale and ORL face datasets demonstrate the feasibility of the algorithm and its improved generalization ability.

3. In both SLLE and CSLLE, the similarity measures for between-class distances and within-class distances are linearly related, so the discriminative and generalization ability of the embedded data remains limited. Moreover, noise in the samples can disrupt the neighborhood relationships among them. In addition, although CSLLE improves the recognition rate to some extent, it also introduces a new source of uncertainty, namely the choice of value for the new parameter, which makes the algorithm more subjective. To address this, and drawing on the original algorithms, an improved algorithm is proposed that optimizes the measure of distance between within-class samples. It reduces the number of parameters while also lessening the effect of noise on the experiments, which benefits the low-dimensional embedded representation of face data.
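
The abstract does not state the formulas it uses, so the following is only a minimal sketch of the baseline it builds on, assuming the commonly cited SLLE formulation in which the pairwise distance matrix is inflated between samples of different classes: Δ' = Δ + α·max(Δ)·Λ, where Λ_ij is 1 if samples i and j have different labels and 0 otherwise. The function names, the parameter `alpha`, and the regularization constant `reg` are illustrative assumptions, not taken from the thesis, and the sketch does not include the Cluster-SLLE or improved-distance variants the thesis proposes.

```python
import numpy as np

def slle_distances(X, y, alpha=0.5):
    """Euclidean distances, increased by alpha * max distance
    for every pair of samples whose class labels differ."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    different = (y[:, None] != y[None, :]).astype(float)
    return dist + alpha * dist.max() * different

def lle_weights(X, dist, n_neighbors, reg=1e-3):
    """Reconstruction weights: each sample is approximated by an
    affine combination of its nearest neighbours under `dist`."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist[i])
        nbrs = order[order != i][:n_neighbors]
        Z = X[nbrs] - X[i]                 # neighbours centred on sample i
        G = Z @ Z.T                        # local Gram matrix
        # Regularise so the solve stays stable when G is near-singular.
        G += (reg * np.trace(G) + 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()           # weights sum to one
    return W

def lle_embed(W, n_components):
    """Low-dimensional coordinates from the bottom eigenvectors of
    (I - W)^T (I - W), skipping the constant eigenvector."""
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]

# Usage on synthetic data (a stand-in for face feature vectors).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = np.repeat([0, 1, 2], 10)
D = slle_distances(X, y, alpha=0.5)
W = lle_weights(X, D, n_neighbors=5)
Y = lle_embed(W, n_components=2)
```

With `alpha=0` the sketch reduces to unsupervised LLE. Because the between-class penalty is a fixed additive term in this formulation, within-class and between-class distances stay linearly related, which is the limitation the third contribution is said to target.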
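[Degree-granting institution]: Chongqing University of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.41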

[References]