Face Sketch Synthesis Based on Sparse Greedy Search

Published: 2018-08-19 20:03

[Abstract]: Face sketch synthesis is the process of using machine learning to model the complex mapping between photos and sketches, and then using that model to synthesize sketches from photos. It has important applications in criminal investigation and digital entertainment.

In criminal investigation, environmental or hardware limitations sometimes prevent the police from obtaining any image of a suspect. In that case, a sketch drawn by an artist from the description of the victim or an eyewitness becomes the best substitute for a photo of the suspect. The police can then convert the ID-card photos in their database into the corresponding sketches and search them with the suspect's sketch, narrowing down or pinning down the suspect's identity.

In digital entertainment, the growth of social media means that many young people want distinctive profile pictures, and sketches in various styles have become one of their favorite choices. Face sketch synthesis can also serve as an important component of other computer vision tasks, such as face sketch aging.

Existing machine-learning methods for face sketch synthesis fall into two categories: model-driven and data-driven. This thesis focuses on data-driven methods. It proposes new methods that address shortcomings of existing ones, such as strict requirements on the test photo and dependence on a large number of training samples. The main contributions are as follows.

1. A face sketch synthesis method based on multiple photo-sketch pairs. Existing data-driven methods consider only a local search strategy, so they cannot synthesize non-facial elements that are unique to the test photo. Local search also requires the test photo to be aligned with the images in the training set, which restricts which test photos can be used. The proposed method works in three steps.
   - First, a sparse coding algorithm converts the pixel features of image patches into sparse representation features, improving robustness to interference.
   - Second, a search tree is built over the training patches using two pieces of information from each sparse representation: the value of each sparse coefficient and the order in which the coefficients are encoded. This improves both search accuracy and search speed.
   - Third, prior information from the test photo is combined with a graphical model, and the sketch is synthesized by Bayesian inference.

   The first two steps together are called the sparse greedy search algorithm. Experiments show that, compared with existing data-driven methods, the proposed method synthesizes non-facial elements better and faster, and it applies to arbitrary test photos.

2. A face sketch synthesis method based on a single photo-sketch pair. Existing data-driven methods rely on a large number of photo-sketch pairs as the training set. Collecting such pairs is expensive, which limits the practical use of these methods, and in some extreme cases only one photo-sketch pair is available. The proposed method works in three steps.
   - First, a Gaussian pyramid is built for the single photo-sketch pair in the training set. This increases the number of training samples and also captures the scale information of facial structures.
   - Second, the sparse greedy search algorithm produces an initial sketch of the test photo, retaining the advantages of the multiple-pair method.
   - Third, a new training set is formed from the test photo, its initial sketch, and the original single photo-sketch pair. The final sketch is synthesized by combining a cascaded regression strategy with a graphical model.

   Experiments show that the proposed method achieves results comparable to the latest data-driven methods. It also still synthesizes non-facial elements and places no restrictions on the test photo.

3. A face sketch synthesis method based on a single target sketch. Existing data-driven methods depend on photo-sketch pairs, whether many or just one, which limits their ability to synthesize sketches in arbitrary styles. The proposed method works in four steps.
   - First, the sparse greedy search algorithm synthesizes an initial sketch of the test photo.
   - Second, multi-scale features are used to find candidate sketch patches.
   - Third, an optimization model based on multiple features refines the selection of candidate patches.
   - Fourth, a cascaded regression strategy improves the quality of the initial sketch.

   Experiments show that the proposed method achieves results comparable to the latest data-driven methods. Moreover, when a target sketch in one of the styles used in the experiments serves as the training set, the method synthesizes a good-quality sketch in that style for any given test photo. This makes it especially well suited to digital entertainment.

4. A face sketch synthesis method based on a unified framework. When searching for candidate patches, existing data-driven methods use only a local search strategy, whereas the methods proposed above use only a global search strategy. In addition, most existing methods synthesize the final sketch as a linear combination of several candidate patches, which makes the results somewhat over-smoothed. Existing high-frequency reconstruction strategies are all model-driven and lack sketch style information. The proposed method works in three steps.
   - First, the training set is divided into an initial training set and a high-frequency training set.
   - Second, for a given test photo, candidate patches are searched in the initial training set with both the local and the global search strategy. This makes full use of local position information and global facial similarity. An initial sketch is then synthesized with a graphical model.
   - Third, a high-frequency sketch of the test photo is synthesized with the same strategy used for the initial sketch.

   The initial sketch and the high-frequency sketch are added to give the final sketch. Experiments show that the proposed algorithm synthesizes both non-facial and facial components, with richer high-frequency detail.

In summary, this thesis proposes four face sketch synthesis methods built on sparse greedy search, with the aim of making face sketch synthesis more practical. The methods progress from many training samples to few, and from partial to comprehensive search strategies. Theoretical analysis and experimental results show the advantages of the proposed methods over existing ones.
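The first two steps of the first contribution form the sparse greedy search algorithm: sparse-code each patch, then build a search tree from the coefficient values and their coding order. These can be illustrated with a small sketch. It rests on assumptions that are not taken from the thesis:
- Patches are plain Python vectors.
- The dictionary is given rather than learned.
- Coding uses a matching-pursuit-style greedy coder.
- The tree branches on each selected atom index in coding order.
- Coefficient values are compared only among the patches at the deepest matching node.

All function names are hypothetical.

```python
def greedy_sparse_code(x, atoms, k):
    """Matching-pursuit-style coder (illustrative assumption).

    Returns [(atom_index, coefficient), ...] in the order the atoms were
    selected. Atoms are assumed unit-norm and are never reselected.
    """
    residual = list(x)
    code, used = [], set()
    for _ in range(k):
        best_i, best_c = None, 0.0
        for i, atom in enumerate(atoms):
            if i in used:
                continue
            c = sum(r * a for r, a in zip(residual, atom))
            if best_i is None or abs(c) > abs(best_c):
                best_i, best_c = i, c
        if best_i is None or abs(best_c) < 1e-12:
            break  # no atoms left, or residual is numerically explained
        used.add(best_i)
        code.append((best_i, best_c))
        residual = [r - best_c * a for r, a in zip(residual, atoms[best_i])]
    return code


class Node:
    def __init__(self):
        self.children = {}  # atom index -> Node
        self.items = []     # (patch_id, code) of every patch routed through here


def build_search_tree(codes):
    """codes: {patch_id: sparse code}; each patch follows its coding order."""
    root = Node()
    for pid, code in codes.items():
        node = root
        node.items.append((pid, code))
        for atom, _ in code:
            node = node.children.setdefault(atom, Node())
            node.items.append((pid, code))
    return root


def coefficient_distance(c1, c2):
    """Squared distance between two sparse codes, treating them as dict vectors."""
    d1, d2 = dict(c1), dict(c2)
    return sum((d1.get(a, 0.0) - d2.get(a, 0.0)) ** 2 for a in set(d1) | set(d2))


def sparse_greedy_search(root, query_code):
    """Descend while the query's atom order matches, then rank by coefficients."""
    node = root
    for atom, _ in query_code:
        if atom not in node.children:
            break
        node = node.children[atom]
    best = min(node.items, key=lambda item: coefficient_distance(item[1], query_code))
    return best[0]


atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
train = {"A": [3.0, 1.0, 0.0], "B": [1.0, 3.0, 0.0], "C": [0.0, 1.0, 3.0]}
tree = build_search_tree({pid: greedy_sparse_code(p, atoms, 2) for pid, p in train.items()})
print(sparse_greedy_search(tree, greedy_sparse_code([2.5, 1.2, 0.1], atoms, 2)))  # A
```

Branching on coding order lets the search discard most training patches after a few comparisons. The coefficient distance then picks the best match among the survivors.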
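The first step of the second contribution builds a Gaussian pyramid from the single training photo-sketch pair, so that every pyramid level contributes an extra training pair at a coarser scale. A minimal sketch follows, assuming:
- grayscale images stored as lists of rows;
- a separable 5-tap binomial kernel with clamped borders;
- downsampling by a factor of 2.

The kernel, the border handling, and the function names are illustrative choices, not details from the thesis.

```python
KERNEL = [1.0, 4.0, 6.0, 4.0, 1.0]  # binomial approximation of a Gaussian; sums to 16


def blur_1d(values):
    """Convolve one row or column with KERNEL, clamping indices at the borders."""
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for offset, w in enumerate(KERNEL):
            j = min(max(i + offset - 2, 0), n - 1)
            acc += w * values[j]
        out.append(acc / 16.0)
    return out


def pyr_down(img):
    """Blur rows, then columns, then keep every second row and column."""
    rows = [blur_1d(r) for r in img]
    cols = [blur_1d(list(c)) for c in zip(*rows)]
    blurred = [list(r) for r in zip(*cols)]
    return [r[::2] for r in blurred[::2]]


def gaussian_pyramid(img, levels):
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(pyr_down(pyramid[-1]))
    return pyramid


def pyramid_training_pairs(photo, sketch, levels):
    """One (photo, sketch) training pair per pyramid level (hypothetical helper)."""
    return list(zip(gaussian_pyramid(photo, levels), gaussian_pyramid(sketch, levels)))


photo = [[float(x + y) for x in range(8)] for y in range(8)]
sketch = [[5.0] * 8 for _ in range(8)]
pairs = pyramid_training_pairs(photo, sketch, 3)
print([len(p) for p, _ in pairs])  # [8, 4, 2]
```

Photo and sketch go through the same filter at every level, so the pairs stay pixel-aligned across scales.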
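The first, second, and fourth contributions all fuse candidate patches with a graphical model. The abstract does not say which model or inference algorithm is used. As a stand-in, the sketch below picks one candidate per patch position on a simplified 1-D chain by exact MAP inference (the Viterbi algorithm). It uses a data cost per candidate and a compatibility cost between neighboring candidates. Face sketch methods typically work on a 2-D grid of overlapping patches, where exact inference is generally intractable. Treat this only as an illustration of the kind of energy being minimized; the names and toy costs are hypothetical.

```python
def chain_map(data_cost, pair_cost):
    """Exact MAP labeling of a chain MRF by dynamic programming (Viterbi).

    data_cost[i][k]: cost of choosing candidate k at position i.
    pair_cost(i, k, l): cost of candidate k at i next to candidate l at i + 1.
    Returns the candidate indices with the minimum total cost.
    """
    best = [list(data_cost[0])]  # best[i][k]: min cost of positions 0..i ending in k
    back = []                    # back[i - 1][l]: best predecessor of l at position i
    for i in range(1, len(data_cost)):
        prev = best[-1]
        row, ptr = [], []
        for l, cost in enumerate(data_cost[i]):
            k = min(range(len(prev)), key=lambda j: prev[j] + pair_cost(i - 1, j, l))
            row.append(prev[k] + pair_cost(i - 1, k, l) + cost)
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    k = min(range(len(best[-1])), key=lambda j: best[-1][j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1]


# Toy example: each candidate sketch patch is summarized by its
# (left edge, right edge) intensity; neighbors should agree on the shared edge.
candidates = [[(0, 5), (0, 1)], [(1, 2), (5, 9)], [(9, 0), (2, 0)]]
data = [[0, 1], [0, 0], [2, 0]]


def edge_mismatch(i, k, l):
    return (candidates[i][k][1] - candidates[i + 1][l][0]) ** 2


print(chain_map(data, edge_mismatch))  # [1, 0, 1]
```

The chosen labeling has total cost 1: the data costs are 1 + 0 + 0, and both shared edges match exactly. The greedy choice of the cheapest first candidate would force an edge mismatch later.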
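In the fourth contribution, the final sketch is the sum of the initial sketch and the high-frequency sketch. The abstract does not specify how the high-frequency training set is produced. One common way to obtain such a layer, assumed here purely for illustration, is to subtract a low-pass (here, 3x3 box-blurred) version of each training sketch from the sketch itself. The helper names are hypothetical.

```python
def box_blur(img):
    """3x3 mean filter with clamped borders (a simple low-pass filter)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / 9.0)
        out.append(row)
    return out


def high_frequency(img):
    """High-frequency layer (assumed definition): image minus its low-pass version."""
    low = box_blur(img)
    return [[p - q for p, q in zip(r, s)] for r, s in zip(img, low)]


def add_layers(initial, high):
    """Final sketch = initial sketch + high-frequency sketch, pixel by pixel."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(initial, high)]


sketch = [[0.0, 0.0, 9.0], [0.0, 9.0, 0.0], [9.0, 0.0, 0.0]]
high = high_frequency(sketch)
restored = add_layers(box_blur(sketch), high)  # equals sketch up to rounding
```

Under this decomposition, the low-pass and high-frequency layers sum back to the original sketch. A sketch assembled from linearly combined, and hence smoothed, candidate patches therefore lacks mainly the high-frequency layer. Synthesizing that layer separately is meant to restore it.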
[Degree-granting institution]: Xidian University
[Degree level]: Doctorate
[Year awarded]: 2016
[CLC classification number]: TP391.41
