A Speech-Driven Facial Mouth-Shape Animation System

Published: 2018-05-18 03:11

Topics: 3D face modeling + speech recognition; Source: Jilin University, master's thesis, 2012

[Abstract]: In recent years, as the information industry has grown in importance, computer technology has advanced rapidly, driving upgrades in computer hardware and software and the gradual rise of the computer animation industry. In the current golden age of the animation industry, computer graphics and digital media technology have been widely applied and developed. This thesis aims to drive a 3D face model with speech to produce animation. From this starting point, it introduces in turn the methods for realizing facial animation, face modeling methods, the selection and control of facial key points, the extraction of speech feature parameters, the implementation of a facial animation system under the MPEG-4 standard, and the implementation of facial expressions. The ultimate goal is to generate smooth, fluent, speech-driven mouth-shape animation based on the MPEG-4 standard. To this end, a 3D face model is first built in 3D modeling software and exported as a .X model file; OpenGL is then used to load the model into a 3D window and display it, and texture mapping applies the face texture to the 3D face mesh, yielding a fairly realistic 3D face model. Next, the Baum-Welch algorithm is used to train on samples and establish the mapping between speech feature parameters and facial animation parameters; this is a key step in implementing the system and lays the foundation for the subsequent work. The input speech file is then processed to extract its speech feature parameters, which are compared against the previously built mapping library of speech feature parameters and facial animation parameters; the facial animation parameters that map to the extracted speech features are retrieved from the library and used to drive the face mesh model. Finally, using the algorithm provided in the MPEG-4 standard, the new position coordinates of each control point on the face model are computed by looking up the relevant information in the face definition table (FDT), so that the face model moves and the mouth shape is animated in sync with the speech. On this basis, the thesis further discusses the implementation and application of facial expressions.
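The final step described in the abstract, moving control points by looking up a face definition table, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: `FdtEntry`, `apply_fap`, and all sample values are hypothetical. The sketch assumes a single linear interval per facial animation parameter (FAP), whereas the MPEG-4 table is piecewise linear over FAP intervals, and it assumes the FAP amplitude is expressed in FAPU units, with the unit taken here as the mouth-nose separation divided by 1024.

```python
from dataclasses import dataclass


@dataclass
class FdtEntry:
    """Hypothetical, simplified table row: one vertex moved by one FAP."""
    vertex: int        # index into the mesh vertex list
    direction: tuple   # (x, y, z) displacement per unit of FAP amplitude


def apply_fap(vertices, fdt_entries, fap_value, fapu):
    """Return new vertex positions after applying a single FAP.

    fap_value is given in FAPU units, so the displacement in model
    units is fap_value * fapu along each entry's direction.
    The input vertex list is left unmodified.
    """
    moved = [list(v) for v in vertices]
    amplitude = fap_value * fapu
    for entry in fdt_entries:
        for axis in range(3):
            moved[entry.vertex][axis] += amplitude * entry.direction[axis]
    return [tuple(v) for v in moved]


# Toy mesh: vertex 0 is fixed, vertex 1 stands in for a chin control point.
vertices = [(0.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
jaw_entries = [FdtEntry(vertex=1, direction=(0.0, -1.0, 0.0))]

# Assumed mouth-nose separation of 51.2 model units, divided by 1024.
fapu = 51.2 / 1024

new_vertices = apply_fap(vertices, jaw_entries, fap_value=200, fapu=fapu)
print(new_vertices)
```

In a full system, one such update would run per animation frame for every active FAP retrieved from the speech-to-FAP mapping, and the displaced vertices would then be passed to the renderer.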
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.41
