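Research on Speech-Driven 3D Lip Animation Algorithms
Published: 2018-04-05 22:15
Topic: speech-driven  Focus: 3D animation  Source: Beijing Institute of Technology, master's thesis, 2016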
[Abstract]: Speech-driven 3D lip animation sits at the intersection of speech signal processing and 3D animation technology. It can be applied wherever speech and lip movement must be synchronized in 3D animation, such as 3D animated films or videos, 3D games, virtual anchors, and instructional videos. Research on speech-driven lip animation remains limited both in China and abroad, and lip animation is still produced mostly by hand, which is time-consuming and laborious; studying speech-driven 3D lip animation algorithms therefore has social significance and practical value. In such algorithms, the mapping from speech to lip shape directly affects the realism of the animation. Existing speech-driven lip animation algorithms face the following main difficulties: (1) phoneme pronunciation rules differ between languages, which makes it hard to establish a unified mapping to lip shapes; (2) when a BP neural network maps speech feature parameters to lip shapes, speed and accuracy are usually heavily constrained by the number of training samples and the network structure; (3) 3D face models come in many formats and there is no unified lip animation standard, so generality is lacking. To address these problems, this thesis makes the following improvements on existing speech-driven lip animation algorithms. First, it analyzes the pronunciation rules of Mandarin Chinese and English, attempts to unify them using the International Phonetic Alphabet, and records a training speech corpus on that basis. Second, it tries using the Gaussian mixture model algorithm and a multi-class support vector machine based on a directed acyclic graph (DAG-SVM) in place of a neural network for phoneme classification, and improves DAG-SVM. Finally, it uses the 3D mesh morphing animation technique in DirectX to implement general-purpose, realistic 3D facial lip animation, combines it with the classification algorithms, and builds a graphical interface. Experimental results show that the proposed algorithm performs well and meets the expected requirements.
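The abstract names DAG-SVM but does not describe it, nor the thesis's improvement to it. As a rough illustration only, the standard DAG decision scheme can be sketched as follows: one binary classifier exists per pair of classes, and each step tests the first remaining class against the last and discards the loser, so K classes need K-1 evaluations. The phoneme labels, the one-dimensional features, and the nearest-center deciders below are hypothetical stand-ins for trained binary SVMs, not anything taken from the thesis.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A pairwise decider returns True if it prefers the first class of its pair,
# False if it prefers the second. In a real DAG-SVM each decider would be a
# trained binary SVM; here they are toy functions (assumption).
PairwiseDecider = Callable[[Sequence[float]], bool]


def dag_classify(
    x: Sequence[float],
    classes: List[str],
    deciders: Dict[Tuple[str, str], PairwiseDecider],
) -> str:
    """Walk the decision DAG: test first vs. last remaining class, drop the loser."""
    remaining = list(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        if deciders[(first, last)](x):
            remaining.pop()       # 'last' is rejected
        else:
            remaining.pop(0)      # 'first' is rejected
    return remaining[0]


def make_toy_deciders(centers: Dict[str, float]) -> Dict[Tuple[str, str], PairwiseDecider]:
    """Build one decider per ordered class pair; each picks the nearer 1-D center."""
    names = list(centers)
    deciders: Dict[Tuple[str, str], PairwiseDecider] = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Bind a and b as defaults so each lambda keeps its own pair.
            deciders[(a, b)] = (
                lambda x, a=a, b=b: abs(x[0] - centers[a]) <= abs(x[0] - centers[b])
            )
    return deciders


# Hypothetical phoneme classes with made-up 1-D feature centers.
CENTERS = {"a": 0.0, "i": 1.0, "u": 2.0, "m": 3.0}
DECIDERS = make_toy_deciders(CENTERS)

if __name__ == "__main__":
    print(dag_classify([1.9], list(CENTERS), DECIDERS))
```

In practice the feature vector would be a frame of speech features and the predicted phoneme would select the lip shape to display; that wiring is outside this sketch.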
[Degree-granting institution]: Beijing Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.41; TN912.3
Article ID: 1716712
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/1716712.html