Design and Implementation of an Automatic Speech Recognition System Based on Lip-Reading Technology

Published: 2018-01-22 16:24

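Keywords: automatic speech recognition; lip reading; convolution kernel; filter; database
Source: University of Electronic Science and Technology of China, 2014 master's thesis
Type: degree thesis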

[Abstract]: In automatic speech recognition (ASR), most research has focused on the acoustic signal. In real-world settings, however, noise makes it difficult for such systems to reach the desired performance, so visual information can play an important role in improving recognition, particularly in noisy environments. This thesis focuses on lip reading, that is, recognition based on visual information. Previous studies identify two main ways of extracting lip shape. The first is model-based (geometric): because lip movement shifts the position of the lips, features such as lip width and height can be extracted from the image. The second is pixel-based (dynamic) and works directly on raw pixel and intensity values. The first approach is more intuitive, but because it discards data, a large amount of information may be lost. The second loses essentially no information, but its high-dimensional image space makes it computationally expensive.

This thesis adopts the model-based approach: the measured width and height of the inner lip are used to represent different lip shapes. Because the inner-lip region is darker than the rest of the lip area, lip features are easy to extract and computation time is saved. Exploiting this property, a spatial filter is designed to enhance the contrast of the inner-lip region. Although the filter is used here in an uncommon way, its performance is satisfactory, and the same enhancement technique can be applied to other regions. After contrast enhancement, a Gaussian filter suppresses noise, yielding a clear outline of the inner lip.
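The abstract does not specify the spatial filter. The following is therefore only a minimal Python sketch of the enhance-then-smooth step, under several assumptions:

- The frame is a grayscale lip region given as a list of rows with values from 0 to 255.
- The contrast filter is a generic local-contrast (unsharp-mask-style) filter that stands in for the thesis's own design.
- The helper names, the `gain` of 2.0, the 3x3 Gaussian kernel and the dark-pixel `threshold` of 60 are all hypothetical choices.

```python
"""Hypothetical sketch of inner-lip preprocessing: local contrast
enhancement, 3x3 Gaussian smoothing, then a dark-pixel threshold.
Frames are lists of rows of grey levels in 0..255."""

from typing import List

Image = List[List[float]]


def _pixel(img: Image, r: int, c: int) -> float:
    """Read a pixel, replicating the border for out-of-range indices."""
    r = min(max(r, 0), len(img) - 1)
    c = min(max(c, 0), len(img[0]) - 1)
    return img[r][c]


def convolve3x3(img: Image, kernel: List[List[float]]) -> Image:
    """Apply a 3x3 kernel to every pixel (edge-replicated borders).
    This is correlation; it equals convolution for the symmetric
    kernels used here."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                kernel[dr + 1][dc + 1] * _pixel(img, r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            for c in range(w)
        ]
        for r in range(h)
    ]


MEAN_3X3 = [[1 / 9] * 3 for _ in range(3)]
GAUSS_3X3 = [[1 / 16, 2 / 16, 1 / 16],
             [2 / 16, 4 / 16, 2 / 16],
             [1 / 16, 2 / 16, 1 / 16]]


def enhance_contrast(img: Image, gain: float = 2.0) -> Image:
    """Assumed stand-in for the thesis's spatial filter: push each pixel
    away from its 3x3 neighbourhood mean by `gain`, then clip to 0..255.
    Dark pixels next to brighter lip tissue become darker still."""
    local_mean = convolve3x3(img, MEAN_3X3)
    return [
        [min(255.0, max(0.0, m + gain * (p - m))) for p, m in zip(row, mrow)]
        for row, mrow in zip(img, local_mean)
    ]


def inner_lip_mask(img: Image, threshold: float = 60.0) -> List[List[bool]]:
    """Enhance, smooth with the Gaussian kernel, then mark dark pixels
    (below `threshold`) as belonging to the inner-lip region."""
    smoothed = convolve3x3(enhance_contrast(img), GAUSS_3X3)
    return [[p < threshold for p in row] for row in smoothed]
```

The order follows the abstract: enhancement first, then Gaussian smoothing. The final threshold is an extra, assumed step. It turns the smoothed frame into a binary mask from which the inner-lip outline can be taken.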
The width and height of the inner lip are then measured with four different convolution kernels, and the measurements are used to build a database that tells the system how words and measurement data correspond. Once the database is built, the system can recognize single characters, as well as words made up of several characters, in video files. When a video file is imported, the system processes each frame and compares it with the data in the database. It then displays as the recognition result the database entry with the minimum deviation. Although the technique achieves some success, it still has potential limitations, such as requirements on the working environment and on head position.
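The abstract does not say which four kernels are used or how the deviation is computed. One plausible reading is sketched below in Python, purely as an illustration:

1. Apply four directional edge kernels (top, bottom, left, right) to a binary inner-lip mask.
2. Take the outermost responses as the region's width and height.
3. Recognize a word as the database entry whose per-frame (width, height) sequence has the smallest sum of squared differences from the measured one.

The kernel values, the `measure`/`deviation`/`recognise` names and the equal-length-sequence assumption are all hypothetical.

```python
"""Hypothetical sketch: measure inner-lip width/height with four
directional edge kernels, then recognise a word by minimum deviation
from stored templates. Kernels and deviation measure are assumptions."""

from typing import Dict, List, Tuple

Mask = List[List[int]]            # 1 = inner-lip (dark) pixel, 0 = other
Template = List[Tuple[int, int]]  # one (width, height) pair per frame

# Four 3x3 edge kernels; each yields +1 on one boundary of the dark region.
KERNELS = {
    "top":    [[0, -1, 0], [0, 1, 0], [0, 0, 0]],
    "bottom": [[0, 0, 0], [0, 1, 0], [0, -1, 0]],
    "left":   [[0, 0, 0], [-1, 1, 0], [0, 0, 0]],
    "right":  [[0, 0, 0], [0, 1, -1], [0, 0, 0]],
}


def respond(mask: Mask, kernel: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (row, col) positions where the kernel response equals +1.
    Pixels outside the image are treated as 0 (zero padding)."""
    h, w = len(mask), len(mask[0])
    hits = []
    for r in range(h):
        for c in range(w):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        total += kernel[dr + 1][dc + 1] * mask[rr][cc]
            if total == 1:
                hits.append((r, c))
    return hits


def measure(mask: Mask) -> Tuple[int, int]:
    """Inner-lip (width, height) from the outermost edge responses."""
    top = respond(mask, KERNELS["top"])
    if not top:
        return (0, 0)  # no dark region found, e.g. mouth closed
    bottom = respond(mask, KERNELS["bottom"])
    left = respond(mask, KERNELS["left"])
    right = respond(mask, KERNELS["right"])
    height = max(r for r, _ in bottom) - min(r for r, _ in top) + 1
    width = max(c for _, c in right) - min(c for _, c in left) + 1
    return (width, height)


def deviation(a: Template, b: Template) -> int:
    """Sum of squared per-frame (width, height) differences.
    Assumes both sequences have the same number of frames."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of frames")
    return sum((wa - wb) ** 2 + (ha - hb) ** 2
               for (wa, ha), (wb, hb) in zip(a, b))


def recognise(frames: Template, database: Dict[str, Template]) -> str:
    """Return the database word whose template deviates least."""
    return min(database, key=lambda word: deviation(frames, database[word]))


def recognise_video(masks: List[Mask], database: Dict[str, Template]) -> str:
    """Measure every frame's mask, then match the whole sequence."""
    return recognise([measure(m) for m in masks], database)
```

Here is an example with a toy database of invented values, `{"word_A": [(5, 3), (6, 5), (5, 3)], "word_B": [(2, 1), (2, 2), (2, 1)]}`. The call `recognise([(5, 3), (6, 4), (5, 3)], db)` returns `"word_A"`: its deviation is 1, against 46 for `word_B`. Real recordings vary in length, so the sequences would need some time alignment (for example resampling or dynamic time warping) before this comparison applies.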
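【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year awarded】: 2014
【Classification number】: TN912.34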
