Application of Speech Recognition in Video Conferencing: Research and Implementation

Published: 2018-03-01 18:09

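Keywords: video conferencing; speech recognition; Android platform. Source: South China University of Technology, master's thesis, 2014. Type: degree thesis.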

[Abstract]: As a means of remote, real-time information exchange and interactive communication, video conferencing is widely used in medicine, education, finance, government, and other fields. Conventional video conferencing systems are operated mainly through manual control. As technology advances and users expect a better experience, applying speech recognition to video conferencing systems has practical value. Speech recognition is the process by which a computer recognizes and interprets a human speech signal and converts it into the corresponding text or command. It is becoming a key human-machine interface technology, and its applications have grown into a competitive, emerging high-tech industry. Against this background, this thesis applies speech recognition to a video conferencing system: preset voice commands are recognized and used to operate and control the conference, replacing manual control through a mouse, keyboard, or mobile smart terminal and making the system more user-friendly and intelligent. Building on the Android remote-control application of the CoolView video conferencing system, the thesis designs the overall architecture of a speech recognition system on the remote-control platform and divides it into functional modules. Based on how the conference remote control is used, two systems are implemented: an online speech recognition system based on Google speech recognition technology, used to select a conference, and a local, small-vocabulary speech recognition system based on the CMU PocketSphinx engine, used by the remote control to control its target terminal.
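Once a recognizer has returned text, the remote control still has to turn that text into an action, and the two recognition paths above call for different treatment. The abstract does not describe this step, so the Python sketch below is a hypothetical illustration: the command phrases and action names are invented, and the Google and PocketSphinx recognizers are assumed to have already produced a text hypothesis. Control commands come from a closed vocabulary and are looked up exactly, while a free-form meeting name is matched approximately against the available meetings with the standard library's `difflib`.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical command vocabulary: the thesis abstract does not list the real
# commands, so these phrases and action names are invented for illustration.
COMMANDS: Dict[str, str] = {
    "mute microphone": "MUTE_MIC",
    "unmute microphone": "UNMUTE_MIC",
    "volume up": "VOLUME_UP",
    "volume down": "VOLUME_DOWN",
    "exit meeting": "EXIT_MEETING",
}


def normalize(text: str) -> str:
    """Lower-case a recognition hypothesis and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def match_command(hypothesis: str) -> Optional[str]:
    """Local, small-vocabulary path: exact lookup in the closed command set.

    Anything outside the vocabulary is rejected (returns None) rather than
    guessed, so a misrecognition cannot trigger an unintended action.
    """
    return COMMANDS.get(normalize(hypothesis))


def match_meeting(hypothesis: str, meetings: List[str],
                  cutoff: float = 0.6) -> Optional[str]:
    """Online, free-form path: pick the meeting name closest to the hypothesis.

    difflib's similarity ratio tolerates small recognition errors; `cutoff`
    is an assumed threshold below which no meeting is selected.
    """
    by_key = {normalize(m): m for m in meetings}
    best = difflib.get_close_matches(normalize(hypothesis), list(by_key),
                                     n=1, cutoff=cutoff)
    return by_key[best[0]] if best else None


if __name__ == "__main__":
    print(match_command("Volume  Up"))  # prints VOLUME_UP
    print(match_meeting("lab seminer", ["Weekly Project Meeting", "Lab Seminar"]))  # prints Lab Seminar
```

Python is used here only for brevity; on the Android remote control the same logic would be written in Java or Kotlin. Rejecting out-of-vocabulary text on the command path, instead of fuzzy-matching it, is a deliberate design choice: a wrong meeting selection is easy to undo, whereas a wrongly triggered control command acts on the terminal immediately.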
In addition, to reduce the effect of ambient noise and improve speech signal quality, the speech recognition system includes an audio processing module that performs noise suppression and lossless audio compression. Testing shows that the implemented system meets the basic operational requirements of the video conferencing system, which verifies the feasibility of applying speech recognition to video conferencing. The local small-vocabulary system also achieves a relatively high recognition rate and short processing time, which greatly improves the user experience.
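The abstract states only that the audio processing module performs noise suppression and lossless compression, not which algorithms it uses. As a hypothetical illustration, the Python sketch below substitutes two simple, well-known techniques: an energy-based noise gate, which silences frames whose RMS energy is not clearly above a noise floor estimated from the first few frames, and lossless coding by first-order prediction followed by Rice coding of the residuals, the general approach used by lossless audio codecs such as FLAC. The frame length, gate threshold, and Rice parameter `k` are assumed values, and the encoder returns a `'0'`/`'1'` string for readability where a real encoder would pack bits into bytes.

```python
import math
from typing import List


def noise_gate(samples: List[int], frame_len: int = 160,
               noise_frames: int = 5, threshold: float = 2.0) -> List[int]:
    """Silence frames whose RMS energy is not clearly above the noise floor.

    Assumes the first `noise_frames` frames contain background noise only.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    if not frames:
        return []

    def rms(frame: List[int]) -> float:
        return math.sqrt(sum(s * s for s in frame) / len(frame))

    lead = frames[:noise_frames]
    noise_floor = sum(rms(f) for f in lead) / len(lead)
    out: List[int] = []
    for frame in frames:
        keep = rms(frame) > threshold * noise_floor
        out.extend(frame if keep else [0] * len(frame))
    return out


def zigzag(v: int) -> int:
    """Map signed residuals to non-negative ints: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u: int) -> int:
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(samples: List[int], k: int = 4) -> str:
    """Lossless coding: first-order prediction x[n] ~ x[n-1], then Rice codes.

    Each residual is written as a unary quotient (q ones, then a zero)
    followed by k remainder bits.
    """
    bits: List[str] = []
    prev = 0
    for s in samples:
        u = zigzag(s - prev)  # prediction residual, made non-negative
        prev = s
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0" + (format(r, f"0{k}b") if k else ""))
    return "".join(bits)


def rice_decode(bits: str, count: int, k: int = 4) -> List[int]:
    """Invert rice_encode, reconstructing exactly `count` samples."""
    out: List[int] = []
    pos = prev = 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the '0' that ends the unary quotient
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        prev += unzigzag((q << k) | r)  # add the residual back to the prediction
        out.append(prev)
    return out
```

A noise gate only removes low-level noise between utterances; noise that overlaps speech needs frequency-domain methods such as spectral subtraction. Likewise, `k` is fixed here, whereas FLAC selects the Rice parameter per partition of the residual to track changing signal statistics.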
[Degree-Granting Institution]: South China University of Technology
[Degree]: Master's
[Year Awarded]: 2014
[Classification Number]: TN912.34
