Kinect-Assisted Noisy Speech Recognition for Robots

Published: 2018-02-10 00:29

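Keywords: humanoid robot; ego-noise; automatic speech recognition; Kinect; multimodal system  Source: Journal of Tsinghua University (Science and Technology), 2017, Issue 09  Paper type: journal article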

[Abstract]: Fusing audio and video information can improve a robot's speech recognition performance in noisy environments. However, lip information cannot be represented effectively and completely, because it is affected by the speaker's head rotation, variation in lip size, the varying distance from the camera, and illumination. This paper proposes a multimodal system that combines a robot with a Kinect. The system uses the Kinect to acquire 3-D data and visual information, and uses the 3-D data to reconstruct a side view of the lips that supplements the audio-visual information. Results from a series of feature fusion and decision fusion methods show that the proposed multimodal system outperforms speech recognition systems based on single audio or video streams and on dual audio-visual streams, and that it can assist a robot's speech recognition in the presence of the robot's own noise.
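[Author affiliations]: School of Computer Science and Technology, Tianjin University; School of Software, Tianjin University
[Funding]: National Natural Science Foundation of China (61471259, 61233009); Natural Science Foundation of Tianjin (16JCZDJC35400)
[Classification number]: TN912.34; TP242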
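
The abstract names two fusion strategies, feature fusion and decision fusion, without giving details. The following is a minimal sketch of the general difference between the two, not the paper's implementation. The function names, the stream names ("audio", "frontal_lip", "side_lip"), the weights, and all numbers are hypothetical and chosen only for illustration. Feature fusion is shown here as simple concatenation of per-frame feature vectors before a single recognizer; decision fusion is shown as a weighted sum of per-stream log-likelihoods produced by separate recognizers.

```python
import math


def feature_fusion(audio_feat, frontal_lip_feat, side_lip_feat):
    """Early (feature-level) fusion: concatenate the per-frame feature
    vectors of all streams into one vector for a single recognizer."""
    return list(audio_feat) + list(frontal_lip_feat) + list(side_lip_feat)


def decision_fusion(stream_scores, weights):
    """Late (decision-level) fusion: combine the log-likelihoods that each
    stream's recognizer assigns to every candidate word.

    stream_scores: dict mapping stream name -> {word: log-likelihood}
    weights:       dict mapping stream name -> non-negative stream weight
    Returns (best_word, fused_scores).
    """
    # Only words scored by every stream can be compared fairly.
    words = set.intersection(*(set(s) for s in stream_scores.values()))
    fused = {
        w: sum(weights[name] * scores[w] for name, scores in stream_scores.items())
        for w in words
    }
    best_word = max(fused, key=fused.get)
    return best_word, fused


# Hypothetical example: the audio stream is corrupted by the robot's own
# noise and prefers the wrong word, while both lip streams prefer "yes".
scores = {
    "audio":       {"yes": math.log(0.4), "no": math.log(0.6)},
    "frontal_lip": {"yes": math.log(0.7), "no": math.log(0.3)},
    "side_lip":    {"yes": math.log(0.8), "no": math.log(0.2)},
}
# Assumed weights: the noisy audio stream is trusted less.
weights = {"audio": 0.2, "frontal_lip": 0.4, "side_lip": 0.4}

best_word, fused_scores = decision_fusion(scores, weights)
fused_vector = feature_fusion([1.0, 2.0], [3.0], [4.0, 5.0])
```

In this sketch the visual streams outvote the noisy audio stream because the audio weight is low. How stream weights are actually chosen, and which features and recognizers are used, is not stated in the abstract.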
