Research on Multi-view Video Enhancement and Gaze Tracking Methods for Stereoscopic Television

Published: 2018-02-07 16:52

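Keywords: multi-view video enhancement; stereoscopic television; frame rate up-conversion; virtual view synthesis; gaze tracking　Source: Shandong University, 2014 doctoral dissertation　Document type: dissertation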

[Abstract]: Digital multimedia is one of the most active research fields today. As users demand ever higher service quality and visual experience, digital multimedia processing technologies and devices are continually being upgraded. As an emerging form of visual media, three-dimensional television (3DTV) gives users a strong sense of depth and immersion, and it has therefore attracted wide attention and produced a series of results. Compared with conventional two-dimensional (2D) television, 3DTV must deliver video from multiple viewpoints to the user, so the volume of data to be transmitted is very large, which has become a bottleneck for the development of 3DTV technology.
In a 3DTV system, the spatial resolution, frame rate, and number of viewpoints of the multi-view video (MVV) strongly affect visual quality. The higher the frame rate, the smoother the picture, particularly on large liquid crystal displays (LCDs); the more viewpoints, the wider the user's viewing range and the stronger the stereoscopic effect. However, as the frame rate and the number of viewpoints grow, the amount of transmitted data inevitably rises sharply. Although effective multi-view video coding techniques have been proposed, the frame rate and number of viewpoints achievable in 3DTV transmission still fall short of practical requirements. In addition, interactive 3DTV systems usually need to track the user's gaze or head in order to decide which viewpoint to play and when to switch.
To address these problems, this thesis studies multi-view video enhancement and gaze tracking methods for 3DTV systems. Multi-view video enhancement covers two aspects: stereoscopic video frame rate up-conversion and view synthesis. At the sender, frame rate up-conversion and view synthesis reduce the amount of transmitted data and save bandwidth. At the receiver, frame rate up-conversion makes each single video stream look smoother, reducing the motion blur and judder caused by a low frame rate, while view synthesis increases the number of virtual cameras, widens the viewing range, and gives the user a better stereoscopic experience. The thesis also studies human-computer interaction for interactive 3DTV and proposes a non-contact gaze tracking system based on processing the gray-level distribution of video.
The thesis makes important progress in the following aspects:
1. A frame rate up-conversion algorithm is proposed for stereoscopic video in the color-plus-depth (video plus depth) format. First, motion vectors (MVs) are divided, according to the depth information, into depth-continuous MVs and depth-discontinuous MVs. In the depth-continuous motion vector field (MVF), an MV refinement method constrained by depth layers detects and corrects erroneous MVs, improving MV accuracy. For the depth-discontinuous MVF, an MV refinement method based on foreground matching is used, which preserves the edges of moving foreground objects during motion compensation.
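The first step of contribution 1, splitting the motion vector field by depth continuity, can be illustrated with a small sketch. The abstract does not say how depth continuity is tested, so the rule below (a block is depth-discontinuous when the range of its depth values exceeds a threshold) and the names `classify_blocks`, `block`, and `threshold` are assumptions made for illustration only.

```python
def classify_blocks(depth, block=2, threshold=30):
    """Label each block of a depth map as depth-continuous or not.

    Assumed rule (not taken from the thesis): a block is 'discontinuous'
    when max(depth) - min(depth) inside it exceeds `threshold`.
    Returns a dict keyed by (block_row, block_col).
    """
    rows, cols = len(depth), len(depth[0])
    labels = {}
    for by in range(0, rows, block):
        for bx in range(0, cols, block):
            # Collect the depth samples covered by this block.
            values = [depth[y][x]
                      for y in range(by, min(by + block, rows))
                      for x in range(bx, min(bx + block, cols))]
            spread = max(values) - min(values)
            labels[(by // block, bx // block)] = (
                "discontinuous" if spread > threshold else "continuous")
    return labels


# Toy 4x4 depth map: a near object (200) in front of a far background (~10).
depth_map = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 12, 10, 200],
    [11, 10, 10, 10],
]
labels = classify_blocks(depth_map)
```

In this toy map only the bottom-right block straddles the object boundary, so only its motion vector would be routed to the foreground-matching refinement; the other three would go through the depth-layer-constrained refinement.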
2. A depth-based adaptive motion compensation and image block segmentation scheme is proposed. Based on the relationship between depth information and motion vectors in the scene, forward or backward motion compensation is selected adaptively. At the same time, depth-discontinuous image blocks are segmented with a method based on depth and α-matting. Compared with conventional frame rate up-conversion algorithms, this reduces the blur and artifacts that appear in occluded and non-occluded regions after motion compensation, and effectively improves the visual quality of stereoscopic video.
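The role of the α-matte in contribution 2 can be illustrated with the standard compositing equation I = αF + (1 − α)B. The abstract does not describe how the matte is estimated or exactly how it enters the interpolation, so this sketch assumes the matte is already available and only shows how a depth-discontinuous block could be blended from a foreground prediction and a background prediction. The function name `composite_block` and its arguments are hypothetical.

```python
def composite_block(alpha, foreground, background):
    """Blend two predictions of a block with a per-pixel alpha matte.

    Standard compositing: out = alpha * F + (1 - alpha) * B.
    `alpha`, `foreground`, and `background` are same-sized 2D lists;
    alpha is 1.0 for pure foreground and 0.0 for pure background.
    """
    return [[a * f + (1.0 - a) * b
             for a, f, b in zip(alpha_row, fg_row, bg_row)]
            for alpha_row, fg_row, bg_row in zip(alpha, foreground, background)]


# One-row block: foreground pixel, mixed boundary pixel, background pixel.
blended = composite_block(
    alpha=[[1.0, 0.5, 0.0]],
    foreground=[[100, 100, 100]],
    background=[[20, 20, 20]],
)
```

A soft matte lets the boundary pixel take a mixture of both predictions instead of a hard switch, which is one plausible way to avoid the blocky edges that plain block-based motion compensation produces at depth discontinuities.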
3. A virtual view reconstruction algorithm based on asymmetric image inpainting is proposed. The algorithm generates video for an arbitrary intermediate virtual viewpoint from the left and right color and depth videos, and uses the spatial position of the virtual viewpoint relative to the left and right reference viewpoints to determine the main reference view and the auxiliary reference view. First, the main and auxiliary reference view images are projected to the virtual viewpoint by 3D image warping. Second, cracks and erroneous points in the virtual view images are removed with image processing techniques to improve image quality. Then, the occluded regions in the main virtual view image are filled from the auxiliary virtual view image, and the brightness of the left and right reference view images is adjusted so that color is consistent across viewpoints. Finally, the remaining holes are filled with a depth-aided asymmetric image inpainting method.
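The warp-then-merge part of contribution 3 can be sketched on a single image row. This is a simplified illustration, not the thesis algorithm: it assumes rectified cameras (so warping is a horizontal shift), integer per-pixel disparities already derived from depth, and a virtual viewpoint at fraction `t` between the left view (t = 0) and the right view (t = 1). Crack removal, brightness adjustment, and the final inpainting step are omitted, and all function names are hypothetical.

```python
HOLE = None  # marker for pixels that no reference pixel maps to


def warp_row(colors, disparities, shift_scale):
    """Forward-warp one row; on collisions the nearer pixel wins.

    Each pixel x moves to x + round(shift_scale * disparity). A larger
    disparity means a nearer surface, so it overwrites farther ones.
    """
    width = len(colors)
    out = [HOLE] * width
    best = [-1] * width  # largest disparity written so far per target pixel
    for x, (color, disp) in enumerate(zip(colors, disparities)):
        xv = x + round(shift_scale * disp)
        if 0 <= xv < width and disp > best[xv]:
            out[xv], best[xv] = color, disp
    return out


def merge_rows(main, aux):
    """Fill holes of the main warped row from the auxiliary warped row."""
    return [m if m is not HOLE else a for m, a in zip(main, aux)]


def synthesize_row(left, left_disp, right, right_disp, t):
    """Synthesize a row of the virtual view at fraction t in [0, 1]."""
    from_left = warp_row(left, left_disp, -t)
    from_right = warp_row(right, right_disp, 1.0 - t)
    # Assumed rule: the reference view closer to the virtual view is 'main'.
    if t <= 0.5:
        return merge_rows(from_left, from_right)
    return merge_rows(from_right, from_left)


# Background a..f has disparity 0; foreground pixel F has disparity 2.
left_row = ["a", "b", "c", "F", "e", "f"]
left_disp = [0, 0, 0, 2, 0, 0]
right_row = ["a", "F", "c", "d", "e", "f"]
right_disp = [0, 2, 0, 0, 0, 0]
virtual_row = synthesize_row(left_row, left_disp, right_row, right_disp, 0.5)
```

In this example the left view cannot see background pixel "d" because the foreground hides it, so warping the left view alone leaves a hole there; the right view does see "d" and supplies it during the merge. Any pixel still equal to `HOLE` after merging is what a final inpainting step would have to fill.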
4. A non-contact gaze tracking system based on gray-level distribution video processing is proposed; it can serve as a human-computer interaction device in an interactive 3DTV system. First, video of the user's head motion is captured under near-infrared illumination. Then, based on the gray-level distribution of the video frames, the face region, eye regions, and pupil regions are detected and extracted in turn, and finally the coordinates of the corneal reflection points and of the pupil center are computed. From these feature parameters, a gaze tracking algorithm based on cross-ratio invariance computes the position of the point of gaze. In addition, to handle the misalignment between the visual axis and the optical axis of the eye during gaze tracking, a simple and effective five-point calibration algorithm is proposed. Experimental results show that the system achieves high accuracy for users both with and without glasses and meets practical needs.
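The property that the gaze estimation in contribution 4 relies on, invariance of the cross-ratio under projective transformations, can be checked numerically. The sketch below is not the thesis's gaze estimator: it only demonstrates, for four collinear points, that the cross-ratio is unchanged by a projective map of the line. The function names and the example map coefficients are assumptions chosen for illustration; exact rational arithmetic is used so the comparison is not affected by rounding.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by 1D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, coeffs):
    """Apply the 1D projective map x -> (p*x + q) / (r*x + s)."""
    p, q, r, s = coeffs
    return (p * x + q) / (r * x + s)


# Four points on a line, e.g. positions along one edge of a screen.
points = [Fraction(0), Fraction(1), Fraction(2), Fraction(4)]

# An arbitrary non-degenerate projective map (p*s - q*r != 0),
# standing in for the perspective projection onto the camera image.
coeffs = (2, 1, 1, 3)
image_points = [projective_map(x, coeffs) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*image_points)
```

Here `before` and `after` are both 3/2. Because the ratio survives perspective projection, a ratio measured among feature points in the camera image can be transferred to the corresponding points on the screen plane, which is what allows a point of gaze to be located without a full 3D reconstruction of the eye.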

[Degree-granting institution]: Shandong University
[Degree level]: Doctoral
[Year degree granted]: 2014
[Classification number]: TN949.13

[References]