Research on 3D Video Coding Methods Based on Visual Perception

Published: 2018-11-19 08:10

[Abstract]: 3D-HEVC removes spatial, temporal, and inter-view redundancy from 3D video fairly well, but it does not remove perceptual redundancy effectively. Perceptual coding can remove this remaining perceptual redundancy while keeping subjective quality unchanged, which reduces either coding complexity or bit rate. How to build visual perception models and apply them to 3D video coding is therefore an active research topic. Building on the 3D-HEVC standard and taking the perspective of visual perception, this thesis studies two core techniques of 3D video coding: low-complexity coding and rate-distortion optimization.

Depth map coding is computationally expensive, so this thesis proposes a fast depth map coding scheme based on virtual view synthesis. It adopts the Maximum Tolerated Depth Distortions (MTDD) model for 3D video systems. First, depending on how much distortion the MTDD values tolerate, it applies early-decision algorithms tailored to different types of depth ranges. Next, it detects vertical edge regions, which are sensitive to rendering distortion, and performs mode decision with a different search strategy for each case. Finally, it combines the two algorithms to reduce depth video coding complexity further. Experimental results show that the proposed algorithm cuts encoding time by 49.45% while the quality of the rendered virtual views and the bit rate remain essentially unchanged.

Stereoscopic video contains a large amount of perceptual redundancy, so this thesis proposes a Foveated Binocular Just-Noticeable Coding Distortion (FBJNCD) model. First, subjective experiments examine how gradient magnitude and texture magnitude affect the stereoscopic masking effect. Second, the visual sensitivity of the human visual system (HVS) is not constant: as retinal eccentricity increases, the visibility threshold of a pixel increases as well, so the model incorporates the foveal perception characteristics of the HVS. Finally, FBJNCD is applied to asymmetric coding of stereoscopic video in the Multi-View High Efficiency Video Coding (MV-HEVC) test platform. Experimental results show that the proposed model saves 26.04% of the bit rate on average while preserving the perceived quality of the stereoscopic video, improving its compression efficiency.

Traditional Just-Noticeable Distortion (JND) models are hard to apply to stereoscopic video, and they overestimate the visibility threshold of foreground regions while underestimating that of background regions. This thesis therefore proposes a Stereo Just-Noticeable Distortion (SJND) model for stereoscopic video. First, disparity information is used to split the traditional JND map into foreground and background regions, assigning smaller thresholds to the foreground and larger thresholds to the background. In addition, because viewers of the foreground pay more attention to the center of the visual field and to regions with large disparity, a new saliency map is built from these two rules, and regions of different saliency are assigned different quantization parameter (QP) values. Experimental results show that the proposed method saves 19.92% of the bit rate on average while keeping stereoscopic video quality unchanged.
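The abstract does not give the MTDD formula. The sketch below assumes MTDD follows the common no-synthesis-error idea: 8-bit depth is converted to disparity and rounded to quarter-pel precision during view synthesis, so any depth error that leaves the rounded disparity unchanged causes no warping error. It computes the tolerated range for each depth level and applies a hypothetical early-termination test. The camera parameters, the quarter-pel precision, and all function names are illustrative assumptions, not the thesis's implementation.

```python
import math

# Assumption: 8-bit depth maps use the usual inverse-depth quantization between
# z_near and z_far, and view synthesis rounds disparity to 1/precision pel.

def disparity(v, f, baseline, z_near, z_far):
    """Horizontal disparity (pixels) for 8-bit depth level v in [0, 255]."""
    inv_z = v / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return f * baseline * inv_z

def rounded_disparity(v, f, baseline, z_near, z_far, precision=4):
    """Disparity index after rounding to 1/precision pel (quarter-pel by default)."""
    return math.floor(disparity(v, f, baseline, z_near, z_far) * precision + 0.5)

def tolerated_range(v, **cam):
    """Return (down, up): how many levels v can decrease or increase while the
    rounded disparity, and therefore the warped pixel position, stays the same."""
    target = rounded_disparity(v, **cam)
    lo = v
    while lo > 0 and rounded_disparity(lo - 1, **cam) == target:
        lo -= 1
    hi = v
    while hi < 255 and rounded_disparity(hi + 1, **cam) == target:
        hi += 1
    return v - lo, hi - v

def can_terminate_early(orig, recon, **cam):
    """Hypothetical early decision: stop testing further modes when every
    reconstructed depth sample lies inside its tolerated interval."""
    for o, r in zip(orig, recon):
        down, up = tolerated_range(o, **cam)
        if not (o - down <= r <= o + up):
            return False
    return True

cam = dict(f=1000.0, baseline=0.01, z_near=1.0, z_far=10.0)  # illustrative values
print(tolerated_range(128, **cam))                              # -> (4, 3)
print(can_terminate_early([100, 120], [101, 119], **cam))       # -> True
```

A wider tolerated interval, which occurs with a smaller baseline or focal length, lets more candidate modes pass the test. This is one plausible way to give different depth ranges different early-decision behavior.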
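The vertical-edge step can be illustrated with a simple test. A vertical edge in a depth block appears as a large jump between horizontally adjacent samples, and horizontal warping in view synthesis displaces exactly these edges. The threshold value, the function names, and the two-list "full vs. reduced" search strategy are hypothetical stand-ins. The abstract does not specify the detector or the mode lists.

```python
# Hypothetical rule: blocks containing a vertical depth edge get the full mode
# search; smooth blocks (or blocks with only horizontal edges) get a reduced one.

def has_vertical_edge(block, threshold=10):
    """True if any horizontally adjacent depth samples differ by more than threshold."""
    return any(
        abs(row[x] - row[x - 1]) > threshold
        for row in block
        for x in range(1, len(row))
    )

def candidate_modes(block, full_modes, reduced_modes, threshold=10):
    """Pick the search strategy: full list for edge blocks, reduced list otherwise."""
    return full_modes if has_vertical_edge(block, threshold) else reduced_modes

vertical = [[50, 50, 200, 200]] * 4                      # depth step between columns
horizontal = [[50] * 4, [50] * 4, [200] * 4, [200] * 4]  # depth step between rows
print(has_vertical_edge(vertical))    # -> True
print(has_vertical_edge(horizontal))  # -> False
```

Only horizontal differences are checked because a purely horizontal edge is not shifted sideways by horizontal disparity. This is why the abstract singles out vertical edges as sensitive to rendering distortion.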
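The foveation property described for FBJNCD states that the visibility threshold grows with retinal eccentricity. As a minimal sketch, the code below borrows the Geisler and Perry (1998) contrast-threshold model, CT(f, e) = CT0 · exp(α · f · (e + e2) / e2), and scales a base binocular threshold by CT(f, e) / CT(f, 0). The choice of this model, the fixed spatial frequency, the viewing distance in pixels, and the function names are assumptions. The abstract does not state the thesis's exact foveation term.

```python
import math

ALPHA = 0.106   # spatial-frequency decay constant (Geisler & Perry, 1998)
E2 = 2.3        # half-resolution eccentricity, degrees
CT0 = 1.0 / 64  # minimal contrast threshold

def eccentricity_deg(x, y, fx, fy, viewing_distance_px):
    """Visual angle (degrees) between pixel (x, y) and fixation point (fx, fy)."""
    dist = math.hypot(x - fx, y - fy)
    return math.degrees(math.atan(dist / viewing_distance_px))

def contrast_threshold(freq_cpd, ecc_deg):
    """Contrast threshold at freq_cpd cycles/degree and eccentricity ecc_deg."""
    return CT0 * math.exp(ALPHA * freq_cpd * (ecc_deg + E2) / E2)

def foveation_weight(freq_cpd, ecc_deg):
    """Peripheral-to-foveal threshold ratio: 1 at fixation, growing with eccentricity."""
    return contrast_threshold(freq_cpd, ecc_deg) / contrast_threshold(freq_cpd, 0.0)

def foveated_threshold(base_jnd, x, y, fixation, viewing_distance_px, freq_cpd=4.0):
    """Hypothetical FBJNCD-style threshold: base binocular JND times foveation weight."""
    ecc = eccentricity_deg(x, y, fixation[0], fixation[1], viewing_distance_px)
    return base_jnd * foveation_weight(freq_cpd, ecc)

print(foveated_threshold(3.0, 960, 540, (960, 540), 3000))  # -> 3.0
print(foveated_threshold(3.0, 0, 0, (960, 540), 3000))      # larger than 3.0
```

Dividing by the foveal threshold keeps the base JND unchanged at the fixation point. The model therefore only relaxes thresholds away from it, which matches the qualitative behavior the abstract describes.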
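The SJND steps (disparity-based foreground/background scaling, then a saliency map from center bias and disparity that drives per-region QP) can be sketched as follows. The scale factors 0.8 and 1.2, the equal weighting of the two saliency cues, the Gaussian center-bias width, and the ±4 QP range are all hypothetical. The abstract gives only the qualitative rules, and for simplicity this sketch computes saliency for every pixel rather than only within the foreground.

```python
import math

def sjnd(jnd, disparity, disp_threshold, fg_scale=0.8, bg_scale=1.2):
    """Scale a conventional JND value by depth layer: foreground (large
    disparity) gets a smaller threshold, background a larger one."""
    return jnd * (fg_scale if disparity >= disp_threshold else bg_scale)

def saliency(x, y, width, height, disparity, max_disparity, sigma=0.3):
    """Average a Gaussian center-bias term and a normalized-disparity term (both in [0, 1])."""
    dx = (x - width / 2) / width
    dy = (y - height / 2) / height
    center = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
    depth = disparity / max_disparity if max_disparity > 0 else 0.0
    return 0.5 * center + 0.5 * depth

def qp_for_region(base_qp, s, max_offset=4):
    """Lower QP (finer quantization) for salient regions, higher QP elsewhere."""
    offset = round(max_offset * (1.0 - 2.0 * s))  # s = 1 -> -max_offset, s = 0 -> +max_offset
    return max(0, min(51, base_qp + offset))       # clamp to the HEVC QP range 0..51

s_center = saliency(960, 540, 1920, 1080, disparity=64, max_disparity=64)
print(s_center)                     # -> 1.0
print(qp_for_region(32, s_center))  # -> 28
print(qp_for_region(32, 0.0))       # -> 36
```

Spending bits on the salient foreground and saving them in the background is the mechanism behind the bit-rate savings the abstract reports. The actual thresholds and QP offsets would come from the thesis's subjective experiments.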
[Degree-granting institution]: Ningbo University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TN919.81
