
Research on Efficient Video Coding Technology Based on Visual Perception Effects

Published: 2019-04-30 09:20
[Abstract]: With the rapid development of mobile communication and unmanned aerial vehicle (UAV) technology, multimedia network technology centered on digital video is no longer confined to traditional television systems. As UAVs play an increasingly prominent role in natural disaster monitoring, commercial performances, and military support, the demands placed on digital video coding technology keep rising. Although conventional video coding has made great progress in removing spatial, temporal, and entropy redundancy, little has been achieved in removing visual redundancy. Ultimately, the human eye is the final receiver of the video signal. Targeting the scenario of a single viewer watching video over a poor network connection, this thesis studies the characteristics of the human visual system in depth and, drawing on the idea of multiple description coding, proposes a coding system in which the video is split into three streams transmitted simultaneously. Experimental results show that, at similar but insufficient bit rates, the proposed system achieves better visual quality than a conventional H.264-based system; at similar visual quality, the proposed system saves about 20% of the bit rate.

This thesis surveys several bottom-up saliency map models and, weighing their respective strengths and weaknesses against the requirements of the target application, selects a saliency model based on the frequency-tuned algorithm. On top of this model, an improved saliency model is proposed that balances the contributions of image luminance and chrominance. Experiments show that the improved model detects saliency maps more accurately and effectively; it is then used to extract the region of interest of an image.

The just-noticeable distortion (JND) model represents visual perceptual redundancy as a quantitative threshold: changes that do not exceed this threshold cannot be perceived by the human eye. It follows that any imperceptible difference in information need not be encoded into the video stream. This thesis studies the model's contrast masking effect, background luminance masking effect, and temporal masking effect, and implements the model in the final system.

The visual attention model exploits the highly non-uniform distribution of cone cells on the retina. Cell density is highest at the fovea and drops off rapidly with distance from it, so the human visual system has its highest spatial resolution at the center of gaze, and resolution falls quickly as an image point moves away from that center. Building on a study of visual attention models, this thesis implements a content-based visual attention model combined with the JND model, and uses motion vectors to shift the point of visual attention dynamically.
【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TN919.81

