Research on High-Performance Video Coding Methods Based on Background Modeling

Published: 2018-01-09 14:30

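Keywords: Research on High-Performance Video Coding Methods Based on Background Modeling  Source: University of Science and Technology of China, doctoral dissertation, 2017  Document type: degree thesis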

[Abstract]: With the rapid development of communication and multimedia technology, video has permeated every aspect of work and daily life and has become the irreplaceable primary medium. Video data volumes are enormous, however: uncompressed video can hardly be transmitted over networks, and its storage cost is prohibitive. Video coding is therefore increasingly important in the current era of video big data. It is the core technology behind applications such as security surveillance and broadcast television, and video coding standards provide a unified technical specification that has allowed video technology to spread widely. Since the 1990s, a series of video coding standards has been established, continually advancing video technology to meet changing demands. In recent years, however, the explosive growth of self-media, the emergence of new media such as AR and VR, and the move to higher-definition surveillance video driven by public-safety needs have sharply accelerated the growth of video data; more data has been produced in the past few years than in the preceding forty thousand years. Even the latest video standard, H.265/MPEG-H HEVC, can no longer meet practical needs, and new coding techniques are urgently needed to further improve coding performance.

Background reference picture technology is one of the emerging techniques in video coding. Built on background modeling theory, it removes redundancy in the video signal by fully exploiting the properties of static backgrounds, thereby maximizing coding performance. However, most existing background image synthesis models were designed for video analysis; they require many training samples and iterate at coarse granularity, so they are ill-suited to video coding. Bit allocation techniques for background reference pictures mostly rely on empirical formulas and cannot adapt to the content. In addition, because intra coding cannot use reference pictures, its efficiency remains low and it consumes a very large number of bits, which easily causes transmission delay and packet loss. To address these problems, this thesis studies the application of background modeling theory to video coding. Targeting future (next-generation) coding standards, it investigates three topics: synthesis of background reference pictures, inter-frame bit allocation for background blocks, and intra coding of surveillance video. The main innovations and contributions are as follows.

(1) The thesis proposes an efficient progressive synthesis algorithm for background reference pictures, with separate designs for static cameras and moving cameras. For static-camera video, the algorithm first detects all qualifying candidate background blocks based on the spatio-temporal correlation of the background image. It then scores each background block according to its spatio-temporal distribution, sorts the blocks by score, and selects several of them for high-quality coding. Finally, it uses the reconstructed background blocks to update the background reference picture progressively. For moving-camera video, pictures are first aligned using accurate global motion estimation, background blocks are then detected with the static-background algorithm, and an illumination smoothing algorithm is introduced into the update of the background reference picture. Both algorithms effectively improve coding efficiency and avoid the bit-rate surge caused by coding an additional background reference picture. The static-background synthesis algorithm has been adopted by AVS2, the latest Chinese national video coding standard, and integrated into the AVS2 reference software.

(2) The thesis proposes a bit allocation strategy for background reference pictures based on stability analysis. Building on existing bit allocation methods, it performs a second allocation of background-block bits in the temporal domain: under the constraint of the bits already allocated to background blocks, it studies how to distribute those bits among the background blocks over time so that global coding performance is optimal. By analyzing the stability of the video content, the method extracts the motion distribution of each background block and estimates the probability that a block in the current background reference picture will be referenced later. From this it determines the coding-quality relationship between a background block in the current picture and the subsequent pseudo-background blocks at the same position. Based on this relationship, the optimal bit allocation under a global rate-distortion criterion is derived and used to guide the coding decisions for background blocks. Unlike traditional bit allocation methods, the proposed strategy considers not only the rate-distortion optimality of the current block but also the effect of the current background block's distortion on subsequent blocks, thereby achieving global rate-distortion optimization.

(3) The thesis proposes intra coding methods for surveillance sequences based on illumination separation and deep learning. On the one hand, since the reflectance of the background remains essentially unchanged over time and only the illumination varies, the thesis proposes an intra coding method for background blocks based on illumination separation. The method performs illumination separation on a sequence of background images taken at different times, extracts the reflectance map of the background, and codes and stores it so that every subsequently coded picture can access it. Using this high-quality reflectance map, the illumination component of any background block can be separated out. Because the illumination signal has stronger spatial correlation and is better suited to intra coding, the method achieves better coding performance and effectively reduces the number of bits required for intra coding. On the other hand, since the existing intra prediction methods offer limited modes and cannot adapt their interpolation to the content, the thesis also proposes a new intra prediction mode based on deep learning. In this mode, the prediction block produced by the best existing prediction mode is padded with the available surrounding reconstructed pixels to form the input block, and the output obtained by passing this block through a convolutional neural network is used as the prediction for the mode. Compared with the existing intra prediction modes, the new mode makes fuller use of the surrounding coded information and provides richer interpolation filtering, yielding a significant gain in coding performance.
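The static-camera procedure in contribution (1), detecting candidate background blocks, scoring them, and progressively updating the background reference picture, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the block size, the difference threshold, the scoring rule (consecutive static frames plus the count of static neighbours), and all function names are assumptions made for illustration, and copying blocks from the current frame merely stands in for the high-quality coding and reconstruction step.

```python
import numpy as np

def block_mad(prev, curr, block):
    """Mean absolute difference per block between two grayscale frames."""
    h, w = curr.shape
    bh, bw = h // block, w // block
    diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
    diff = diff[:bh * block, :bw * block]
    return diff.reshape(bh, block, bw, block).mean(axis=(1, 3))

def update_background(background, filled, prev, curr, static_age,
                      block=16, thresh=2.0, top_k=4):
    """One progressive update step; returns the coordinates of updated blocks.

    background, filled and static_age are modified in place.
    """
    # Candidate background blocks: little change between consecutive frames.
    candidates = block_mad(prev, curr, block) < thresh
    # Temporal term (assumed): consecutive frames each block has stayed static.
    static_age[:] = np.where(candidates, static_age + 1, 0)
    # Spatial term (assumed): number of 4-connected neighbours that are static.
    padded = np.pad(candidates, 1).astype(np.int64)
    neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                  + padded[1:-1, :-2] + padded[1:-1, 2:])
    # Blocks already in the background picture, or not static, score -1.
    score = np.where(candidates & ~filled, static_age + neighbours, -1)
    order = np.argsort(score, axis=None)[::-1][:top_k]
    chosen = []
    for flat in order:
        by, bx = np.unravel_index(flat, score.shape)
        if score[by, bx] < 0:
            break  # remaining blocks are not eligible
        ys, xs = by * block, bx * block
        # Stand-in for "code at high quality, then copy the reconstruction".
        background[ys:ys + block, xs:xs + block] = curr[ys:ys + block, xs:xs + block]
        filled[by, bx] = True
        chosen.append((int(by), int(bx)))
    return chosen
```

Limiting each step to `top_k` blocks is what makes the update progressive in this sketch: the cost of the background picture is spread over many frames rather than spent on one extra picture, which mirrors the abstract's point about avoiding a bit-rate surge.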

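[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctorate
[Year degree granted]: 2017
[Classification number]: TN919.81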

[Similar literature]