Research on H.265 Video Transcoding Based on Machine Learning

Published: 2019-01-02 07:34

[Abstract]: With the steady improvement of computer hardware and the continued development of Internet and multimedia technology, the demand for video and image transmission and storage has become the greatest challenge facing network bandwidth and storage hardware today. Video coding standards keep evolving, and new applications such as 4K, VR, 3D, and HDR are gradually becoming widespread. At the same time, because of product promotion, commercial competition, patent protection, and other competing factors, the coexistence of multiple video coding standards in the same period is unavoidable. Developing fast transcoding tools and techniques is therefore an urgent practical need. Against this background, the thesis focuses on homogeneous and heterogeneous video transcoding targeting the HEVC/H.265 standard. The work consists of three parts.

(1) Classification and study of transcoding frameworks. Three classes of transcoding framework are analyzed, and the full-decode, partial-encode framework is chosen as the implementation target; transcoding is accelerated by adaptively constructing the quadtree coding tree. Specifically, two coding tree construction methods are proposed: a bottom-up method and a middle-to-ends method. The bottom-up method reduces layer by layer from the smallest coding units at the bottom to the top layer, yielding a complete quadtree coding tree. The middle-to-ends method predicts toward both ends from the features of a middle layer, aiming to obtain the distribution of the parent and child nodes of the middle-layer nodes and thereby construct the coding tree.

(2) Video transcoding based on bit-distribution mapping. The number of coded bits in a bitstream represents the cost in information entropy and directly reflects the physical properties of the sequence itself. The more bits a coding unit uses, the more likely it is that the image at that location has rich content, complex texture, or rapid change. The partitioning of each coding unit is decided from the distribution of bits, and the coding tree structure is then built according to a mapping model, which achieves fast video transcoding.

(3) Video transcoding based on machine learning. The bitstream is parsed to extract information such as coding-unit bit counts, motion vectors, and prediction modes. Content features of the coding unit itself, such as variance, gradient, and blurriness, are computed directly. Machine learning is then introduced: the various pieces of information are abstracted into feature values and used to train a block-partition prediction model. Complete quadtree coding tree units are built with the bottom-up and middle-to-ends construction methods, realizing fast video transcoding based on machine learning.

Together, these three lines of work yield two fast video transcoding algorithms, one based on bit distribution and one based on machine learning. Extensive video encoding and decoding tests show that, with subjective and objective quality essentially unchanged, both proposed algorithms save more than half the time of the full-decode, full-encode transcoding method currently used in industry; that is, they achieve a 2x speedup for both homogeneous and heterogeneous video transcoding. This work may offer a useful technical reference for real-time transcoding at large video websites.
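The bottom-up coding tree construction driven by bit distribution can be illustrated with a small sketch. The abstract does not give the thesis's actual mapping model, so everything concrete here is an assumption made for illustration: the function name `build_ctu_bottom_up`, the rule "merge four sibling blocks when their total bit count is at or below a per-size threshold", and the threshold values themselves. The sketch assumes a 64x64 coding tree unit with 8x8 minimum coding units, so the input is an 8x8 grid of bit counts taken from the source bitstream.

```python
def build_ctu_bottom_up(bits, merge_thresholds, min_cu=8):
    """Build a quadtree for one coding tree unit by reducing from the leaves upward.

    bits: square grid (side a power of two) of bits spent on each
          minimum-size block in the source bitstream.
    merge_thresholds: maps a parent block size to the largest total bit
          count at which its four children are merged (hypothetical values,
          not taken from the thesis).
    Returns a node: an int (leaf block size in pixels) or a list of four
    child nodes ordered top-left, top-right, bottom-left, bottom-right.
    """
    n = len(bits)
    # Bottom layer: every minimum coding unit starts as a leaf.
    nodes = [[min_cu] * n for _ in range(n)]
    sums = [row[:] for row in bits]
    size = min_cu
    while n > 1:
        n //= 2
        size *= 2
        next_nodes, next_sums = [], []
        for r in range(n):
            node_row, sum_row = [], []
            for c in range(n):
                cells = [(2 * r, 2 * c), (2 * r, 2 * c + 1),
                         (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]
                kids = [nodes[i][j] for i, j in cells]
                total = sum(sums[i][j] for i, j in cells)
                # Merge only if no child is already split and the region is cheap to code.
                unsplit = all(isinstance(k, int) for k in kids)
                if unsplit and total <= merge_thresholds[size]:
                    node_row.append(size)
                else:
                    node_row.append(kids)
                sum_row.append(total)
            next_nodes.append(node_row)
            next_sums.append(sum_row)
        nodes, sums = next_nodes, next_sums
    return nodes[0][0]


# Usage: a flat region collapses to one 64x64 block, while a single
# bit-heavy corner keeps the tree split down to 8x8 around it.
thresholds = {16: 10, 32: 20, 64: 40}  # illustrative only
flat = [[0] * 8 for _ in range(8)]
busy = [[0] * 8 for _ in range(8)]
busy[0][0] = 100
print(build_ctu_bottom_up(flat, thresholds))
print(build_ctu_bottom_up(busy, thresholds))
```

In a transcoder built this way, the resulting tree would replace the encoder's exhaustive search over partitions, which is where the time saving would come from. A learned classifier over features such as bit count, variance, and gradient could take the place of the fixed threshold test without changing the reduction loop.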
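[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN919.8;TP181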

[References]