VLSI Architecture Design for Motion Estimation in an HEVC Encoder

Published: 2019-02-21 19:03

[Abstract]: As video technology advances, video resolution keeps increasing. High-definition (HD) and ultra-high-definition (UHD) video have become mainstream, and the amount of information in each frame has grown sharply, which poses a major challenge for storing and transmitting UHD video. Video coding offers an effective solution for compression and transmission. The latest video coding standard, HEVC/H.265 (High Efficiency Video Coding), compresses HD and UHD video efficiently: at the same visual quality, HEVC reduces the coding bit rate by nearly 50% compared with the previous standard, H.264. This gain comes with higher encoding complexity and longer encoding time, which work against real-time encoding and decoding. Real-time transmission of UHD video therefore requires high-throughput, high-performance HEVC codec chips. This thesis focuses on inter prediction in the HEVC encoder and proposes high-throughput hardware architectures for integer-pixel motion estimation and fractional-pixel motion estimation. The main contributions are as follows.
(1) Motion estimation is the core module of HEVC inter prediction. To improve compression efficiency, HEVC sharply increases both the sizes and the number of prediction units (PUs), which makes motion estimation highly complex and makes real-time processing of HD and UHD video very challenging. For integer-pixel motion estimation, this thesis proposes a hardware-friendly motion estimation algorithm and designs a hardware architecture for it. The algorithm has two stages, coarse search and fine search. Prediction units at the same depth share the coarse search result, which increases PU-level parallelism in the fine search stage. On the hardware side, the coarse search stage uses a hierarchical-reuse scheduling strategy for reference pixels, organized as a pipeline, which guarantees full reuse of reference pixels and a regular, pipelined matching-cost computation across search points. The fine search stage uses a raster-scan search strategy and reuses the reference pixel registers and SAD computation units of the coarse search stage, which greatly reduces hardware resources. Synthesis results in a 90 nm process show a maximum frequency of 377 MHz; with a search range of ±64, the design processes UHD video at 3840×2160@60fps in real time.
(2) This thesis also presents a hardware design for the fractional-pixel motion estimation module. The interpolation unit shares its filters between half-pixel and quarter-pixel positions, and interpolation results are shared across different interpolation positions, which reduces the number of interpolations. Based on an analysis of the order in which search-point data are processed in the different search stages, a pipeline is designed for the interpolation and matching-cost computation units, and the circuit structure of the interpolation filter unit is optimized. The resulting design reaches a processing speed of 3840×2160@30fps.
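The two-stage integer-pixel search in item (1) can be illustrated in software. The following is a minimal sketch, not the thesis's algorithm or hardware: the function names, the coarse grid step of 4, and the single square block (rather than multiple PU sizes sharing one coarse result) are all assumptions made for illustration. It shows only the general idea of a sparse raster-scan coarse pass followed by a dense raster-scan fine pass, with SAD as the matching cost.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between the current block and the
    reference block displaced by the candidate vector (dy, dx)."""
    return sum(
        abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
        for r in range(size)
        for c in range(size)
    )


def raster_search(cur, ref, top, left, size, center, radius, step):
    """Visit candidates around `center` in raster order and keep the first
    one with the lowest SAD. Candidates outside the frame are skipped."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(center[0] - radius, center[0] + radius + 1, step):
        for dx in range(center[1] - radius, center[1] + radius + 1, step):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


def coarse_to_fine_search(cur, ref, top, left, size,
                          search_range=64, coarse_step=4):
    """Hypothetical two-stage search. Assumes search_range is a multiple of
    coarse_step, so the zero vector is always a coarse candidate."""
    # Stage 1: sparse grid over the whole search window.
    coarse_mv, _ = raster_search(cur, ref, top, left, size,
                                 (0, 0), search_range, coarse_step)
    # Stage 2: every integer position around the coarse winner. This may
    # step up to coarse_step - 1 pixels beyond the nominal search range.
    return raster_search(cur, ref, top, left, size,
                         coarse_mv, coarse_step - 1, 1)
```

Calling `coarse_to_fine_search(cur, ref, top, left, size)` on two frames stored as lists of rows returns the best integer motion vector `(dy, dx)` and its SAD. A sparse coarse pass only finds the right neighborhood when the matching cost varies smoothly with displacement, which is one reason a real design has to choose the coarse pattern carefully.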
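For the fractional-pixel interpolation in item (2), a short software sketch may help. It applies the standard HEVC luma interpolation coefficients (an 8-tap filter for the half-pixel position and 7-tap filters, padded here with a zero tap, for the quarter-pixel positions) to a single shared window of eight integer samples. The function name, the one-dimensional simplification, the 8-bit clipping, and the edge clamping are assumptions for illustration; the abstract does not describe how the thesis shares filter circuitry.

```python
# HEVC luma interpolation taps, indexed by fractional position in quarter
# pixels. Each set sums to 64, so results are normalized with a 6-bit shift.
HEVC_LUMA_TAPS = {
    1: (-1, 4, -10, 58, 17, -5, 1, 0),     # 1/4 position
    2: (-1, 4, -11, 40, 40, -11, 4, -1),   # 1/2 position
    3: (0, 1, -5, 17, 58, -10, 4, -1),     # 3/4 position
}


def interpolate_row(row, i):
    """Return the 1/4, 1/2 and 3/4 samples between row[i] and row[i + 1].

    All three results are computed from the same eight integer samples
    row[i - 3] .. row[i + 4]; indices past the row ends are clamped
    (a simplifying assumption standing in for reference frame padding).
    """
    last = len(row) - 1
    window = [row[min(max(i + k, 0), last)] for k in range(-3, 5)]
    samples = {}
    for frac, taps in HEVC_LUMA_TAPS.items():
        acc = sum(t * s for t, s in zip(taps, window))
        # Round, normalize by 64, and clip to the 8-bit sample range.
        samples[frac] = min(max((acc + 32) >> 6, 0), 255)
    return samples
```

Because every fractional position reads the same input window, a hardware implementation can fetch the reference samples once and feed them to several filters, which is the kind of data reuse the abstract refers to. The two-dimensional case, where vertical filtering is applied to intermediate horizontal results at higher precision, is not covered by this sketch.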
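[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TN919.81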

[Similar literature]