Research on Attention-Guided Efficient Video Decoding and Display

Published: 2019-02-15 01:19

[Abstract]: In video communication the sink, that is, the final recipient, is a human observer, a fact that many traditional video codecs do not take into account. To address this, the thesis considers the characteristics of the human visual system (HVS) in line of sight and field of view, and in intensity and contrast, and combines them with the visual selectivity of the HVS to design an attention-guided efficient video decoding and display system. The main work is as follows:
(1) The characteristics of the HVS are studied systematically. Starting from the physiological structure of the human eye, the thesis examines the functions of the HVS and then its characteristics in line of sight and field of view, and in intensity and contrast. On this basis, the architecture of an attention-guided efficient video decoding and display system is designed.
(2) In implementing the designed system, two key techniques are studied in depth. The first is frame fusion across multiple data streams: the fused image is obtained by applying binarization, erosion, and Gaussian blur and then combining the results with a weighting formula, which eliminates the edge artifacts left by simply stitching together images from multiple streams. The second is high-resolution reconstruction of background regions based on the temporal redundancy of the video sequence: information from the previous frame is used to reconstruct parts of the background region in the current frame, improving the sharpness of that region. Conventional methods consider only the similarity between the background region and the high-resolution region, and their reconstruction quality is poor. This thesis uses the penalty function method from optimization theory to improve the objective function of a genetic algorithm, incorporating the relationship between the high-resolution regions of the two frames so that the reconstructed result matches the surrounding image better.
(3) To quantify the difference in video quality between the proposed method and conventional methods at the same bit rate, two objective video quality metrics are studied: peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), with emphasis on the widely used SSIM. The study finds that SSIM performs poorly under blur distortion, giving scores that contradict human visual perception. Combining image edge information with region-of-interest information, the thesis proposes an improved SSIM algorithm that incorporates human attention characteristics. Experiments show that when the subjective scores of the test sequences are estimated after nonlinear fitting, the relative error is about 50% lower than with conventional SSIM, indicating that the improved SSIM is closer to human subjective perception.
(4) An attention-guided efficient video decoding and display program is written in C++ using open-source libraries such as ffmpeg, implementing the designed system; experiments show that the system meets its design goals. Using the improved SSIM, the proposed method is compared with conventional H.264. The results show that, at limited bit rates, the objective video quality of the proposed method is about 15% higher than that of H.264 and the subjective quality is about 30% higher.
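The frame-fusion step in item (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual weighting formula, so the code below assumes the simplest plausible form: the region-of-interest mask is binarized, eroded with a 3x3 window, and Gaussian-blurred into a soft weight w, and the fused pixel is w * high + (1 - w) * low. The `Image` type and the function names `binarize`, `erode3x3`, `gaussianBlur`, and `fuse` are hypothetical and written in plain C++ rather than taken from the thesis or from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image stored row-major as floats.
struct Image {
    int width;
    int height;
    std::vector<float> data;

    Image(int w, int h, float fill = 0.0f)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h, fill) {}

    float& at(int x, int y) { return data[static_cast<std::size_t>(y) * width + x]; }
    float at(int x, int y) const { return data[static_cast<std::size_t>(y) * width + x]; }

    // Read a pixel with coordinates clamped to the border (edge replication).
    float clamped(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return at(x, y);
    }
};

// Step 1: hard 0/1 mask from a region-of-interest map.
Image binarize(const Image& src, float threshold) {
    Image out(src.width, src.height);
    for (std::size_t i = 0; i < src.data.size(); ++i)
        out.data[i] = src.data[i] > threshold ? 1.0f : 0.0f;
    return out;
}

// Step 2: 3x3 erosion, a pixel stays 1 only if its whole neighborhood is 1.
Image erode3x3(const Image& mask) {
    Image out(mask.width, mask.height);
    for (int y = 0; y < mask.height; ++y)
        for (int x = 0; x < mask.width; ++x) {
            float m = 1.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    m = std::min(m, mask.clamped(x + dx, y + dy));
            out.at(x, y) = m;
        }
    return out;
}

// Step 3: separable Gaussian blur with a normalized kernel of radius ceil(3*sigma).
Image gaussianBlur(const Image& src, float sigma) {
    const int radius = static_cast<int>(std::ceil(3.0f * sigma));
    std::vector<float> kernel(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        kernel[i + radius] = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        sum += kernel[i + radius];
    }
    for (float& k : kernel) k /= sum;

    Image tmp(src.width, src.height), out(src.width, src.height);
    for (int y = 0; y < src.height; ++y)        // horizontal pass
        for (int x = 0; x < src.width; ++x) {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; ++i)
                acc += kernel[i + radius] * src.clamped(x + i, y);
            tmp.at(x, y) = acc;
        }
    for (int y = 0; y < src.height; ++y)        // vertical pass
        for (int x = 0; x < src.width; ++x) {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; ++i)
                acc += kernel[i + radius] * tmp.clamped(x, y + i);
            out.at(x, y) = acc;
        }
    return out;
}

// Step 4 (assumed weighting): fused = w * high + (1 - w) * low, with w the soft mask.
Image fuse(const Image& high, const Image& low, const Image& roi,
           float threshold, float sigma) {
    assert(high.width == low.width && high.height == low.height);
    assert(high.width == roi.width && high.height == roi.height);
    const Image w = gaussianBlur(erode3x3(binarize(roi, threshold)), sigma);
    Image out(high.width, high.height);
    for (std::size_t i = 0; i < out.data.size(); ++i)
        out.data[i] = w.data[i] * high.data[i] + (1.0f - w.data[i]) * low.data[i];
    return out;
}
```

Because the weight falls smoothly from 1 inside the region of interest to 0 outside it, the transition between the two streams is gradual rather than a hard seam, which is the edge effect the abstract says simple stitching leaves behind. Erosion pulls the mask inward first so that the blurred transition stays inside the area covered by the high-quality stream.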
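[Degree-granting institution]: University of Electronic Science and Technology of China (电子科技大学)
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN919.8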
