Research on Sign Language Video Coding Methods under Energy-Constrained Conditions

Published: 2018-10-15 17:57

[Abstract]: Sign language is a visual language that expresses ideas through hand shapes and arm movements, supplemented by facial expressions, lip movements, and other body postures; it is the most natural way for deaf people to communicate. Unlike head-and-shoulder video, sign language video adds hand shapes and arm movements and exhibits hand-face occlusion, so it is more complex and harder to study. Compared with work on sign language video recognition and synthesis, research on coding sign language video remains scarce, and most of it is based on rate-distortion (R-D) theory: taking a given coding bit rate as the constraint, it studies the relationship between bit rate and distortion so as to minimize the distortion of the reconstructed sign language video. However, with the rapid growth of wireless network bandwidth and the wide adoption of the new-generation video coding standard H.264, the bit-rate constraint has become steadily weaker, while the power-consumption constraint on wireless video terminals has become steadily stronger. How to minimize the distortion of coded sign language video when the energy of the wireless video terminal is limited, reducing energy consumption and extending the battery renewal cycle, has therefore become an urgent problem.

This thesis studies sign language video coding under energy-constrained conditions in depth. Its aim is to use the visual selective attention mechanism of deaf viewers, power-rate-distortion theory, and region-of-interest energy-allocation video coding to achieve a dynamically balanced optimization among coding power consumption, coding bit rate, and coding distortion. The goal is to reduce the overall power consumption of the wireless video terminal as far as possible and extend the battery renewal cycle while maintaining the subjective and objective coding quality of the sign language video, and thereby to provide new theory and new methods for optimal parameter configuration and resource allocation in sign language video coding under energy constraints. The main research work of this thesis is as follows:

(1) The factors that affect the complexity of H.264 sign language video coding are analyzed theoretically and measured experimentally. The parameters of the H.264 sign language video encoder are grouped by complexity into four levels, each with its own coding complexity and coding quality, and the coding level is then selected adaptively according to the battery energy of the wireless video terminal and the motion complexity of the video. Experimental results show that the method reduces encoder computational complexity and saves computing resources on the wireless video terminal while keeping sign language video coding quality essentially unchanged.

(2) Taking into account both the time-varying battery energy of the wireless video terminal and the non-uniformity of deaf viewers' visual attention, a region-of-interest energy-aware sign language video coding method is established. At the frame layer, the method determines the number of reference frames and the search range from the currently available battery energy of the terminal and the complexity of the video frame. At the macroblock layer, it determines the macroblock prediction mode and the quantization coefficient from the visual importance of each macroblock region. Coding then proceeds with the parameters determined jointly by the frame layer and the macroblock layer. Experimental results show that the method further reduces encoder computational complexity and saves computing resources on the wireless video terminal while maintaining coding quality in the region of interest.

(3) The power-rate-distortion (P-R-D) characteristics of the three H.264 coding modes (intra, inter, and skip) are analyzed in detail. On this basis, an energy consumption model and a P-R-D model for coding one frame of sign language video are established, and an algorithm is proposed to optimize the number of macroblocks coded in intra, inter, and skip mode within a frame. Experiments show that the proposed P-R-D model agrees with measured P-R-D performance.

(4) For hand gesture detection in sign language video under hand-face occlusion, a detection method based on the force field transform is proposed. The method first computes force field images for the hand-face occlusion frame and for the face-only frame. It then partitions each force field image into blocks and computes the histogram of each block, subtracts the histograms of blocks at the same spatial position to obtain the gray-level component difference of each block histogram, and finally compares these differences with a gray-level threshold to obtain the hand position. Experiments show that the method can detect gestures under hand-face occlusion in real time.
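
The four steps of the detection method in item (4) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the force field formula, so the sketch assumes the commonly used inverse-square formulation in which every pixel attracts every other pixel in proportion to its intensity. The function names, the block size, the number of histogram bins, and the threshold value are all hypothetical.

```python
import math

def force_field_magnitude(img):
    """Naive force field transform (assumed inverse-square formulation).

    Each pixel is pulled toward every other pixel with a force proportional
    to that pixel's intensity and inversely proportional to the squared
    distance. Returns the per-pixel magnitude of the summed force vector.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            fx = fy = 0.0
            for v in range(h):
                for u in range(w):
                    if u == x and v == y:
                        continue  # a pixel exerts no force on itself
                    dx, dy = u - x, v - y
                    d3 = (dx * dx + dy * dy) ** 1.5
                    fx += img[v][u] * dx / d3
                    fy += img[v][u] * dy / d3
            out[y][x] = math.hypot(fx, fy)
    return out

def block_histogram(field, y0, x0, size, bins, max_val):
    """Histogram of one size x size block, with values scaled to [0, max_val]."""
    hist = [0] * bins
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            if max_val > 0:
                b = min(int(field[y][x] / max_val * bins), bins - 1)
            else:
                b = 0
            hist[b] += 1
    return hist

def detect_hand_blocks(occluded, face_only, size=4, bins=8, threshold=8):
    """Return (block_row, block_col) indices whose histogram difference
    between the occluded frame and the face-only frame exceeds threshold.

    size, bins and threshold are hypothetical values chosen for illustration.
    """
    f_occ = force_field_magnitude(occluded)
    f_face = force_field_magnitude(face_only)
    # Use one common scale so the two histograms are comparable.
    max_val = max(max(map(max, f_occ)), max(map(max, f_face)))
    hits = []
    for y0 in range(0, len(occluded) - size + 1, size):
        for x0 in range(0, len(occluded[0]) - size + 1, size):
            h_occ = block_histogram(f_occ, y0, x0, size, bins, max_val)
            h_face = block_histogram(f_face, y0, x0, size, bins, max_val)
            # Subtract co-located block histograms bin by bin.
            diff = sum(abs(a - b) for a, b in zip(h_occ, h_face))
            if diff > threshold:
                hits.append((y0 // size, x0 // size))
    return hits
```

Calling `detect_hand_blocks(occluded, face_only)` on two equally sized grayscale frames (lists of rows) returns the blocks flagged as hand candidates. The brute-force transform costs quadratic time in the number of pixels, so it suits only small frames; a real-time system such as the one the abstract reports would need a faster formulation.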
[Degree-granting institution]: Lanzhou University of Technology
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: TN919.81

[References]