Advanced Scalable Coding and Motion Estimation in Error-Resilient Video Compression

Published: 2019-06-19 03:29

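【Abstract】: We live in an era of highly developed information technology and encounter large amounts of multimedia content in daily life, especially images and video transmitted over networks. Demand for rich media on the Internet and on wireless networks is growing rapidly. Supporting these rich-media communication and entertainment services requires not only better broadband access but also powerful media coding techniques that make transmission more efficient. Several video coding standards, such as the ISO/IEC MPEG series and the ITU-T video coding standards, have been developed and reduce the data rate significantly. Most of these compression methods use the block-based discrete cosine transform (DCT) with motion compensation to remove spatial and temporal redundancy.

Two main problems stand out in video coding designed for network transmission. The first is that a network system aims to deliver data as well as it can but cannot guarantee reliability. Video has a much larger data volume than other data types, so limited transmission bandwidth, low processor power, and limited storage can restrict its delivery. For video applications, high transmission error rates bring additional costs in delay, complexity, and quality. Retransmission is an effective remedy for transmission errors, but it adds load to the network and may not suit applications that require low latency. The main purpose of error-resilient coding is to protect the video data and to conceal or recover it when errors occur. The second problem is heterogeneity in wide-area networks. Different types of networks have different bandwidths and traffic loads, so a heterogeneous video network must provide video services of variable quality and meet these requirements automatically and accurately.

The most critical part of video compression is motion estimation, the process of generating motion vectors (MVs). These vectors describe the motion, relative to the previous frame, that is used to form the motion-compensated prediction of the current frame. The computational cost of motion estimation poses a major challenge for real-time implementation. Motion estimation algorithms can be divided into time-domain and frequency-domain algorithms. Matching algorithms and gradient-based algorithms are the main time-domain approaches. Matching algorithms are divided into block matching and feature matching, and gradient-based algorithms into pixel-recursive and block-recursive methods. Frequency-domain algorithms use phase correlation, wavelet-domain matching, and DCT-domain matching. Gradient techniques are normally used for image sequence analysis. Pixel-recursive techniques, a subset of gradient techniques, are applied in image sequence coding, where the best-match search is carried out pixel by pixel. They have very high computational complexity and are unsuitable for real-time applications. Frequency-domain techniques rely on the relationship between the transform coefficients of shifted images and are not widely used in image sequence coding. Block matching, which minimizes a chosen cost function and searches over n×n pixel blocks, has therefore become the most widely used method in coding applications.

Block matching motion estimation (BMME) is the most popular and most practical motion estimation method in video coding, and both the H.26X and the MPEG standard series use it. A simple and efficient algorithm is essential for minimizing search time. Block matching is a correlation technique that looks for the best match between the current image block and candidate blocks within a given region of the reference frame. The process uses at least two frames, the reference frame and the current frame. The current frame is divided into macroblocks, and motion estimation is performed on each macroblock separately. For each macroblock to be coded in the current frame, the algorithm finds the best-matching macroblock in the reference frame. Once it is found, the difference (prediction error) between the best-matching macroblock and the current macroblock is computed and then DCT-transformed, quantized, and run-length coded. The relative displacement vector between the two macroblocks is coded as well.

In this thesis we first discuss a range of fast block-based motion estimation algorithms and evaluate them experimentally in terms of search speed and computational complexity. The best-performing algorithms are then analyzed in detail. The algorithms are exhaustive search, or full search (FS), three-step search (TSS), new three-step search (NTSS), four-step search (4SS), diamond search (DS), and adaptive rood pattern search (ARPS). Second, the thesis proposes a new dynamic adaptive rood search algorithm derived from ARPS. It exploits the spatial correlation between neighboring blocks, so we call it ARPS_S to distinguish it from ARPS. ARPS_S rests on the assumption that the distribution of motion vectors is highly correlated not only with the predicted motion vector but also along the vertical and horizontal directions, which forms a rood (cross) shape. The maximum and minimum MV values of the blocks surrounding the block of interest can be regarded as estimates of the deviation of the predicted MV, so they can serve as accurate estimates of the arm lengths and represent the dynamic range of motion in the corresponding directions. Unlike ARPS, the four arms in ARPS_S are not of equal length. ARPS uses 5 initial search points, while ARPS_S uses 6. In our experiments ARPS_S outperforms ARPS in both search speed and video quality.

Finally, the thesis discusses error-resilient coding based on scalable coding strategies. Scalable video coding encodes a video sequence into several bitstreams so that the decoder can support a range of quality levels. Two classes of scalable error-resilient coding are presented and evaluated: layered coding (LC) and multiple description coding (MDC). The nature of compressed video bitstreams makes error resilience very important. For example, a single bit error in VLC-coded video data can cause loss of synchronization between encoder and decoder and, in turn, the loss of several video blocks. Multiple bit errors, which often occur with burst channel errors or packet loss, can cause the loss of part or all of a video frame and lead to error propagation in the temporal dimension. This propagation is a direct consequence of using motion compensation to reduce temporal redundancy. Error resilience and scalability are two extremely important properties in video transmission. Scalability provides good robustness under an acceptable amount of information loss, without making decoding much harder or seriously degrading visual quality. Robust video coding plays a key role in limiting error propagation and improving visual quality. By combining a sound design with an acceptable level of redundancy at minimal complexity, it can address the error concealment problem effectively.

Layered coding divides the video sequence into several layers, each of different importance to fidelity. The lowest layer, called the base layer, can be coded independently. The layers above it are called enhancement layers, and decoding them depends on the base layer. The base layer alone gives the lowest quality, and quality improves as enhancement layers are added. Under congestion, a network that supports layered service transmits first the base-layer packets, which matter most for decoding. Layered video coding was first proposed to combat packet loss in ATM networks and improve transmission robustness. It was later adopted by the MPEG-2 and MPEG-4 standards as a principal error-resilience and scalable coding method, and it is also used in some IP multicast applications, such as the Internet multicast backbone. Layered coding is often combined with unequal error protection (UEP), which gives stronger protection to the most important data, the base layer. Even so, if the base layer is lost (for example, because of a server crash or a connection failure), or is received with many errors, the hierarchical dependency between layers makes the additional information in the enhancement layers almost useless.

MDC compresses the video sequence into several bitstreams of equal importance. Each bitstream, also called a description, can be decoded independently, and the descriptions reinforce one another: the more descriptions the receiver obtains, the higher the quality of the reconstructed video. Parallel scalability is therefore inherent in multiple description coding. Part of this thesis studies how the bitstreams are generated in LC and MDC. Each frame is first DCT-transformed, then quantized and zigzag-scanned. In layered coding, the most important DCT coefficients (the first ten) are assigned to the base layer and the rest to the enhancement layer. In multiple description coding, the 64 DCT coefficients are divided equally into an odd part and an even part. Simulation results show that the MDC scheme outperforms the LC scheme. The simulations also show that, compared with layered coding, multiple description coding can clearly improve the robustness of real-time video applications when suitably combined with path diversity or server diversity. Because all information received in the presence of errors remains useful in MDC, it avoids the problems that layered coding faces in best-effort networks and is therefore very effective for video transmission over best-effort packet networks.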
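The block matching procedure described in the abstract can be illustrated with a minimal full-search (FS) sketch. It is not the thesis's implementation. The cost function (sum of absolute differences), the function names `sad`, `full_search`, and `make_demo_frames`, the search range `p`, and the convention that a motion vector `(dx, dy)` points from the current block to its match in the reference frame are all assumptions made for illustration.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref` (assumed cost)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Exhaustively test every displacement in [-p, p] x [-p, p].

    Returns ((dx, dy), cost) for the candidate with the lowest SAD.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def make_demo_frames(size=16, dx=2, dy=-1, seed=0):
    """Hypothetical test data: a random reference frame and a current frame
    whose content is the reference content displaced by (dx, dy)."""
    rng = random.Random(seed)
    ref = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
    cur = [[ref[y + dy][x + dx]
            if 0 <= y + dy < size and 0 <= x + dx < size else 0
            for x in range(size)] for y in range(size)]
    return cur, ref


if __name__ == "__main__":
    cur, ref = make_demo_frames()
    print(full_search(cur, ref, bx=4, by=4, n=4, p=3))
```

With the demo frames, the search is expected to recover the displacement `(2, -1)` at zero cost. Full search evaluates all (2p+1)² candidates per block, which is the cost that fast algorithms such as TSS, DS, and ARPS try to reduce by testing only a few candidates.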
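The coefficient partitioning described in the abstract (first ten zigzag-ordered DCT coefficients to the base layer in LC, odd/even split of the 64 coefficients in MDC) can be sketched as follows. This is an illustrative sketch, not the thesis's code. The function names are hypothetical, the odd/even split is assumed to refer to the position in zigzag order, and zero-filling a missing layer or description is an assumed, very simple concealment strategy.

```python
def zigzag_order(n=8):
    """Return the (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def zigzag_scan(block):
    """Flatten a square coefficient block into zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def split_layered(coeffs, base_count=10):
    """LC: the first `base_count` coefficients form the base layer,
    the remainder forms the enhancement layer."""
    return coeffs[:base_count], coeffs[base_count:]


def split_mdc(coeffs):
    """MDC: even-position and odd-position coefficients form two
    descriptions of equal size (position parity is an assumption)."""
    return coeffs[0::2], coeffs[1::2]


def merge_layered(base, enhancement, total=64):
    """Rebuild the coefficient list; a lost base layer makes the block
    undecodable (None), a lost enhancement layer is zero-filled."""
    if base is None:
        return None
    if enhancement is None:
        enhancement = [0] * (total - len(base))
    return list(base) + list(enhancement)


def merge_mdc(even, odd, total=64):
    """Interleave the two descriptions; a lost description is zero-filled."""
    if even is None and odd is None:
        return None
    out = [0] * total
    if even is not None:
        out[0::2] = even
    if odd is not None:
        out[1::2] = odd
    return out
```

The merge functions show the asymmetry the abstract points out: in LC the enhancement layer is useless without the base layer, whereas in MDC either description alone still yields a coarse reconstruction.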
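【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Doctoral
【Year degree conferred】: 2014
【Classification number】: TN919.81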

【Similar literature】