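Research on Adaptive Video Summarization Algorithms
Published: 2018-03-06 18:24
Topic: video summarization  Entry point: dictionary learning  Source: University of Science and Technology of China (中国科学技术大学), 2017 doctoral dissertation  Document type: degree thesis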
[Abstract]: With the spread of digital recording devices and the development of network technology, video has gradually become an important way for people to record their personal lives and to communicate. A large volume of video is produced every day, covering a wide range of content that includes news, sporting events, TV dramas, variety shows, and self-recorded clips. This mass of video places a heavy viewing burden on people, since watching all of it is very time-consuming, and it also puts great storage pressure on video servers and websites. People therefore urgently need a way to extract the key content of a video for fast viewing and efficient storage. Video summarization technology arose to meet this need. It has advanced greatly in recent years but is not yet mature, and this dissertation aims to improve its performance.
This dissertation studies the open problems in video summarization in depth. Video content today is highly varied; even a single video may contain many scenes that differ greatly from one another. This diversity places high demands on the adaptability of summarization algorithms: an algorithm must adapt the way it extracts features to the content of the video, then segment the video, extract keyframes, and assemble the summary. To meet these demands, and building on existing research on video summarization algorithms, this dissertation combines dictionary learning, sparse representation, and deep learning to study three stages of video summarization: feature extraction, video segmentation, and evaluation of content importance. It proposes a method for each stage, tests the methods on standard datasets, and analyzes the results. The work is summarized as follows:
1) A video summarization algorithm based on graph-regularized sparse coding is proposed. In the feature extraction stage, traditional summarization algorithms usually compute feature values directly from rules fixed in advance. Because video content is so diverse, features extracted by predefined rules often fail to describe it accurately. To improve adaptability, we use dictionary learning and sparse representation as a form of unsupervised feature learning: a feature space suited to the video's content is learned adaptively from that content and then used for feature extraction. The resulting features describe the video content more accurately and adapt well to different scenes.
2) A video summarization algorithm based on an adaptive threshold is proposed. Once frame features have been extracted, the video must be segmented to obtain its structure, which serves as a reference for generating the summary. Existing segmentation algorithms measure the similarity between frames and split the video with a fixed threshold. Because video data is diverse, however, a single fixed threshold rarely gives good results across different videos: the content of different videos changes with different intensity, so the optimal segmentation threshold should differ as well. To make segmentation more adaptive, the proposed algorithm adjusts the segmentation threshold for each video according to how drastically its frames change. This improves the adaptability of the algorithm and helps raise the quality of the generated summaries.
3) A video summarization algorithm based on an autoencoder is proposed. After the video has been segmented and its structure obtained, the importance of each segment must be determined and the most important parts extracted as the summary. Importance evaluation is both crucial and difficult. On the one hand, its outcome directly determines the summary; on the other hand, the importance of video content is subjective and abstract, and is hard to capture with a set of formulas. This dissertation first uses the video title to collect web images related to the video content, then uses an autoencoder to learn the patterns shared by those images and the video, and finally uses the trained autoencoder model to evaluate the importance of the video content and generate the summary accordingly. By using a deep network to mine the information in web images, the method captures how the general public judges certain things and can therefore assess the importance of video content more accurately.
4) In the experiments, the proposed methods were tested on standard datasets including VSUMM, YouTube, and SumMe, and the results were analyzed in detail. The results show that our methods perform better on these datasets and produce higher-quality video summaries than existing methods.
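The adaptive-threshold idea in contribution 2 can be illustrated with a minimal sketch. The abstract does not give the thesis's actual thresholding rule, so everything here is an assumption made for illustration: the function name `segment_boundaries`, the sensitivity parameter `k`, the use of Euclidean distance between consecutive frame features, and the rule "mean plus `k` standard deviations" of those distances.

```python
import numpy as np


def segment_boundaries(features, k=1.0):
    """Return the frame indices at which a new segment starts.

    features: (n_frames, dim) array of per-frame feature vectors.
    k: hypothetical sensitivity parameter (an assumption, not from the thesis).
    """
    features = np.asarray(features, dtype=float)
    if len(features) < 2:
        return []
    # Distance between each pair of consecutive frames.
    diffs = np.linalg.norm(features[1:] - features[:-1], axis=1)
    # Per-video threshold (assumed rule): it rises automatically for
    # videos whose frames change more drastically.
    threshold = diffs.mean() + k * diffs.std()
    # A boundary is placed before frame i + 1 when the jump from frame i
    # to frame i + 1 exceeds the threshold.
    return [int(i) + 1 for i in np.flatnonzero(diffs > threshold)]
```

Because the threshold is derived from each video's own frame-to-frame statistics, scaling all features of a video by a constant leaves the detected boundaries unchanged. A fixed threshold tuned for a calm video would instead mark nearly every frame of a fast-changing one as a boundary.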
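[Degree-granting institution]: University of Science and Technology of China (中国科学技术大学)
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: TP391.41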
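[Similar literature]
Related doctoral dissertations (top 1)
1 Li Jiatong (李佳桐); Research on Adaptive Video Summarization Algorithms [D]; University of Science and Technology of China; 2017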
Article ID: 1575969
Article link: https://www.wllwen.com/shoufeilunwen/xxkjbs/1575969.html