
Content- and Semantics-Based Classification of Short Video Shots

Published: 2019-03-12 12:19
[Abstract]: With the development of multimedia and network technologies in recent years, a growing number of short video shots and online video websites have appeared on the Internet. As a result, content- and semantics-based classification and retrieval of short video shots has become a research area.

A short video shot is a set of temporally consecutive frames, so video analysis has both a spatial and a temporal aspect. Spatial analysis can draw on existing image feature extraction techniques to obtain effective visual features; temporal analysis requires structured analysis and processing of the shot data. Static and dynamic features together form the feature space that describes the content of a shot. In addition, traditional video shot classification systems do not take the high-level semantic information of a shot into account, which leaves a semantic gap between low-level visual features and high-level semantic information. It is therefore necessary to add the analysis of semantic features to the classification system, attempting to infer high-level semantic information from the low-level features of a short shot and thereby build a shot classification system based on high-level semantics.

This thesis studies both of these aspects and, based on the characteristics and shortcomings of existing methods, proposes corresponding solutions.

After extracting several kinds of visual features from short video shots, the thesis uses mutual information to study the discriminative power of each individual visual feature. The method is theoretically well founded and does not depend on the type of classifier: it assesses a feature's discriminative power by how much class information the feature carries, which expresses the intrinsic relationship between image features and classes. In the experiments, the classification error rates obtained with an SVM classifier also support the correctness and effectiveness of using mutual information for feature analysis and selection. An SVM classifier is then used to analyze the complementary or redundant relationships among the visual features in order to select the optimal feature combination. The best feature found for the live-action/animation categories is the combination of improved RGB color moments + edge dynamic features; for the people/scenery categories it is improved RGB color moments + Gabor texture features + edge dynamic features; and for the sports/entertainment categories it is edge direction histogram + color dynamic features.

Finally, the extraction and study of high-level semantic features is added to a short-shot classification system for ball-game video. A combination of two features, the proportion of key frames within a shot and the proportion of field-area pixels within the key frames, divides the short-shot database into in-field and off-field scenes. The field-area proportion is then used to split in-field shots into long shots and close-up shots, and the proportion of edge-region pixels is used to split off-field scenes into coach shots and audience shots, yielding a hierarchical short-shot classifier for ball sports.
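The mutual-information criterion described in the abstract can be illustrated with a small sketch. The abstract does not say how the thesis estimates mutual information, so the equal-width binning, the `n_bins` parameter, and the function name `mutual_information` below are assumptions made for illustration only. The sketch scores one scalar feature against class labels using I(F; C) = Σ p(f, c) log2[p(f, c) / (p(f) p(c))].

```python
import math
from collections import Counter


def mutual_information(feature_values, labels, n_bins=4):
    """Estimate I(F; C) in bits between one scalar feature and class labels.

    Assumption of this sketch: the feature is discretized into
    equal-width bins before probabilities are counted.
    """
    if len(feature_values) != len(labels) or not labels:
        raise ValueError("feature_values and labels must be non-empty and equal in length")

    lo, hi = min(feature_values), max(feature_values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant features
    # Map each value to a bin index, clamping the maximum into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in feature_values]

    n = len(labels)
    joint = Counter(zip(bins, labels))  # counts of (bin, class) pairs
    bin_counts = Counter(bins)          # marginal counts per bin
    class_counts = Counter(labels)      # marginal counts per class

    mi = 0.0
    for (b, c), count in joint.items():
        p_joint = count / n
        # p(f, c) / (p(f) * p(c)) written with raw counts
        ratio = count * n / (bin_counts[b] * class_counts[c])
        mi += p_joint * math.log2(ratio)
    return mi


# A feature that separates two balanced classes carries one full bit;
# a feature unrelated to the class carries none.
separating = mutual_information([0.1, 0.2, 0.9, 1.0], ["a", "a", "b", "b"], n_bins=2)
unrelated = mutual_information([0.1, 0.9, 0.1, 0.9], ["a", "a", "b", "b"], n_bins=2)
```

Because the score depends only on the feature values and the labels, features can be ranked without training any classifier, which matches the classifier-independence the abstract attributes to the method.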
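The hierarchical ball-game classifier outlined in the abstract can be sketched as a two-level decision rule. The abstract names the three features but gives no thresholds, no comparison directions, and no precise definition of the key-frame proportion, so every constant, the `ShotFeatures` type, and the `classify_shot` function below are hypothetical. In particular, treating a high edge-pixel proportion as an audience shot is an assumption of this sketch, not a result from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ShotFeatures:
    key_frame_ratio: float    # key-frame proportion within the shot (exact definition assumed)
    field_pixel_ratio: float  # mean share of field-area pixels in the key frames
    edge_pixel_ratio: float   # mean share of edge-region pixels in the key frames


# Hypothetical thresholds, chosen only to make the sketch runnable.
IN_FIELD_KEY_FRAME_MIN = 0.5
IN_FIELD_PIXEL_MIN = 0.2
LONG_SHOT_PIXEL_MIN = 0.6
AUDIENCE_EDGE_MIN = 0.3


def classify_shot(shot: ShotFeatures) -> str:
    """Assign a shot to one of four leaf classes in a two-level hierarchy."""
    # Level 1: in-field vs. off-field, from the combination of two features.
    in_field = (shot.key_frame_ratio >= IN_FIELD_KEY_FRAME_MIN
                and shot.field_pixel_ratio >= IN_FIELD_PIXEL_MIN)
    if in_field:
        # Level 2a: a large field area is treated as a long shot.
        if shot.field_pixel_ratio >= LONG_SHOT_PIXEL_MIN:
            return "in-field/long shot"
        return "in-field/close-up"
    # Level 2b: assumed direction, dense edges suggest a crowd.
    if shot.edge_pixel_ratio >= AUDIENCE_EDGE_MIN:
        return "off-field/audience"
    return "off-field/coach"


label = classify_shot(ShotFeatures(key_frame_ratio=0.9,
                                   field_pixel_ratio=0.8,
                                   edge_pixel_ratio=0.1))
```

In practice the thresholds would need to be tuned, or replaced by trained classifiers at each node, on labeled shots from the target sport.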
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree granted]: 2009
[Classification number]: TP391.41

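[Citing literature]

Related master's theses: 1 entry

1 Deng Kejie (邓克捷); Research on Topic-Based Sports News Video Retrieval [D]; Central South University; 2011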



Article ID: 2438765


Article link: https://www.wllwen.com/wenyilunwen/dongmansheji/2438765.html

