Research on Audio-Visual Information Fusion Algorithms

Published: 2018-11-28 14:54

[Abstract]: In recent years, as computing and information technology have advanced, more and more video devices and technologies have entered people's study and daily life. Applications such as video conferencing, video search engines, and video data querying have produced large volumes of non-text data in fields including film, television, meeting records, and scientific literature. For individuals, the spread of personal recording devices and improvements in Internet technology have made it extremely easy for ordinary people to publish videos they shoot themselves, which has generated large amounts of video data as well. How to process this much multimedia information, and how to organize the data and index it for retrieval, is a severe test for existing video processing technology. Early multimedia retrieval algorithms have drifted from their original goal of being easy to use, and future retrieval algorithms need to fuse more representative low-level visual, auditory, and semantic features. The multimodal nature of video provides the basis for information fusion. Most existing analysis and fusion techniques target a single modality, but video is a special kind of data with multiple modalities, and when those modalities describe the same topic they are strongly correlated. An effective method for fused analysis of video is therefore needed so that video can be classified and retrieved more accurately. The main work of this thesis on processing and fusing video features is as follows:

1. Existing model definitions for processing video data are confined to specific domains such as news and advertising, and the processing techniques they use are too narrow and outdated. This thesis therefore defines a relatively complete preprocessing model for video retrieval, built from a series of video processing techniques that research and analysis have shown to be relatively efficient. The model uses the multimodal nature of low-level video features to extract the temporal structure of the video, then extracts features from the content, constructing a subset of the video data from the original video. Following this process, the thesis extracts key frames from the video and audio features from its audio stream. To simplify computation, all extracted low-level features undergo dimensionality reduction using marginal Fisher analysis (MFA), a method recently proposed by Shuicheng Yan et al., which outperforms the commonly used PCA and LDA. The resulting feature vectors are then classified with a support vector machine (SVM) classifier, chosen for its robustness.

2. For fusing the classification results obtained from the multimodal features, an improved MGR fusion algorithm is proposed. Working from the matrix of sample ranks that the classifiers output for the feature vectors, and building on the fusion framework designed by Melnik et al., a fusion score function is designed to improve the MGR algorithm by optimizing confidence and preference. Compared with the original MGR algorithm, the improved algorithm requires less computation, has fewer parameters, and achieves some improvement in recognition rate.
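To make the idea of rank-level fusion concrete, the following is a minimal sketch of how rankings produced by separate per-modality classifiers can be merged into one ranking. It uses a plain Borda count, which is a swapped-in, much simpler technique: it is not the MGR algorithm and not the improved fusion score function proposed in the thesis, neither of which is specified in this abstract. The function name `borda_fuse`, the class labels, and the three example rankings are all hypothetical.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Fuse several rankings (best label first) with a Borda count.

    NOTE: illustrative stand-in only. This is NOT the MGR algorithm or the
    thesis's improved fusion score function.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            # The top-ranked label earns n - 1 points, the last earns 0.
            scores[label] += n - 1 - position
    # Highest total score first; ties are broken alphabetically.
    return sorted(scores, key=lambda label: (-scores[label], label))


# Hypothetical rankings of three class labels for one video clip,
# one ranking per modality-specific classifier.
visual_ranking = ["sports", "news", "music"]
audio_ranking = ["news", "sports", "music"]
keyframe_ranking = ["sports", "music", "news"]

fused = borda_fuse([visual_ranking, audio_ranking, keyframe_ranking])
print(fused)
```

A Borda count treats every classifier as equally reliable and uses only rank positions. Frameworks of the kind the abstract refers to additionally weigh how much each classifier's ranking should be trusted, which is where the confidence and preference terms come in.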
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree awarded]: 2011
[Classification number]: TP391.41
