Research on Context-Based Mobile Multimedia Annotation, Management, and Key Technologies

Published: 2019-06-11 01:00
【Abstract】: In recent years, the rapid development of computer communication and multimedia compression technology, the continuing decline of storage costs, and especially the popularity of smartphones and the emergence of social networking sites have caused the volume of visual data such as video and images to grow explosively; managing and retrieving these data effectively has become an urgent problem. To make such data directly accessible through text-based management and retrieval techniques, semantic annotation of video and images has gradually developed. Because manual annotation is inefficient, costly, and subjective, the common solution is to have computers annotate visual data automatically. Automatic annotation based on semantic concepts is one of the prevailing techniques; although it has achieved some success, problems such as its dependence on training data and the limitations of visual semantics still hinder further progress. This thesis approaches the automatic annotation of visual data from a new angle. In essence, visual data such as video and images are the carriers through which visual sensors describe real-world entities and events; annotation attempts to recover the original semantics from the visual description and restore them as a linguistic description, so that the data can be organized and managed. A visual sensor records only the visual appearance of targets within its operating range, while a large amount of contextual information related to the targets' semantics is discarded. Research in this field still focuses on fully mining the semantic information contained in the visual data themselves; in contrast, this thesis turns its attention to the process by which visual data are produced. With the development of Internet of Things technology, wearable sensing devices are becoming widespread, and this thesis uses wearable sensors to collect and exploit contextual information about visual targets in order to aid the semantic analysis of visual data. The main contributions are as follows:
· Conventional face detection and tracking in video must process every frame. This thesis proposes a fast face detection and tracking algorithm that uses context information collected by sensors to filter out the many frames containing no faces, reducing processing time as well as false positives and missed detections, and improving both the performance and the efficiency of face detection and tracking.
· Building on sensor-assisted fast face recognition, and by exploiting the consistency of a target's body-motion direction across different sensing modalities, a method for recognizing frontal face images in video is proposed. As with the identity-recognition method, introducing wearable sensors frees the recognition process from its dependence on sample data; experiments show that the method is more robust.
· Traditional identity-recognition methods for video require collecting many high-quality samples of every target to guarantee accuracy. This thesis proposes an identity-recognition method based on motion matching, which exploits the inherent consistency of a target's motion features across different sensing modalities and introduces wearable sensors to help resolve target identities in video. The method bypasses the traditional processing pipeline and its dependence on sample data, and features simple logic, low computational complexity, and high reliability.
· An automatic video annotation method is proposed that performs action recognition on two different kinds of sensing data and fuses the decisions from the two modalities to reveal the target's identity, ultimately annotating video content in the form of time, place, person, and action.
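The motion-matching idea summarized above can be illustrated with a minimal sketch. The thesis itself does not publish code here; the function names, the use of a single motion-magnitude time series per modality, and the normalized-correlation score are all assumptions made for illustration. The sketch correlates a wearable sensor's motion signal against the motion signal of each person tracked in the video and assigns the wearer's identity to the best-matching track:

```python
import numpy as np

def normalize(x):
    """Zero-mean, unit-variance scaling so signals from different
    sensors become comparable (epsilon guards against flat signals)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-9)

def match_identity(imu_motion, track_motions):
    """Hypothetical motion-matching step: return the video track whose
    motion time series best correlates with the wearable-sensor signal.

    imu_motion    -- 1-D motion magnitudes from the wearable sensor
    track_motions -- dict mapping track id -> 1-D motion magnitudes
                     extracted from the video for that tracked person
    """
    ref = normalize(imu_motion)
    best_id, best_score = None, -np.inf
    for track_id, motion in track_motions.items():
        m = normalize(motion)
        n = min(len(ref), len(m))
        # Mean normalized correlation over the overlapping window:
        # near 1 for consistent motion, near 0 for unrelated motion.
        score = float(np.dot(ref[:n], m[:n]) / n)
        if score > best_score:
            best_id, best_score = track_id, score
    return best_id, best_score
```

Because the score compares motion patterns rather than appearance, no per-person training samples are needed, which mirrors the sample-free property the abstract claims for the motion-matching method.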
【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Doctoral
【Year awarded】: 2015
【Classification number】: TP391.41


Article ID: 2496875


Link: https://www.wllwen.com/shoufeilunwen/xxkjbs/2496875.html


