Research and Applications on Several Problems in Internet Visual Media

Published: 2019-05-29 18:29

[Abstract]: With the rapid development of the Internet, ever more multimedia content such as images, videos, and text is being uploaded. Images and videos, which convey intuitive visual information efficiently, have become the most active information carriers in social networks. Information processing based on Internet visual media refers to using the large volume of images and videos available online, together with the annotations, comments, user preferences, and other information attached to them, to analyze, process, and apply multi-source heterogeneous media information. The research spans computer graphics, computer vision, and machine learning, and aims to make full use of existing visual media resources to develop intelligent applications that meet user needs. Against this background, and driven by several key problems in concrete applications, this thesis applies methods and techniques from image processing and computer vision to intelligently integrate and diversely re-present the many kinds of visual media resources on the Internet. It studies three topics: multimodal artistic image rendering; face image rendering and video animation interaction in shadow play; and feature extraction and visual classification for furniture style. The main research content and contributions are summarized as follows:

1. A multimodal artistic image rendering method that fuses text information is proposed, and the Picwords system is designed and implemented. Artistic rendering is an image processing technique that abstracts and stylizes an image. This thesis proposes a new artistic rendering method that fuses the semantic information carried by two modalities, image and text, to enrich the semantics of the original image. The method uses the main structure of the original image to supply the low-frequency information and overall appearance, and geometrically deforms the text to serve as the high-frequency detail of the reconstruction target, thereby achieving a visual fusion of the two modalities. Based on this method, the multimodal image rendering system Picwords is designed and implemented. The system fuses an input image and its associated keywords into a single picture and automatically adjusts the keyword weights. Its output preserves the overall visual effect of the image as far as possible while conveying more semantic information, and it has been widely applied in poster design, advertising, and social networks.

2. A face artistic rendering method and an animation interaction method for shadow play are proposed, and a digital preservation system for the shadow play heritage is designed. To protect Chinese shadow play, a precious intangible cultural heritage, this thesis designs a heritage digitization system for shadow play. The system comprises a shadow play creation module and a shadow play operation module, and aims to use shadow-play-related images and videos from the web to personalize creation and simplify operation. The creation module takes a face photo supplied by the user and, through the face rendering method, generates a personalized shadow puppet head that retains the characteristics of shadow play figures. The operation module uses animation interaction to turn the manipulation of the puppet's performance movements into control by script commands, preserving the character of shadow play performance while reducing operational complexity. The system can effectively support the protection and transmission of this cultural heritage.

3. A furniture style image classification method based on deep learning and fused with traditional image classification is proposed. Furniture style is the most discriminative visual appearance feature of furniture. Using this feature for intelligent selection and recommendation of furniture styles can improve the quality of modern home life and has both academic and practical value. Traditional object classification and furniture style classification differ in that the former classifies furniture by structure and function, whereas the latter focuses on uncovering and analyzing differences in furniture details such as pattern, material, and color. The work proceeds as follows. First, based on the style-selection needs of the current furniture market, an image dataset of furniture styles is built, which is also the first visual dataset built specifically for furniture style research. Second, traditional image classification methods and deep-neural-network-based methods are compared on furniture style classification, and a multi-scale convolutional image feature is proposed. Finally, traditional image classification methods are fused on top of deep learning, experiments on 16-class furniture style classification are conducted (classification accuracy reaching 70%), and the results are analyzed in depth.
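The abstract's third contribution combines multi-scale convolutional features with traditional image features, but it does not say how the scales are pooled or how the two feature types are fused. The sketch below illustrates one common way this could be done, assuming grid-based max pooling at several scales followed by weighted concatenation of L2-normalized descriptors. The function names (`max_pool_grid`, `multi_scale_descriptor`, `fuse`) and the parameters `grids` and `weight` are hypothetical and are not taken from the thesis.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def max_pool_grid(feature_map, grid):
    """Max-pool a 2-D feature map (list of rows) over a grid x grid partition.

    Assumes the map has at least `grid` rows and `grid` columns,
    so that no cell of the partition is empty.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for gy in range(grid):
        y0, y1 = gy * h // grid, (gy + 1) * h // grid
        for gx in range(grid):
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            pooled.append(
                max(feature_map[y][x] for y in range(y0, y1) for x in range(x0, x1))
            )
    return pooled


def multi_scale_descriptor(feature_map, grids=(1, 2)):
    """Concatenate pooled responses from several grid sizes, then L2-normalize."""
    desc = []
    for g in grids:
        desc.extend(max_pool_grid(feature_map, g))
    return l2_normalize(desc)


def fuse(deep_desc, handcrafted_desc, weight=0.5):
    """Late fusion (an assumed scheme): weighted concatenation of two normalized descriptors."""
    deep_part = [weight * v for v in l2_normalize(deep_desc)]
    hand_part = [(1.0 - weight) * v for v in l2_normalize(handcrafted_desc)]
    return deep_part + hand_part


if __name__ == "__main__":
    # Toy 4x4 "convolutional response" and a toy 3-bin handcrafted histogram.
    fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    histogram = [3.0, 0.0, 4.0]
    fused = fuse(multi_scale_descriptor(fmap), histogram, weight=0.5)
    print(len(fused))
```

The fused vector would then be passed to any standard classifier. The weight balancing the two feature types is a free parameter here and would need to be tuned on validation data.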
[Degree-granting institution]: Hefei University of Technology
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: TP391.41

[References]