Research on Text Recognition and Retrieval Techniques for TV Video

Published: 2018-06-06 04:01

Topic: video subtitles + text detection; Source: master's thesis, Beijing University of Posts and Telecommunications, 2016

【Abstract】: In today's information age, video multimedia content is growing explosively, and automatically analyzing and organizing large volumes of video has become an urgent need for both academia and industry. Video subtitles are strongly correlated with the video content, have distinctive visual features, and carry rich high-level semantic information about the video. This thesis studies the detection and recognition of video subtitles and, on that basis, builds a video retrieval system to address video content extraction and retrieval. The main work is as follows. (1) A text binarization algorithm based on a high-contrast map is proposed. After analyzing the common characteristics of video subtitle text, an adaptive local contrast algorithm is used to obtain a high-contrast image of the text; the text image is then binarized using Otsu's method together with a method based on the gray-level statistical distribution of the text image. (2) Character segmentation and localization are studied. Based on an analysis of the shapes of Chinese characters and of common segmentation errors, a method based on character-width clustering segments the binarized text image into individual characters and locates them. In addition, because subtitle text stays on screen across many consecutive frames, inter-frame character fusion is used to remove noise from the filtered binary text images. (3) A fast retrieval method for large video collections is implemented. Video information is analyzed in a structured way centered on the subtitles, and the key frames corresponding to each subtitle are extracted with a shot detection algorithm. Introducing an inverted index and the vector space model greatly improves retrieval efficiency. (4) A front-end/back-end architecture for video subtitle recognition and retrieval is proposed and implemented in code. The front end, running on a PC or a DSP, filters, extracts, and recognizes text in the video stream and sends the recognition results back to a back-end server, which builds the index and consolidates the information. Experiments show that the proposed algorithms work well on subtitles of many different styles. A video test dataset was built according to the characteristics of different kinds of video subtitles; the results show that, at a subtitle recognition accuracy of about 84%, the system still achieves good real-time performance and has the potential to process multiple video channels in parallel.
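Contribution (1) first turns the subtitle region into a high-contrast map and then binarizes it with Otsu's method. The thesis's exact adaptive contrast formula and its gray-level-statistics step are not given in the abstract, so the following is only a sketch under stated assumptions: it uses a common local max/min contrast, (I_max - I_min) / (I_max + I_min + eps) over a 3x3 window, thresholds that map with a plain Otsu implementation to mark high-contrast (stroke-edge) pixels, and binarizes the gray image with Otsu using a hypothetical `light_text` flag for text polarity. Images are 8-bit grayscale nested lists.

```python
def local_contrast(img, radius=1, eps=1e-6):
    """Per-pixel (max - min) / (max + min + eps) over a (2r+1)x(2r+1) window (assumed formula)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            hi, lo = max(window), min(window)
            out[y][x] = (hi - lo) / (hi + lo + eps)
    return out


def otsu_threshold(values, levels=256):
    """Otsu's method: the t maximizing between-class variance; class 0 is v <= t."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * c for i, c in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def high_contrast_mask(img, levels=256):
    """True where local contrast exceeds an Otsu threshold on the quantized contrast map."""
    q = [[int(c * (levels - 1)) for c in row] for row in local_contrast(img)]
    t = otsu_threshold([v for row in q for v in row], levels)
    return [[v > t for v in row] for row in q]


def binarize(img, light_text=True):
    """Otsu binarization of the gray image; light_text (hypothetical) picks the text polarity."""
    t = otsu_threshold([v for row in img for v in row])
    if light_text:
        return [[1 if v > t else 0 for v in row] for row in img]
    return [[1 if v <= t else 0 for v in row] for row in img]
```

On a dark background with one bright vertical stroke, the contrast mask marks the stroke and its immediate neighbors, while `binarize` keeps only the stroke itself.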
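For contribution (2), one plausible reading of character-width clustering is: split a binarized text line at blank columns of its vertical projection, cluster the resulting piece widths into a "fragment" group and a "full character" group, and re-join neighboring fragments (for example, the two halves of a left-right character such as 明) while their combined span stays close to the full-character width. This sketch assumes a single horizontal text line and uses 1-D two-means as the clustering step; the 1.2 `tolerance` is a hypothetical parameter, not a value from the thesis.

```python
def column_runs(binary):
    """[start, end) column ranges containing at least one foreground pixel."""
    w = len(binary[0])
    ink = [any(row[x] for row in binary) for x in range(w)]
    runs, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, w))
    return runs


def two_means(widths, iters=20):
    """1-D k-means with k=2; returns (fragment_center, full_char_center)."""
    lo, hi = float(min(widths)), float(max(widths))
    for _ in range(iters):
        small = [v for v in widths if abs(v - lo) <= abs(v - hi)]
        large = [v for v in widths if abs(v - lo) > abs(v - hi)]
        if small:
            lo = sum(small) / len(small)
        if large:
            hi = sum(large) / len(large)
    return lo, hi


def segment_characters(binary, tolerance=1.2):
    """Projection split, then greedy merge of pieces that fit within one character width."""
    runs = column_runs(binary)
    if not runs:
        return []
    _, char_w = two_means([e - s for s, e in runs])
    merged = [runs[0]]
    for s, e in runs[1:]:
        ms, _ = merged[-1]
        if e - ms <= tolerance * char_w:
            merged[-1] = (ms, e)  # fragment of the same character: extend the previous box
        else:
            merged.append((s, e))
    return merged
```

A greedy left-to-right merge can go wrong when narrow punctuation sits next to a fragment; how the thesis handles such cases is not described in the abstract.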
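Because a subtitle stays on screen for many consecutive frames while the background behind it changes, noise that survives binarization in one frame rarely appears at the same position in the others. A minimal sketch of inter-frame fusion, assuming the binarized frames are already aligned and belong to the same subtitle, is a per-pixel majority vote. The thesis's actual fusion rule is not stated in the abstract, and `min_ratio` is a hypothetical parameter.

```python
def fuse_frames(frames, min_ratio=0.5):
    """Per-pixel vote over aligned binary frames of one subtitle.

    A pixel stays foreground (1) only if it is foreground in more than
    min_ratio of the frames, so transient background noise is voted out.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    if any(len(f) != h or any(len(r) != w for r in f) for f in frames):
        raise ValueError("frames must share one size (already aligned)")
    return [[1 if sum(f[y][x] for f in frames) > min_ratio * n else 0
             for x in range(w)]
            for y in range(h)]
```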
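Contribution (3) ties each subtitle to key frames found by shot detection. The abstract does not name the shot detector, so this sketch substitutes a simple, widely used baseline: a cut is declared when the normalized gray-level histogram difference between consecutive frames exceeds a threshold. The keyframe rule (the middle frame of the subtitle's span inside the shot where it starts), the 16-bin histogram, and the 0.5 threshold are all assumptions for illustration.

```python
import bisect


def gray_histogram(frame, bins=16):
    """Normalized gray-level histogram of an 8-bit frame (nested lists)."""
    hist = [0] * bins
    for row in frame:
        for v in row:
            hist[v * bins // 256] += 1
    total = sum(hist)
    return [c / total for c in hist]


def shot_boundaries(frames, threshold=0.5, bins=16):
    """Indices of frames that start a new shot; frame 0 always starts one."""
    boundaries = [0]
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / 2  # lies in [0, 1]
        if diff > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


def keyframe_for_subtitle(boundaries, n_frames, start, end):
    """Middle frame of the subtitle span [start, end), clipped to the shot containing start."""
    k = bisect.bisect_right(boundaries, start) - 1
    shot_end = boundaries[k + 1] if k + 1 < len(boundaries) else n_frames
    stop = min(end, shot_end)
    return (start + stop - 1) // 2
```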
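The retrieval step combines an inverted index with the vector space model: the index maps each term to the subtitles that contain it, so a query only scores subtitles sharing at least one term, and the score is the cosine similarity of TF-IDF vectors. This sketch assumes character bigrams as index terms for Chinese text and a smoothed IDF of log((1 + N) / (1 + df)) + 1; the thesis's actual tokenization and weighting may differ, and `SubtitleIndex` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict


def bigrams(text):
    """Character bigrams (assumed index terms for Chinese subtitles)."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) == 1:
        return chars
    return [a + b for a, b in zip(chars, chars[1:])]


class SubtitleIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_norm = {}                 # doc_id -> length of its TF-IDF vector
        self.n_docs = 0

    def idf(self, term):
        df = len(self.postings.get(term, {}))
        return math.log((1 + self.n_docs) / (1 + df)) + 1.0

    def build(self, docs):
        """docs: dict of doc_id -> subtitle text."""
        for doc_id, text in docs.items():
            for term, tf in Counter(bigrams(text)).items():
                self.postings[term][doc_id] = tf
        self.n_docs = len(docs)
        sq = defaultdict(float)
        for term, plist in self.postings.items():
            idf = self.idf(term)
            for doc_id, tf in plist.items():
                sq[doc_id] += (tf * idf) ** 2
        self.doc_norm = {d: math.sqrt(v) for d, v in sq.items()}

    def search(self, query, top_k=5):
        """Cosine-ranked (doc_id, score) pairs for subtitles sharing a term with the query."""
        scores = defaultdict(float)
        q_sq = 0.0
        for term, qtf in Counter(bigrams(query)).items():
            idf = self.idf(term)
            w_q = qtf * idf
            q_sq += w_q ** 2
            # Inverted index: only subtitles that contain this term are visited.
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += w_q * tf * idf
        if q_sq == 0.0:
            return []
        q_norm = math.sqrt(q_sq)
        ranked = sorted(((d, s / (q_norm * self.doc_norm[d])) for d, s in scores.items()),
                        key=lambda kv: kv[1], reverse=True)
        return ranked[:top_k]
```

Precomputing document norms at build time keeps query cost proportional to the posting lists a query touches rather than to the total number of subtitles.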
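In the front-end/back-end split of contribution (4), the front end (PC or DSP) sends recognition results to the back-end server, which indexes them. The abstract does not specify the message format; this sketch assumes a hypothetical JSON record per recognized subtitle, with a `channel_id` field so that one server can merge results from several parallel video streams. All field names are illustrative.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SubtitleRecord:
    channel_id: str   # hypothetical: which video stream the front end processed
    start_frame: int  # first frame where the subtitle appears
    end_frame: int    # frame after the subtitle disappears
    keyframe: int     # key frame chosen for this subtitle
    text: str         # recognized subtitle text
    char_boxes: list = field(default_factory=list)  # [[x0, x1], ...] per character


def encode(record):
    """Front end: serialize one recognition result for the back end."""
    return json.dumps(asdict(record), ensure_ascii=False)


def decode(payload):
    """Back end: rebuild the record before indexing it."""
    return SubtitleRecord(**json.loads(payload))
```

Character boxes are kept as lists rather than tuples so that a JSON round trip returns an equal record.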
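【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Master's
【Year awarded】: 2016
【Chinese Library Classification】: TP391.41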
