
Research and Application of Learning-Based Object Detection Methods

Published: 2018-10-29 08:25
【Abstract】: Object detection underpins research on target tracking and recognition; the quality of the detection method directly determines the precision of subsequent tracking, recognition, and processing. In practice, variations in illumination and weather, partial occlusion, shadows, viewing angle, target scale, and rotation during image or video acquisition cause large changes in a target's appearance, all of which pose major challenges for detection. Targeting the main problems of detection in static scenes and against dynamic backgrounds, this thesis studies learning-based object detection methods.

Extracting power lines, monitoring transmission lines, and monitoring tower deformation and fault points are the main tasks of automated power-line inspection. Detecting transmission towers in video captured during helicopter inspection is essential for determining tower type and for judging tower deformation and faults. This thesis proposes a learning-based two-stage tower detection method with the following steps. First, tower and non-tower image patches are cropped from helicopter/UAV transmission-line inspection videos to form a training set, and positive and negative samples are labeled. Second, Local Binary Pattern (LBP) features are extracted from the training set, and the feature set and labels are fed into an Adaptive Boosting (AdaBoost) model to train classifier1. Third, a deep-learning CNN architecture is designed, and the training set and labels are fed into a Convolutional Neural Network (CNN) model built on the Convolutional Architecture for Fast Feature Embedding (Caffe) framework to train classifier2. Finally, at multiple scales, sliding-window patches of the test video frames are passed to classifier1, whose output yields tower candidate regions; the candidates are then passed to classifier2, which decides whether each is a tower target, and the tower's precise location is obtained from classifier2's output.

Acquiring text information in natural scenes has broad application prospects for service robots in fields such as assisted navigation for the blind and visual localization. Because scene text varies in position, orientation, font, color, and size, and suffers from blur, contamination, and occlusion, detecting and localizing text in natural scenes is itself a highly challenging problem. This thesis proposes an efficient text-sign detection method based on the Bag of Visual Words (BOVW) model, consisting of a training part and a testing part. In training, Binary Robust Invariant Scalable Keypoints (BRISK), which are cheap to compute and somewhat robust to scale change and rotation, are chosen as the texture feature of text signs. BRISK features extracted from images are clustered with a Self-Growing and Self-Organized Neural Gas (SGONG) network to build a visual dictionary. The BRISK features of the positive and negative training images are then quantized against this dictionary to obtain BRISK shape histograms; HS color histograms are extracted at the same time and further processed into HS color-invariance histograms. The two features are fused into a strongly discriminative text-sign feature, and an AdaBoost classifier is trained on the text-sign sample set to obtain the text-sign detector. In testing, the Maximally Stable Color Regions (MSCR) algorithm first performs a coarse detection of text signs in the scene, reducing the cost of applying the classifier directly; the learned detector then refines the MSCR candidate regions to localize the text signs.

The tower detection method in the helicopter power-inspection system was tested on about 30 minutes of video. It achieves 92% recall and 79% precision, a weighted harmonic mean of 85%, and an average detection time of about 0.33 s per frame, outperforming both a cascaded AdaBoost method and a deep-learning CNN method. The method can therefore be applied directly to automatic tower detection in helicopter power-inspection systems as a basis for subsequent maintenance and fault diagnosis. The text-sign detection method was tested on 678 street-view images containing 661 text signs. Detection rates for far, middle, and near signs reach 76%, 81%, and 90% respectively, and the corresponding sign-recognition rates are 58%, 78%, and 89%. Compared with detection based on HS color-invariance features, Scale Invariant Feature Transform (SIFT) features, SIFT+HS, BRISK, Fast Retina Keypoint (FREAK) features, or FREAK+HS, the method localizes text signs more accurately with fewer false detections, higher precision, and lower detection time. It can therefore be applied directly to text-sign detection and localization in natural scenes as a front end for text segmentation and recognition.
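The stage-1 feature of the tower detector, the LBP histogram, can be sketched as follows. This is a minimal illustration of the classic 8-neighbor LBP operator, not the thesis's implementation; the patch values and helper names are made up.

```python
# Minimal sketch of an 8-neighbor Local Binary Pattern (LBP) feature:
# each pixel is encoded by comparing its 8 neighbors to the center,
# and the image patch is described by the histogram of these codes.

def lbp_code(img, r, c):
    """LBP code of pixel (r, c): compare the 8 neighbors (clockwise
    from the top-left) against the center and pack the bits."""
    center = img[r][c]
    neighbors = [img[r-1][c-1], img[r-1][c], img[r-1][c+1],
                 img[r][c+1],   img[r+1][c+1], img[r+1][c],
                 img[r+1][c-1], img[r][c-1]]
    code = 0
    for bit, v in enumerate(neighbors):
        if v >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist

# Toy 3x3 grayscale patch with a single interior pixel.
patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
h = lbp_histogram(patch)
```

In the thesis's pipeline, such histograms (computed per training patch) together with the positive/negative labels form the input to the AdaBoost training of classifier1.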
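The quantization step of the BOVW pipeline (assigning each local descriptor to its nearest visual word and describing the image by the word histogram) can be sketched as follows. Real BRISK descriptors are binary strings compared with Hamming distance; plain Euclidean distance on toy 2-D vectors is used here only to show the mechanics, and all data below is made up.

```python
# Illustrative sketch of BOVW quantization: descriptors are mapped to
# their nearest dictionary centroid ("visual word"), and the image is
# represented by the normalized histogram of word assignments.

def nearest_word(desc, dictionary):
    """Index of the dictionary centroid closest to the descriptor."""
    best, best_d = 0, float("inf")
    for i, word in enumerate(dictionary):
        d = sum((a - b) ** 2 for a, b in zip(desc, word))
        if d < best_d:
            best, best_d = i, d
    return best

def bovw_histogram(descriptors, dictionary):
    """Normalized visual-word histogram for one image."""
    hist = [0.0] * len(dictionary)
    for desc in descriptors:
        hist[nearest_word(desc, dictionary)] += 1.0
    n = len(descriptors)
    return [h / n for h in hist]

dictionary = [(0.0, 0.0), (10.0, 10.0)]         # 2 visual words
descriptors = [(1.0, 0.5), (9.0, 11.0), (0.2, 0.1), (8.5, 9.5)]
print(bovw_histogram(descriptors, dictionary))  # [0.5, 0.5]
```

In the thesis, the dictionary comes from SGONG clustering of BRISK descriptors, and the resulting shape histogram is fused with the HS color-invariance histogram before AdaBoost classification.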
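The reported 85% "weighted harmonic mean" of the tower detector is consistent with the standard F1 measure of the quoted precision (79%) and recall (92%), which can be checked directly:

```python
# Harmonic mean of precision and recall (the F1 measure):
# F1 = 2 * P * R / (P + R).

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

score = f1(0.79, 0.92)
print(round(score, 2))  # 0.85
```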
【Degree-granting institution】: Xi'an University of Technology
【Degree level】: Master
【Year conferred】: 2017
【CLC number】: TP391.41; TM755


Article ID: 2297147



Article link: https://www.wllwen.com/kejilunwen/dianlidianqilunwen/2297147.html



Copyright note: This material was provided by user b6bda***; the site archives only the abstract or table of contents. Authors who require removal should email bigeng88@qq.com.