
Research on Robot Autonomous Grasping Technology Based on Multimodal Deep Learning Algorithms

Posted: 2018-06-26 02:15

Thesis topic: autonomous grasping + deep learning; Source: Harbin Institute of Technology, 2017 master's thesis


【Abstract】: Autonomous grasping is an important problem in robotics research. Over the past few decades, researchers have approached it from many angles, including gripper structure design, grasp planning and control, and multi-sensor fusion. The trajectory from analytical methods to data-driven methods shows that progress in grasping research has been driven by the continual adoption of new techniques. With the arrival of big data and deep learning, combining data at new scales with new research methods can yield remarkable results. Deep learning has inspired many other research fields; accordingly, this thesis applies deep learning to the robot autonomous grasping problem, builds a grasp classification model, and constructs a grasp detection system.

A grasp classifier is built with deep learning. Distinguishing between the data modalities, this thesis adopts late (end-of-pipeline) fusion of the data and establishes a grasp classification model based on a multimodal convolutional neural network. First, convolutional neural networks are trained separately on image data and on depth data to obtain two grasp classifiers. The two classifiers are then used as feature extractors, extracting the image grasp features and the depth grasp features of each candidate grasp rectangle. Finally, the two kinds of grasp features are fused to train a top-level classification model. Combining the two feature extractors with the top-level classification network, this thesis builds a grasp classifier based on a multimodal deep learning algorithm that accurately classifies grasp rectangles.

An autonomous grasping system is constructed. First, median filtering fills the missing regions of the Kinect depth image, and calibrating the Kinect yields a point cloud in the robot base frame together with aligned image data; this provides the scene sensing data. Second, the tabletop is extracted with the random sample consensus (RANSAC) algorithm to segment the target, the principal direction of the target object is detected with the Sobel operator, and the five-dimensional grasp search space is discretized; this completes the sampling of candidate grasp rectangles. Next, outliers are removed from the scene point cloud and mean filtering is applied in preparation for normal-vector estimation, and the optimal grasp rectangle found by detection is mapped to a set of optimal grasp parameters. Finally, an inverse kinematics service computes the joint angles of the Baxter robot's grasp pose, and Baxter is driven to those joint angles to grasp the target object. Integrating these components yields the autonomous grasping system.

The grasp classification models are evaluated and the robot grasping system is studied experimentally. The three grasp classification models are compared on the dataset; the results show that the multimodal grasp classifier outperforms the single-modality classifiers. Finally, the autonomous grasping system is built on the multimodal classifier: the Kinect acquires scene data, a workstation infers the optimal grasp parameters, and the Baxter robot grasps the target object. Experiments show that the grasping system is feasible and stable.
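The late-fusion architecture described above (two per-modality CNNs reused as feature extractors, plus a top-level fusion classifier) can be sketched as follows. This is a minimal illustration assuming PyTorch; the layer sizes, patch resolution, and class count are illustrative placeholders, not the thesis's actual network.

```python
# Minimal late-fusion sketch (illustrative sizes, not the thesis's network).
import torch
import torch.nn as nn

class ModalityBranch(nn.Module):
    """Small CNN trained per modality (RGB or depth); its penultimate
    layer output is later reused as a fixed feature extractor."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 256),
            nn.ReLU(),
        )
        self.head = nn.Linear(256, 2)  # graspable vs. not graspable

    def forward(self, x):
        return self.head(self.features(x))

class FusionClassifier(nn.Module):
    """Top-level classifier fed with the concatenated per-modality features."""
    def __init__(self, rgb_branch, depth_branch):
        super().__init__()
        self.rgb_branch = rgb_branch
        self.depth_branch = depth_branch
        self.top = nn.Sequential(
            nn.Linear(256 + 256, 128),
            nn.ReLU(),
            nn.Linear(128, 2),
        )

    def forward(self, rgb_patch, depth_patch):
        f_rgb = self.rgb_branch.features(rgb_patch)        # image grasp feature
        f_depth = self.depth_branch.features(depth_patch)  # depth grasp feature
        return self.top(torch.cat([f_rgb, f_depth], dim=1))

# Train each branch on its own modality first, then freeze both and
# train only the top network on the fused features.
rgb = ModalityBranch(in_channels=3)
depth = ModalityBranch(in_channels=1)
model = FusionClassifier(rgb, depth)
logits = model(torch.randn(8, 3, 64, 64), torch.randn(8, 1, 64, 64))
```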
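The scene-preprocessing steps (filling Kinect depth holes with a local median, then removing the dominant tabletop plane with RANSAC so only candidate objects remain) might look like the sketch below. It assumes SciPy and a recent Open3D; the thresholds are placeholders.

```python
# Scene preprocessing sketch: depth hole filling + table-plane removal.
import numpy as np
import open3d as o3d
from scipy.ndimage import median_filter

def fill_depth_holes(depth_mm, size=5):
    """Replace zero-valued (missing) Kinect depth pixels with a local median."""
    filled = median_filter(depth_mm, size=size)
    out = depth_mm.copy()
    out[depth_mm == 0] = filled[depth_mm == 0]
    return out

def remove_table_plane(points_xyz, dist_thresh=0.01):
    """RANSAC plane fit: treat the dominant plane as the table, keep the rest."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points_xyz)
    _, inliers = pcd.segment_plane(distance_threshold=dist_thresh,
                                   ransac_n=3, num_iterations=1000)
    objects = pcd.select_by_index(inliers, invert=True)
    return np.asarray(objects.points)
```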
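Detecting the object's principal direction with the Sobel operator and discretizing the five-dimensional grasp-rectangle search space (x, y, orientation, width, height) around it could be sketched as follows. This assumes OpenCV; the bin count, angle step, and size lists are arbitrary placeholders.

```python
# Sketch: Sobel-based principal direction + 5-D grasp rectangle sampling.
import numpy as np
import cv2

def principal_direction(gray_roi):
    """Dominant edge orientation (radians) from magnitude-weighted Sobel gradients."""
    gx = cv2.Sobel(gray_roi, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray_roi, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                    # fold to [0, pi)
    hist, edges = np.histogram(ang, bins=36, range=(0, np.pi), weights=mag)
    k = np.argmax(hist)
    return 0.5 * (edges[k] + edges[k + 1])

def sample_grasp_rectangles(cx, cy, theta0, widths=(40, 60, 80),
                            heights=(20, 30), d_theta=np.deg2rad(10), n_theta=5):
    """Discretize the (x, y, theta, w, h) search space around a seed pose."""
    for k in range(-n_theta, n_theta + 1):
        for w in widths:
            for h in heights:
                yield (cx, cy, theta0 + k * d_theta, w, h)
```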
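Turning the best-scoring rectangle into grasp parameters requires a cleaned point cloud with estimated normals; a hedged sketch using Open3D is shown below. Here the grasp point and approach axis are simply taken from the nearest cleaned point and its surface normal, which is an illustrative simplification rather than the thesis's exact mapping; the neighborhood sizes are placeholders.

```python
# Sketch: outlier removal, normal estimation, and a simple grasp-parameter mapping.
import numpy as np
import open3d as o3d

def grasp_parameters(points_xyz, grasp_center_xyz):
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points_xyz)
    # Remove isolated outliers before estimating normals.
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    pcd.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.02, max_nn=30))
    # Look up the cleaned point closest to the rectangle center.
    tree = o3d.geometry.KDTreeFlann(pcd)
    _, idx, _ = tree.search_knn_vector_3d(grasp_center_xyz, 1)
    point = np.asarray(pcd.points)[idx[0]]
    normal = np.asarray(pcd.normals)[idx[0]]
    return point, -normal   # approach along the inward surface normal
```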
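The final execution step, querying an inverse kinematics service for the joint angles of the desired gripper pose and driving the arm there, might look like the following. This assumes the Rethink Baxter SDK (rospy, baxter_core_msgs, baxter_interface) and follows the service name used in the SDK's ik_service_client example; it is a sketch to be verified against the actual robot setup, not the thesis's implementation.

```python
# Sketch: Baxter IK query + joint-space motion (assumes a running ROS node,
# e.g. rospy.init_node('grasp_executor'), and the Rethink Baxter SDK).
import rospy
import baxter_interface
from baxter_core_msgs.srv import SolvePositionIK, SolvePositionIKRequest
from geometry_msgs.msg import PoseStamped

def grasp_at(pose, limb_name='right'):
    """Solve IK for a PoseStamped grasp pose and move the arm to the solution."""
    srv = '/ExternalTools/{}/PositionKinematicsNode/IKService'.format(limb_name)
    rospy.wait_for_service(srv)
    ik = rospy.ServiceProxy(srv, SolvePositionIK)
    req = SolvePositionIKRequest()
    req.pose_stamp.append(pose)
    resp = ik(req)
    if not resp.isValid[0]:
        raise RuntimeError('No IK solution for the requested grasp pose')
    joint_angles = dict(zip(resp.joints[0].name, resp.joints[0].position))
    limb = baxter_interface.Limb(limb_name)
    limb.move_to_joint_positions(joint_angles)
    baxter_interface.Gripper(limb_name).close()
```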
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Master's
【Year conferred】: 2017
【CLC number】: TP391.41; TP242





Link to this page: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2068608.html

