Extraction and Representation of Entity Information in Plane Geometry Images
Published: 2018-12-14 11:03
[Abstract]: With the rapid development of modern educational technology and artificial intelligence, research on the machine solving of subject-specific problems has again become a popular topic. Compared with other disciplines, mathematics is built on quantities and relations, which makes the machine solving of mathematics problems a good entry point for studying machine-solving techniques. To help realize the machine solving of plane geometry problems, this thesis studies the extraction and representation of entity information in the plane geometry images that accompany such problems. To handle the overlapping figure structures, dashed lines, and similar situations encountered during geometric entity detection, relevant entity detection algorithms are tested in a targeted way according to the characteristics of plane geometry images, and several post-processing optimization strategies are proposed, yielding a fairly robust entity detection pipeline with high detection accuracy. Useful information about the geometric entities is then extracted from the detection results. This information can be presented directly as a result through a consistent representation, helping students understand the problem and explore its solution on their own, or it can be integrated with the textual information to obtain a more complete description of the problem and thereby support machine solving. The work consists of two main parts. The first part is geometric entity detection, which comprises three steps: image preprocessing, geometric entity detection, and detection optimization. Based on experimental analysis and comparison, an adaptive Gaussian-kernel binarization algorithm is selected to binarize the plane geometry image, and 8-connected component labeling is applied to the binarized image to segment the geometric figure regions and the label character regions. Within the geometric figure regions, circle entities are first detected with a RANSAC circle detection method and their pixels are removed from the image after detection; line segment entities are then detected with the progressive probabilistic Hough transform.
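As a rough illustration of the RANSAC circle detection step described above, the following Python sketch repeatedly fits a circle through three randomly sampled foreground points, keeps the candidate with the most inliers, and then removes those inliers before any line detection. It is not the thesis's implementation: the function names, the tolerance `tol`, the iteration count, and the `min_inliers` threshold are all illustrative assumptions, and a real pipeline would run on the foreground pixels of the binarized image rather than on a hand-built point list.

```python
import math
import random


def circle_from_three_points(p1, p2, p3):
    """Return (cx, cy, r) of the circle through three points, or None if collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-9:  # collinear points define no circle
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def ransac_circle(points, iterations=200, tol=1.5, min_inliers=30, seed=0):
    """Return (model, inliers) for the best circle found, or (None, []) if none qualifies.

    iterations, tol, min_inliers and seed are assumed values chosen for this sketch.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = circle_from_three_points(*rng.sample(points, 3))
        if model is None:
            continue
        cx, cy, r = model
        # A point supports the candidate if it lies within tol of the circumference.
        inliers = [p for p in points
                   if abs(math.hypot(p[0] - cx, p[1] - cy) - r) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    if len(best_inliers) < min_inliers:
        return None, []
    return best_model, best_inliers


def remove_points(points, inliers):
    """Drop the detected circle's points so they do not disturb later line detection."""
    drop = set(inliers)
    return [p for p in points if p not in drop]


# Synthetic data: a circle centred at (50, 40) with radius 20, plus a diagonal segment.
points = [(50 + 20 * math.cos(math.radians(a)), 40 + 20 * math.sin(math.radians(a)))
          for a in range(0, 360, 5)]
points += [(i, i) for i in range(30)]

model, inliers = ransac_circle(points)
remaining = remove_points(points, inliers)
print(model, len(inliers), len(remaining))
```

Removing the circle's points before looking for segments follows the order given in the abstract; presumably it keeps curved strokes from being reported as many short false line segments.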
Finally, extensive post-processing optimization, including connected-component labeling optimization and the detection and restoration of dashed lines, is applied to make detection more robust, yielding the raw coordinate-based information of all geometric entities. The second part is the extraction and representation of geometric entity information, which comprises three steps: OCR of the label characters, entity information extraction, and entity information representation. For OCR, a BP (backpropagation) neural network is trained to recognize the label character regions, and each recognized character is merged into the attribute information of the point entity closest to the center of that character region. The effective types of entity information in plane geometry images are also summarized, and corresponding coordinate-based extraction methods are given. Finally, the extracted entity information is represented consistently in three forms: an extended predicate form, an equation system form, and a natural language form. The result is a robust, unified framework for extracting and representing geometric entity information; extensive experiments on a collected image dataset verify its soundness and robustness.
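The label association and the three representation forms mentioned above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the `Point` class, the `attach_labels` and `represent_segment` helpers, the `Segment(A, B)` predicate syntax, and the wording of the natural language sentence are all hypothetical, since the abstract does not give the thesis's actual notation. The OCR step is assumed to have already produced a character and a region center for each label.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float
    label: str = ""


def attach_labels(points, char_regions):
    """Give each recognized character to the point nearest its region center.

    char_regions is a list of (character, (center_x, center_y)) pairs, assumed
    to come from a separate OCR step.
    """
    for char, (cx, cy) in char_regions:
        nearest = min(points, key=lambda p: math.hypot(p.x - cx, p.y - cy))
        nearest.label = char
    return points


def represent_segment(a, b):
    """Describe segment ab in three forms (all three syntaxes are hypothetical)."""
    # Supporting line through a and b, written as ca*x + cb*y + cc = 0.
    ca = b.y - a.y
    cb = a.x - b.x
    cc = -(ca * a.x + cb * a.y)
    return {
        "predicate": f"Segment({a.label}, {b.label})",
        "equation": f"{ca:g}*x {cb:+g}*y {cc:+g} = 0",
        "natural_language": (
            f"Segment {a.label}{b.label} joins point {a.label} "
            f"and point {b.label}."
        ),
    }


# Two detected endpoints and two OCR results located near them.
endpoints = [Point(1, 1), Point(5, 4)]
attach_labels(endpoints, [("A", (0.5, 0.2)), ("B", (5.5, 4.5))])

for form, text in represent_segment(endpoints[0], endpoints[1]).items():
    print(f"{form}: {text}")
```

Keeping the coordinates on each entity and deriving every representation from them means the three forms cannot drift apart, which matches the abstract's emphasis on a consistent representation.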
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.41;O182.1
Article ID: 2378507
Article link: https://www.wllwen.com/kejilunwen/yysx/2378507.html