
Research and Design of a Deep-Learning-Based Verb Error Detection Algorithm

Published: 2018-04-25 16:22

  Topics: English essay correction + rule-based grammar; Source: University of Science and Technology of China, 2017 master's thesis


[Abstract]: Automated grading of English essays has developed rapidly in recent years. It is gradually replacing manual grading by teachers and has become an important tool for easing the workload of English teachers. A literature survey shows that verb agreement errors and verb tense errors are the two most frequent types of grammatical error in English essays, so how well a system detects verb errors reflects its practicality and effectiveness. The mainstream automated grading systems at present include 冰果 (Bingguo) and 句酷 (Juku). Our investigation found that their detection of verb agreement errors and verb tense errors does not meet learners' needs. To address this, this thesis develops a verb grammar error detection algorithm based on deep learning. Analysis shows that verb agreement errors and verb tense errors correlate strongly with the words and phrases in the surrounding context, and the deep learning model LSTM (Long Short-Term Memory) can retain useful contextual information during training. This thesis therefore uses an LSTM as the model trained on the annotated corpus. Converting the text of an essay into numerical values for later computation is another important step in automated grading. Most mainstream tools use the bag-of-words model, which encodes each word according to its position in the dictionary. This encoding is simple and easy to use, but the resulting vectors lose word-order information and are prone to the curse of dimensionality. This thesis therefore encodes text with a word embedding model, which maps the text, in order, into a low-dimensional vector space, so that positional information is preserved and the curse of dimensionality is avoided. Finally, a set of corpus samples was collected and the proposed algorithm was compared against 句酷 and 冰果. The results show that the proposed algorithm performs better at verb error detection: its overall precision, recall, and F1 score all exceed those of the current mainstream automated grading systems.
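The contrast the abstract draws between bag-of-words encoding and word embeddings can be illustrated with a small sketch. This is not the thesis's implementation: the helper names (`build_vocab`, `bag_of_words`, `make_embeddings`, `embed_sequence`), the two example sentences, and the randomly initialised embedding table are all assumptions made for illustration, and a real embedding table would be learned from a corpus.

```python
import random

def build_vocab(sentences):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab

def bag_of_words(sentence, vocab):
    """Count vector with one slot per vocabulary word; word order is discarded."""
    counts = [0] * len(vocab)
    for token in sentence.split():
        counts[vocab[token]] += 1
    return counts

def make_embeddings(vocab, dim, seed=0):
    """Hypothetical embedding table filled with random values (not learned)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in vocab]

def embed_sequence(sentence, vocab, table):
    """One low-dimensional vector per token, kept in sentence order."""
    return [table[vocab[token]] for token in sentence.split()]

# Same words in both sentences; only the second has a verb agreement error.
a = "the dog chases the cats"
b = "the cats chases the dog"
vocab = build_vocab([a, b])
table = make_embeddings(vocab, dim=3)

bow_a, bow_b = bag_of_words(a, vocab), bag_of_words(b, vocab)
seq_a, seq_b = embed_sequence(a, vocab, table), embed_sequence(b, vocab, table)

print(bow_a == bow_b)  # the count vectors are identical, so order is lost
print(seq_a == seq_b)  # the embedded sequences differ, so order is kept
```

The bag-of-words vector grows with the vocabulary size, while each embedded token stays at the fixed dimension chosen here (3). A sequence model such as an LSTM would consume the ordered list of vectors one token at a time.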
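The abstract reports precision, recall, and F1 score but does not say how detections are matched against annotations. As a minimal sketch, assuming each detected error is identified by a (sentence index, token index) pair and counted as correct only when the same pair appears in the annotation, the three metrics could be computed as follows. The function name and the sample data are made up for illustration and are not results from the thesis.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted error positions with annotated (gold) positions."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up example: (sentence index, token index) pairs flagged as verb errors.
gold = {(0, 2), (1, 4), (3, 1), (5, 0)}
predicted = {(0, 2), (1, 4), (2, 3)}

p, r, f1 = precision_recall_f1(predicted, gold)
print(round(p, 3), round(r, 3), round(f1, 3))  # prints 0.667 0.5 0.571
```

Here two of the three predictions are correct (precision 2/3) and two of the four annotated errors are found (recall 1/2), which gives an F1 score of 4/7.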
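[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1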



