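Assessing the Relevance of Implicit Discourse Relations in the Automated Scoring of College English Test Band 4 (CET-4) Writing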
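Keywords: CET-4 writing; implicit discourse relations; relevance; latent semantic analysis; singular value decomposition. Source: Hubei University of Technology, master's thesis, 2017. Document type: degree thesis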
[Abstract]: A sound automated essay scoring system should cover two aspects: language quality and content quality. Content scoring is more complex than language-quality scoring, because it requires analyzing the organic connections among chunks (words, phrases, clauses) within the framework of the discourse. The CET-4 writing rubric is a holistic rubric that puts content first and language second; that is, content is the main yardstick of essay quality. Textual content is, in essence, the implicit discourse relations of a text, which is one basis for the choice of topic. The envisioned scoring system works as follows: the computer calculates the relevance, in terms of implicit discourse relations, between an unscored essay and already-scored essays, and then scores the unscored essay by referring to the scores of the scored essays. Judging the relevance of implicit discourse relations is central to the whole scoring system and is also the central argument of this study. Two main models are used to study implicit discourse relations: the traditional vector space model and the latent semantic analysis (LSA) model. The former treats every term other than stop words as a feature and represents a text by these features; its drawback is that it cannot handle polysemy (one word, several meanings) or synonymy (several words, one meaning). The latter also starts from words, the smallest constituents of discourse, but it additionally takes a philosophy-of-language perspective on similarity and generalization in language acquisition, and in knowledge acquisition more broadly. This is Plato's problem: how do humans acquire so much knowledge from such limited cues? This study is grounded in the latter model. LSA theory holds that the words in a text do not exist in isolation but are closely connected through a latent semantic network. Not every word is directly related to that network, however, so the feature terms that are directly related to it must be extracted. Feature-term extraction proceeds in two stages. The first is a coarse extraction, i.e., text preprocessing: case folding, stop-word removal, and root normalization (stemming). The second uses the singular value decomposition (SVD) function of the mathematical software MATLAB to refine the extraction, in the following steps: first, build a term-by-document matrix from the coarsely extracted feature terms; next, apply SVD, which expresses the original matrix as the product of three smaller matrices; then inspect the values in the columns of the three matrices and, depending on the case at hand, keep the first k columns; finally, reverse the decomposition by multiplying the three matrices, now truncated to k columns, to reconstruct a new matrix. The new matrix suppresses much of the noise while retaining the important information in the original matrix, achieving feature extraction in a real sense. This is how the computer simulates the human ability to recognize similarity and to generalize, and it is the theoretical core of this thesis. We first use a classic, simplified example to demonstrate the important role of LSA in assessing the relevance of implicit discourse relations. We then analyze in depth a set of CET-4 essays written by non-English-major undergraduates at Hubei University of Technology and conclude that the correlation coefficients of implicit discourse relations are indeed related, to some extent, to the human-assigned scores.
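The extraction steps described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: it uses Python with NumPy's `numpy.linalg.svd` in place of the MATLAB routine, and the stop-word list, the crude suffix stripper, the three sample texts, the value k = 2, and all function names (`preprocess`, `term_document_matrix`, `truncate_svd`, `document_correlations`) are hypothetical. Relevance between texts is measured here with the Pearson correlation of columns of the reconstructed matrix, which is one plausible reading of "correlation coefficient".

```python
import re
import numpy as np

# Tiny illustrative stop-word list (assumption; the thesis does not give its list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "for"}

def preprocess(text):
    """Stage 1: case folding, stop-word removal, crude root normalization."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        # Toy suffix stripping standing in for a real stemmer (assumption).
        for suffix in ("ing", "es", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems

def term_document_matrix(docs):
    """Build a term-by-document count matrix (rows: terms, columns: texts)."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(token_lists):
        for t in toks:
            A[index[t], j] += 1.0
    return vocab, A

def truncate_svd(A, k):
    """Stage 2: decompose A, keep the first k components, multiply back."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def document_correlations(A_k):
    """Pearson correlation between texts (columns of the reconstructed matrix)."""
    # np.corrcoef treats rows as variables, so transpose first.
    return np.corrcoef(A_k.T)

# Invented sample texts, for illustration only.
essays = [
    "Students are writing essays in the class.",
    "The student writes an essay for the exam.",
    "Rain is falling on the quiet river.",
]
vocab, A = term_document_matrix(essays)
A_k = truncate_svd(A, k=2)
corr = document_correlations(A_k)
print(vocab)
print(np.round(corr, 2))
```

In this toy corpus the rank-2 reconstruction places the two essay-related texts closer to each other than either is to the third text. In a scoring setting, one column would hold the unscored essay and the others the scored essays, and choosing k remains a judgment call, as the abstract notes.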
[Degree-granting institution]: Hubei University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: H319.3