
Rating Process and Rater Beliefs: A Study of the Internal Factors Behind Rater Variability

Published: 2018-05-31 00:30

  Topics: rater beliefs + rating process; Source: Guangdong University of Foreign Studies (广东外语外贸大学), 2009 doctoral dissertation


[Abstract]: In subjectively scored tests, rater variability is one of the most important factors affecting reliability, validity, and fairness. Unlike most studies, which use statistical methods to describe rater error, this study starts from the raters themselves and examines the internal causes of their differences in rating. By comparing better and poorer raters, it seeks to identify the internal determinants that enable raters to rate accurately and consistently, in order to provide empirical evidence and useful feedback for improving rater training and rating procedures and for raising test reliability and validity. The context of the study is the rating of writing in the national College English Test Band 4 (CET-4). All participants were raters who had taken part in operational CET-4 rating, and the rating criteria and essay prompts all came from real CET-4 administrations. The empirical study comprised three data-collection stages: independent rating, think-aloud, and open-ended semi-structured interviews. After analyzing the raters' scores statistically with the many-facet Rasch model, the author divided the raters into a better group and a poorer group according to how closely their scores agreed with expert scores. Using the verbal reports produced during think-aloud and the transcripts of one-on-one interviews, the author compared the two groups' thought processes during rating and their rating beliefs.

The analysis showed that the better and poorer raters differed in many respects. First, different raters tended to attend to different features of the essays during rating. The better raters attended to a more comprehensive range of language features, including content, overall organization, discourse features, sentence structure, and vocabulary, whereas the poorer raters attended more to isolated, scattered language features such as lexical variety, sentence length and complexity, and the use of connectives. Second, the two groups processed the information they attended to in different ways. The better raters were more adept at classifying language errors, summarizing information, and drawing inferences, and they monitored their own rating process and rating accuracy more effectively. In addition, the raters differed in their rating beliefs. The main difference lay in how they perceived and understood the object of rating and the rating criteria. Compared with the poorer raters, the better raters defined writing ability more clearly and comprehensively. Accordingly, their definitions of the language features that reflect writing ability were more comprehensive and systematic, and they applied systematic, consistent standards when weighting those features. Their understanding and operational definitions of the abstract descriptors in the rating criteria covered a more comprehensive set of language features. The results also showed that the better raters' beliefs were more consistent with one another and closer to the experts' expectations and to the construct definition in the test syllabus.

Through this comparison, the author attempts to link raters' scores to their internal thought processes and beliefs, and finds that raters' internal differences, especially differences in their beliefs, are the root of the differences in their rating behavior. The implication for rater training is that its purpose and focus should be to unify raters' understanding of the object of rating, the rating instrument, and their own responsibilities and tasks. Only when raters agree in their internal beliefs and form a reasonably unified understanding can their scores accurately reflect the intentions of the test developers and administrators, capture the latent ability the test is meant to measure, and, in a sense, form a community of evaluation.
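[Degree-granting institution]: Guangdong University of Foreign Studies (广东外语外贸大学)
[Degree level]: Doctoral
[Year degree granted]: 2009
[Classification number]: G424.74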

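[Citing literature]

Related journal articles (top 3)

1 徐鹰 (Xu Ying); A study of the operational definition of the college English writing ability construct [J]; 考试与评价(大学英语教研版); 2012, no. 06

2 李航 (Li Hang); A study of the rating reliability of CET-6 essays based on generalizability theory and the many-facet Rasch model [J]; 外语与外语教学; 2011, no. 05

3 徐鹰 (Xu Ying); An empirical study of differences between raters of different genders [J]; 外语测试与教学; 2013, no. 03

Related doctoral dissertations (top 1)

1 李航 (Li Hang); The effects of the interaction between raters and rating scales on the results and process of EFL essay rating [D]; 浙江大学 (Zhejiang University); 2012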



Article ID: 1957512


Article link: https://www.wllwen.com/jiaoyulunwen/jsxd/1957512.html

