A Comparison of Concurrent and Separate Estimation under an IRT-Based Anchor-Item Design
Published: 2018-11-11 14:41
[Abstract]: Test equating converts the score scales of multiple test forms that measure the same psychological trait, so that scores on the different forms become comparable. Many large-scale examinations today administer different forms of the same test. Comparing the scores of examinees who took different forms, describing the longitudinal development and trend of students' ability as they move up through the grades, and comparing results across age groups, grade levels, or years all depend on equating. A choice among equating methods is therefore unavoidable, which makes comparisons among those methods especially important.
Within the item response theory (IRT) framework there are two approaches to equating. The first is separate estimation (separate calibration): the item parameters of each test are first estimated separately. Because the scale in IRT models is indeterminate, item parameters from different tests are linearly related, so a linear transformation is needed to place all item parameters on a common scale. The main transformation methods are the mean/mean method, the mean/sigma method, the Haebara method, and the Stocking-Lord method. The second approach is concurrent estimation, in which all item parameters are estimated in a single software run.
Previous studies comparing concurrent and separate estimation have two shortcomings: 1. The criteria used to compare the two approaches are not uniform, and differences in criteria may well affect the results. 2. Studies judge which approach is better solely from the magnitude of the error values, which cannot establish statistically that either method has a significant advantage.
To address these shortcomings, this study: 1. uses absolute bias and bias to examine random error and systematic error, respectively, to ensure the precision of the results; 2. uses statistical tests to examine whether concurrent and separate estimation differ significantly. The results show that: 1. For the a parameter, concurrent estimation equates significantly better than separate estimation. For the b parameter, in terms of bias, the Haebara (HA) method equates significantly better than the other four methods, that is, separate estimation is significantly better than concurrent estimation; in terms of absolute bias, concurrent estimation, the Stocking-Lord (SL) method, and the HA method do not differ significantly. 2. The larger the sample size, the better the equating.
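The linear transformation used in separate estimation, and the two error criteria named in the abstract, can be illustrated with a minimal Python sketch. It covers only the two moment-based methods (mean/mean and mean/sigma); the Haebara and Stocking-Lord methods need numerical optimization over characteristic curves and are not shown. The function names, the toy anchor-item values, and the definitions of bias (mean signed error) and absolute bias (mean absolute error) are assumptions made for illustration; the abstract does not give the thesis's exact formulas.

```python
from statistics import mean, pstdev


def mean_sigma(b_new, b_base):
    """Mean/sigma linking: slope A and intercept B from anchor difficulties."""
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def mean_mean(a_new, b_new, a_base, b_base):
    """Mean/mean linking: slope from mean discriminations of the anchors."""
    A = mean(a_new) / mean(a_base)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def rescale(a, b, A, B):
    """Put new-form parameters on the base scale: a* = a / A, b* = A*b + B."""
    return [ai / A for ai in a], [A * bi + B for bi in b]


def bias(estimates, truths):
    """Mean signed error (assumed here as the systematic-error criterion)."""
    return mean([e - t for e, t in zip(estimates, truths)])


def absolute_bias(estimates, truths):
    """Mean absolute error (assumed definition of the second criterion)."""
    return mean([abs(e - t) for e, t in zip(estimates, truths)])


# Hypothetical anchor items: base-scale values, and the same items expressed
# on a new-form scale related by theta_base = 1.25 * theta_new + 0.3.
a_base = [1.0, 1.2, 0.8]
b_base = [-0.5, 0.0, 1.0]
a_new = [ai * 1.25 for ai in a_base]
b_new = [(bi - 0.3) / 1.25 for bi in b_base]

A_ms, B_ms = mean_sigma(b_new, b_base)
A_mm, B_mm = mean_mean(a_new, b_new, a_base, b_base)
a_linked, b_linked = rescale(a_new, b_new, A_ms, B_ms)
print(A_ms, B_ms, bias(b_linked, b_base), absolute_bias(b_linked, b_base))
```

With noise-free toy data both methods recover the same slope and intercept; with real calibrations the anchor estimates contain sampling error, and the methods generally give different coefficients, which is what makes a comparison such as the one in this study informative.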
[Degree-granting institution]: Jiangxi Normal University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: B841
Article ID: 2325151
Article link: https://www.wllwen.com/shekelunwen/xinlixingwei/2325151.html