A Study of CD-CAT with Answer Revision

Published: 2018-03-26 21:05

Topic: cognitive diagnostic computerized adaptive testing　Focus: answer revision　Source: Jiangxi Normal University, master's thesis, 2016

[Abstract]: Compared with traditional paper-and-pencil tests (Paper and Pencil Based Test, PP), computerized adaptive testing (CAT) selects items adaptively according to the examinee's responses, which both shortens the test and markedly improves measurement precision. Cognitive diagnostic computerized adaptive testing (CD-CAT) combines cognitive diagnosis theory with CAT theory. It retains the features of CAT and adds a diagnostic function: CD-CAT aims to diagnose an individual's cognitive processes, processing skills, or knowledge structure so as to inform subsequent remedial instruction, and it places more emphasis on examinees' internal processing. However, the vast majority of current CAT and CD-CAT systems do not allow examinees to revise their answers, chiefly because researchers worry that revision would reduce the validity of the tests. Allowing revision matches examinees' habitual test-taking behavior, and revised scores better reflect their true level, which in turn can promote the practical application of CAT and CD-CAT.

Existing research has proposed control methods for reviewable CAT from three angles: test design, improved item selection strategies, and model building. The Item Pocket (IP) method proposed by Han (2013) is currently a promising control method for reviewable CAT. The computer offers the examinee an item pocket: at any point during the test, the examinee may place an item to be revised, or an item to be skipped for now, into the IP and then move on to the next item. Items in the IP do not take part in the current ability estimate. Once the IP is full, the examinee must replace an item already in it before another can be added. The drawback of the IP method is that its capacity is hard to control, and too large a capacity leads to large estimation error. This thesis improves on the IP method and proposes a Modified IP (MIP) method with a new scoring rule, namely rescoring the items that were placed in the IP for revision.

Compared with the IP method, the designs of Stocking (1997) place more restrictions on revision behavior. Stocking design 1 allows examinees, after answering all items, to return and revise a fixed number of items; the revised responses are not reflected in adaptive item selection. Stocking design 2 allows examinees to revise answers within separately timed item units. During the test, items are divided in order into units of fixed length, with the unit length set according to practical needs. Examinees may review and revise items within a unit, the computer selects the next unit according to the responses to the current unit, and a unit cannot be revisited once it has been submitted. Compared with design 1, examinees have more control over the test in design 2: however many units the test is divided into, they can still revise every item, whereas in design 1 they can revise only a fixed number. In addition, revisions within a unit in design 2 affect the selection of the next unit, because the computer selects it from the ability estimate obtained after revision. In design 1 the ability estimate after revision is not reflected in adaptive item selection. In this respect design 2 conforms better to the rules of CAT than design 1.

These revision designs (IP, MIP, Stocking design 1, and Stocking design 2) each have strengths and weaknesses when applied to CAT. CD-CAT developed out of CAT, but the two differ substantially. To examine how these methods perform in reviewable CD-CAT (Reviewable Cognitive Diagnostic Computerized Adaptive Testing, RCD-CAT), the simulation studies used the DINA and R-RUM models, assumed that examinees' knowledge states and the item bank both follow uniform distributions, and generated 5000 simulated examinees and an item bank of 300 items. Knowledge states were estimated by maximum likelihood estimation (MLE). The number of attributes measured was 5 or 7. The item selection strategies were Kullback-Leibler (KL), Posterior-Weighted KL (PWKL), Hybrid KL (HKL), and Modified Posterior-Weighted KL (MPWKL). Test length was 10 or 20 items.

The Monte Carlo simulations showed the following. First, compared with traditional CD-CAT without answer revision, the RCD-CAT methods discussed here allow students to revise their answers without loss of diagnostic accuracy or item bank exposure rate; this matches students' usual response behavior, reduces their burden and anxiety, and is more readily accepted. Second, under the DINA model, the MIP and IP methods performed about the same, which indicates that the effect of MIP depends on the distribution of examinees' response probabilities. Third, under all study conditions, the pattern correct classification rate of the Stocking designs was higher than that of the other three methods, and Stocking design 2 was slightly better than Stocking design 1, which indicates that the Stocking designs hold good prospects for RCD-CAT. In sum, RCD-CAT not only matches examinees' habitual test-taking behavior; answer revision also helps improve diagnostic accuracy. This study provides theoretical and methodological support for the research and practical application of RCD-CAT and reviewable CAT (RCAT).
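
The item pocket mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's or Han's (2013) implementation: the names `ItemPocket`, `put`, and `scored_responses` are hypothetical, and the treatment of an item that is swapped out of a full pocket (it simply rejoins the scored set with its current response) is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ItemPocket:
    """Fixed-capacity holder for items the examinee wants to revisit or skip."""
    capacity: int
    items: list = field(default_factory=list)

    def put(self, item_id, replace=None):
        """Place an item in the pocket.

        When the pocket is full, the caller must name a pocketed item to
        swap out; otherwise the request is rejected.
        """
        if item_id in self.items:
            return
        if len(self.items) >= self.capacity:
            if replace is None or replace not in self.items:
                raise ValueError("pocket is full: name a pocketed item to replace")
            # Assumption: the swapped-out item goes back to the scored set.
            self.items.remove(replace)
        self.items.append(item_id)


def scored_responses(responses, pocket):
    """Return only the responses that feed the interim estimate.

    Pocketed items are excluded, as the abstract describes for the IP method.
    """
    return {item: x for item, x in responses.items() if item not in pocket.items}


# Small demonstration with made-up item identifiers and 0/1 responses.
demo_pocket = ItemPocket(capacity=2)
demo_responses = {"q1": 1, "q2": 0, "q3": 1}
demo_pocket.put("q2")
demo_pocket.put("q3")
print(scored_responses(demo_responses, demo_pocket))
```

In a fuller simulation, the dictionary returned by `scored_responses` would be what the knowledge-state estimator and the item selection index see at each step. The capacity argument is the quantity the abstract identifies as hard to control, since every pocketed item is one fewer response behind the interim estimate.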

[Degree-granting institution]: Jiangxi Normal University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: B841

[References]