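Research on Collaborative Filtering Algorithms for Student Performance Prediction in Educational Data Mining
Published: 2018-04-08 17:42
Topic: educational data mining  Focus: K-nearest neighbor  Source: Kunming University of Science and Technology, 2016 master's thesis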
[Abstract]: Schools now accumulate large volumes of varied data in the course of educating students, such as enrollment rates, dropout rates, and subject scores, and, at the classroom level, the rate at which students answer questions correctly and their degree of mastery of individual knowledge points. These data change continually and keep growing as informatization advances, so extracting useful information from such complex and redundant data has considerable research value. Drawing on the similarity to data analysis in fields such as e-commerce, this thesis applies collaborative filtering algorithms to educational data, focusing on student performance prediction. The experimental dataset consists of 8.9 million records selected from an intelligent tutoring system (ITS) for the KDD Cup 2010 competition, and it is used for a practical exploration of educational data mining for student performance prediction. The dataset has a large number of features with wide value ranges, consists mostly of text-type data, and is sparse in places. To address these problems, the thesis does the following work: (1) Progressive sampling is used to determine the optimal training sample size, greatly reducing the number of training records; exploiting the temporal nature of the dataset, the most recent N records of the training set are extracted; features in the implicit answer-result set with a high proportion of null values are removed, and some attributes with complex structure are split apart. (2) The single classification algorithm K-nearest neighbor and the singular value decomposition (SVD) model are each applied to the educational dataset for validation, predicting the Correct First Attempt (CFA) attribute in the test set, which serves as the evaluation target, and the prediction performance of the two algorithms is compared. (3) Based on the complementary characteristics of the two base algorithms, SVD dimensionality reduction is combined with the K-nearest neighbor algorithm to predict student performance. The experiments show that the combined algorithm alleviates data sparsity to some extent, but it retains only the basic characteristics of the data, and the partial data loss caused by dimensionality reduction slightly affects the evaluation results.
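The progressive sampling step and the K-nearest neighbor prediction of CFA described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the synthetic two-feature records, the function names (`knn_predict`, `accuracy`, `progressive_sample`, `make_rows`), the doubling schedule, and the stopping tolerance `tol` are all assumptions made for illustration. The sketch grows the training sample, always taking the most recent rows, until the accuracy on a holdout set stops improving.

```python
import math
import random


def knn_predict(train, query, k=5):
    """Majority vote over the k training rows closest to `query` (Euclidean).
    Ties, which are only possible for even k, are resolved as label 0."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, holdout, k=5):
    """Fraction of holdout rows whose 0/1 CFA label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in holdout)
    return hits / len(holdout)


def progressive_sample(rows, holdout, start=50, growth=2, tol=0.01, k=5):
    """Grow the training sample geometrically, always taking the most recent
    rows, and stop once the holdout accuracy gain drops below `tol` or all
    rows are used. Returns one (sample_size, accuracy) pair per round."""
    history = []
    size = start
    while True:
        size = min(size, len(rows))
        sample = rows[-size:]  # assumes rows are in chronological order
        history.append((size, accuracy(sample, holdout, k)))
        converged = len(history) > 1 and history[-1][1] - history[-2][1] < tol
        if converged or size == len(rows):
            return history
        size *= growth


def make_rows(n, rng):
    """Hypothetical stand-in for student-step records: two numeric features
    and a noise-free 0/1 CFA label derived from them."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        rows.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return rows


rng = random.Random(0)
train_rows = make_rows(800, rng)
holdout_rows = make_rows(100, rng)
history = progressive_sample(train_rows, holdout_rows)
for size, acc in history:
    print(f"sample size {size:4d}: holdout accuracy {acc:.3f}")
```

The last entry of `history` marks the point where the learning curve flattened (or where the data ran out), so a reasonable choice of training size is the one just before it. Real ITS records are mostly text-valued, so they would need to be encoded numerically before a distance-based method like this could be applied.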
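[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP311.13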
Article ID: 1722657
Article link: https://www.wllwen.com/jingjilunwen/dianzishangwulunwen/1722657.html