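Research and Application of a MOOC Hybrid Recommendation Algorithm Based on a Big Data Platform
Published: 2018-06-29 13:56
Topic: massive open online courses + recommender systems; Source: Hangzhou Dianzi University, master's thesis, 2017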
[Abstract]: Thanks to the rapid growth of the Internet, traditional education is changing profoundly. In recent years a new form of education with no entry barrier, low cost, and abundant learning resources has become widespread: the MOOC. As MOOC platforms have grown, however, the number of MOOC courses has increased sharply, causing information overload. Users struggle to pick the courses they need from this large catalog, which is the "difficult course selection" problem. It is therefore necessary to use intelligent algorithms to address information overload on MOOC platforms, to help users choose suitable courses, and to let high-quality courses stand out.

Recommender systems are regarded as a more efficient way to address information overload. Although they have been applied successfully in many fields, research on recommender systems for MOOCs remains scarce both in China and abroad. If experience from other fields is applied mechanically, without considering the characteristics of the MOOC setting, the accuracy of course recommendations will be relatively low. To address the "difficult course selection" problem, this thesis proposes a MOOC implicit rating model and, in light of today's Internet big data environment, designs and implements a MOOC recommender system based on a big data platform. The main contributions and innovations are:

(1) A MOOC implicit rating model is proposed. The model is built on the characteristics of the MOOC application scenario, uses users' learning behavior, and draws on the successful experience of recommender systems in other fields.
(2) The MOOC implicit rating model is used to improve the traditional item-based collaborative filtering algorithm and the matrix factorization algorithm. Experimental results show that the model improves the recommendation accuracy of these traditional algorithms in MOOC applications.
(3) A MOOC recommender system based on a big data platform is designed to cope with today's Internet big data environment. The system is divided into six modules according to the business characteristics of a big data MOOC application, and each module is implemented with a microservice architecture, which eases later extension and maintenance.
(4) A parallel implementation of the collaborative filtering algorithm based on the MOOC implicit rating model is given using the MapReduce computing model. Then, because matrix factorization is an iterative algorithm, it is implemented with Spark MLlib, which greatly reduces computation time and improves the ability to process large-scale datasets.
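The abstract does not give the formula of the implicit rating model, so the following is only a minimal single-machine sketch of the general idea: derive a score from learning behavior, then feed those scores to item-based collaborative filtering. The behavior fields (`video_ratio`, `quiz_ratio`, `forum_posts`), the weights, and the sample data are all hypothetical and are not taken from the thesis. The thesis parallelizes this step with MapReduce, which this sketch does not attempt.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical behavior weights: the abstract does not state the real formula.
WEIGHTS = {"video_ratio": 0.5, "quiz_ratio": 0.3, "forum_posts": 0.2}


def implicit_score(behavior):
    """Map one user's behavior on one course to an implicit score in [0, 5]."""
    # Cap forum activity at 10 posts so it contributes a value in [0, 1].
    forum = min(behavior.get("forum_posts", 0), 10) / 10
    raw = (WEIGHTS["video_ratio"] * behavior.get("video_ratio", 0.0)
           + WEIGHTS["quiz_ratio"] * behavior.get("quiz_ratio", 0.0)
           + WEIGHTS["forum_posts"] * forum)
    return 5.0 * raw


def item_similarities(scores):
    """Cosine similarity between courses; scores is {user: {course: score}}."""
    dot = defaultdict(float)
    norm = defaultdict(float)
    for user_scores in scores.values():
        for ci, si in user_scores.items():
            norm[ci] += si * si
            for cj, sj in user_scores.items():
                if ci != cj:
                    dot[(ci, cj)] += si * sj
    # A positive dot product implies both norms are positive.
    return {pair: d / (sqrt(norm[pair[0]]) * sqrt(norm[pair[1]]))
            for pair, d in dot.items() if d > 0}


def recommend(user, scores, sims, top_n=3):
    """Rank unseen courses by similarity-weighted scores of the user's courses."""
    seen = scores[user]
    ranked = defaultdict(float)
    for (ci, cj), sim in sims.items():
        if ci in seen and cj not in seen:
            ranked[cj] += sim * seen[ci]
    return sorted(ranked, key=ranked.get, reverse=True)[:top_n]


# Made-up behavior logs: {user: {course: behavior}}.
logs = {
    "alice": {"python": {"video_ratio": 1.0, "quiz_ratio": 1.0, "forum_posts": 10},
              "ml": {"video_ratio": 0.8, "quiz_ratio": 0.5}},
    "bob": {"python": {"video_ratio": 0.9, "quiz_ratio": 0.8},
            "ml": {"video_ratio": 0.9, "quiz_ratio": 0.9},
            "spark": {"video_ratio": 0.7, "quiz_ratio": 0.6}},
    "carol": {"python": {"video_ratio": 0.6}},
}
scores = {u: {c: implicit_score(b) for c, b in courses.items()}
          for u, courses in logs.items()}
sims = item_similarities(scores)
print(recommend("alice", scores, sims))
```

The point of the design is that no explicit star rating is needed: the score comes entirely from what the learner did, and the collaborative filtering step is unchanged apart from its input.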
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.3
Article ID: 2082332
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2082332.html