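Design and Implementation of a Spark-Based Recommendation System for the College Entrance Examination (Gaokao)
Topic: big data. Focus: recommender systems. Source: Shandong Normal University, master's thesis, 2017. Document type: degree thesis.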
[Abstract]: Recommender systems were proposed to resolve a two-sided problem: users cannot find valuable information, and information cannot reach the users who need it. With the arrival of the big data era, recommender systems have themselves begun to struggle with massive data volumes, and combining them with big data processing technology is the natural way forward. Spark, a leading big data processing technology, introduced the RDD data model and an in-memory computing model, and is now widely used in e-commerce, video, social networking, and other fields. In education, however, both recommender systems and big data processing technology have seen little use. The college entrance examination (gaokao) is a major event in education, and choosing which institutions to apply to is the focus of candidates' attention. Admission records from previous years are important reference data for these choices, but because the data are large and complex, they are rarely put to use. This thesis combines a recommender system with the Spark big data processing framework and applies them to education, a field where both are seldom used, to help candidates select institutions when filling in their gaokao applications. The work completed in this thesis is as follows:

(1) Using HTML, CSS (Cascading Style Sheets), and JSP front-end technologies, a Web front end for gaokao application recommendation was designed and developed. It includes a user registration page, a user login page, a page that displays recommendation results, and pages for browsing related gaokao information (policies, news, university information, and information on majors). The design aims for practicality and ease of use while giving users a good interactive experience.

(2) With the Web front end as the producer of user logs, a well-performing log collection module was designed. First, the Flume log collection tool gathers the log information; next, a Sink component passes the collected information to the Kafka message middleware, which distributes it to consumers; finally, the Spark Streaming stream processing framework cleans and extracts the log information received from Kafka and stores it in the HDFS file system.

(3) A recommendation engine for the gaokao application scenario was designed. First, based on a review of the literature on gaokao applications, suitable user attributes were selected, similarities were computed, a similarity matrix was built, and similar users were identified; next, several of the most common recommendation algorithms were analyzed, and user-based collaborative filtering was chosen as the system's recommendation algorithm in light of the real application scenario; finally, the recommendation list is generated using the parallel computation provided by the Spark framework.

(4) A distributed Spark cluster was set up as the development environment, and the whole system was implemented and tested. First, following the relevant documentation, a three-node Spark cluster was built in the laboratory; next, the system was implemented in Scala; finally, after development, the log collection tools and the Spark components were performance-tested to confirm that the system runs correctly and efficiently, and the accuracy of the recommendations and overall user satisfaction with the system were tested to ensure a good user experience.
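The recommendation step in item (3) can be illustrated with a small single-machine sketch. This is not the thesis's implementation, which was written in Scala and run on Spark; it is plain Python, and everything specific in it is an assumption made for illustration: the attribute vectors (a normalized score, a normalized rank, and a subject-track flag), the use of cosine similarity, the candidate identifiers, and the university names. The abstract does not state which attributes or which similarity measure were used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(profiles):
    """Pairwise similarity between all users, as a nested dict."""
    ids = sorted(profiles)
    return {
        a: {b: cosine(profiles[a], profiles[b]) for b in ids if b != a}
        for a in ids
    }


def recommend(target, profiles, admissions, k=2, n=3):
    """User-based collaborative filtering: score each university by the
    summed similarity of the k most similar users who were admitted there."""
    sims = similarity_matrix(profiles)[target]
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    scores = {}
    for user in neighbours:
        for school in admissions.get(user, ()):
            scores[school] = scores.get(school, 0.0) + sims[user]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda s: (-scores[s], s))[:n]


# Hypothetical data: [normalized score, normalized rank, subject-track flag].
profiles = {
    "past_a": [0.90, 0.10, 1.0],
    "past_b": [0.88, 0.12, 1.0],
    "past_c": [0.40, 0.70, 0.0],
    "new": [0.89, 0.11, 1.0],
}
# Hypothetical admission history of past candidates.
admissions = {
    "past_a": ["Univ X"],
    "past_b": ["Univ X", "Univ Y"],
    "past_c": ["Univ Z"],
}

print(recommend("new", profiles, admissions))
```

In a Spark setting the pairwise similarity computation and the per-user scoring would be distributed across the cluster; the sketch only shows the logic of finding similar users and aggregating their outcomes.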
[Degree-granting institution]: Shandong Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
Article ID: 1643171
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1643171.html