Design and Implementation of a Hadoop-Based HSK Chinese Examination System

Published: 2018-06-02 11:04

Topics: HSK examination + collaborative filtering; Source: Jiangsu University, master's thesis, 2017

[Abstract]: HSK is the Chinese Proficiency Test, an official national standardized examination established to assess the Chinese proficiency of non-native speakers. To support it, our research group developed the HSK Chinese examination system in 2013, and it has been deployed at a number of Confucius Institutes in North America and Europe. After more than three years of operation, the system has accumulated a large volume of historical user data, and test-paper generation has gradually slowed under high concurrency. In addition, the original design gave little consideration to the individual characteristics of examinees, and its dynamic test-paper generation capability is weak. To address these problems, we use the distributed storage and computing of the Hadoop ecosystem to implement a collaborative filtering recommendation algorithm in parallel, and we generate personalized test papers by analyzing examinee behavior data. The main research content and work of this thesis are as follows:
1. Based on an analysis of the functional requirements of the HSK Chinese examination system, a layered B/S (browser/server) architecture is adopted, the overall design goals and logical structure are planned, and the core modules of the system are implemented.
2. To address the shortcomings of random test-paper generation in the HSK Chinese examination system, an item-based collaborative filtering test-paper generation algorithm is proposed to recommend personalized test questions to examinees. Collaborative filtering test-paper generation is designed and implemented on the Hadoop platform: the computation is decomposed into a series of MapReduce jobs and processed in a distributed manner on Hadoop.
3. To reduce the logical coupling between the test-paper generation module and the other modules of the system, a data collection system based on a message queue and the Spark Streaming framework is designed and implemented. The Kafka message queue handles the conversion of examinees' real-time score records, and the Streaming real-time stream-processing module collects those records.
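The abstract does not give the similarity measure or the exact job decomposition used in the thesis, so the following is only a minimal in-memory sketch of item-based collaborative filtering arranged as three map/reduce-style stages. It assumes cosine similarity over examinee score vectors; the sample records in `RATINGS` and the function names `group_by_user`, `item_similarities`, and `recommend` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical sample data: (examinee_id, question_id, score) records.
RATINGS = [
    ("u1", "q1", 5.0), ("u1", "q2", 3.0), ("u1", "q3", 4.0),
    ("u2", "q1", 4.0), ("u2", "q2", 2.0),
    ("u3", "q2", 5.0), ("u3", "q3", 1.0),
]


def group_by_user(ratings):
    """Stage 1: key records by examinee to build per-user score vectors."""
    users = defaultdict(dict)
    for user, item, score in ratings:
        users[user][item] = score
    return users


def item_similarities(users):
    """Stage 2: emit question pairs rated by the same examinee, then
    reduce the partial sums into cosine similarities (an assumed measure)."""
    dot = defaultdict(float)   # (item_a, item_b) -> sum of score products
    norm = defaultdict(float)  # item -> sum of squared scores
    for scores in users.values():
        for item, s in scores.items():
            norm[item] += s * s
        # Sorting keeps each pair key in a single canonical order.
        for (a, sa), (b, sb) in combinations(sorted(scores.items()), 2):
            dot[(a, b)] += sa * sb
    sims = defaultdict(dict)
    for (a, b), d in dot.items():
        sim = d / (sqrt(norm[a]) * sqrt(norm[b]))
        sims[a][b] = sim
        sims[b][a] = sim
    return sims


def recommend(user, users, sims, top_n=3):
    """Stage 3: predict scores for unseen questions as a
    similarity-weighted average of the examinee's own scores."""
    seen = users[user]
    num = defaultdict(float)
    den = defaultdict(float)
    for item, score in seen.items():
        for other, sim in sims.get(item, {}).items():
            if other not in seen:
                num[other] += sim * score
                den[other] += abs(sim)
    preds = {i: num[i] / den[i] for i in num if den[i] > 0}
    return sorted(preds.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    users = group_by_user(RATINGS)
    sims = item_similarities(users)
    print(recommend("u2", users, sims))
```

On a cluster, each stage would correspond roughly to one MapReduce job whose intermediate keys are the examinee ID, the question pair, and the candidate question respectively; the in-memory dictionaries here stand in for that shuffle step.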
[Degree-granting institution]: Jiangsu University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.52

[References]