Building a Model of a Library Reader Interest Analysis and Guidance System on the Hadoop Platform

Published: 2018-06-30 03:57

Topics: big data + Hadoop; Source: Changchun University of Technology, master's thesis, 2017

[Abstract]: In recent years, and especially since the 18th National Congress of the Communist Party of China, "Internet Plus" information management in universities has become a prominent topic, driven by the national innovation-driven development strategy and the broader push for mass entrepreneurship and innovation. The library is one of the most data-intensive departments in a university. As mobile networks, big data, cloud computing, and the Internet of Things have matured, and as universities have paid increasing attention to library informatization, more and more book resources have been brought under digital management, with an increasingly diverse range of functions. Teachers and students generate a very large volume of data when they search and consult library resources, and this data is the basis for in-depth analysis of readers' reading and retrieval interests.

The diversity of readers' searches, together with the demand for intelligent interest analysis and book recommendation, also poses major challenges for libraries. First, intelligent library hardware built on a single server plus a disk array can no longer meet the needs of large-scale data retrieval, analysis, and storage; such hardware is also costly and poorly suited to the big data era. Second, current methods for analyzing university library circulation data still examine individual indicators in isolation, so the analysis is not comprehensive. Finally, existing reader interest analysis yields only a statistical result, rather than data that can directly guide other library work such as acquisitions.

To address these problems, this thesis draws on data analysis and field research at a university in Changchun and at several nearby provincial universities, on big data theory, and on the supervisor's research direction and related industry-commissioned projects. Building on an in-depth study of Hadoop big data technology, it establishes an interest analysis and guidance model that may serve as a reference for research on university library informatization.

The research covers four areas. First, Hadoop big data technology and C#-based data analysis are applied to reader interest analysis and guidance; to reduce the high cost of big data storage and computation, the lower-cost Microsoft Azure is chosen as the server cluster on which the Hadoop data platform is built. Second, NoSQL distributed database technology and the HBase database are used to analyze readers' borrowing and retrieval logs (the access sources of electronic book resources); this log analysis makes it possible to monitor and optimize the usage of the library's electronic resources. Third, readers' historical borrowing data is mined to construct a literature recommendation model framework. Fourth, the book recommendation lists generated by that framework are converted into a purchase list for the acquisitions and cataloging department.

The thesis claims three innovations. First, it introduces the now-popular Hadoop big data analysis technology into the relatively traditional work of reader interest analysis and guidance in university libraries, building a borrowing analysis cluster platform on inexpensive computers and using Microsoft's cloud platform to solve the distributed storage of massive volumes of borrowing records. Second, in data analysis, it introduces the C# programming language into the Hadoop framework, providing an efficient and intuitive data analysis scheme for building the interest analysis and guidance model; readers' borrowing histories are analyzed and compared at scale to produce personalized recommendations, which in turn give strong decision support for collection development and for adjusting the structure of the library's holdings. Third, it implements personalized book recommendation, providing a reference for the library's acquisitions department when purchasing books.
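
The abstract does not specify the recommendation algorithm, so the following is only a minimal sketch of the last two steps it describes (mining borrowing history for recommendations, then turning recommendation lists into a purchase list). It assumes a simple co-borrowing count as the similarity signal and uses Python rather than the C# mentioned in the thesis; the sample records and the function names `build_cooccurrence`, `recommend`, and `purchase_list` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical borrowing history: (reader_id, book_id) pairs.
loans = [
    ("r1", "hadoop_guide"), ("r1", "hbase_intro"),
    ("r2", "hadoop_guide"), ("r2", "hbase_intro"), ("r2", "azure_basics"),
    ("r3", "hadoop_guide"), ("r3", "azure_basics"),
]

def build_cooccurrence(loans):
    """Group loans by reader and count how often two books share a reader."""
    by_reader = defaultdict(set)
    for reader, book in loans:
        by_reader[reader].add(book)
    co = defaultdict(Counter)
    for books in by_reader.values():
        for a, b in combinations(sorted(books), 2):
            co[a][b] += 1
            co[b][a] += 1
    return by_reader, co

def recommend(reader, by_reader, co, top_n=2):
    """Rank books the reader has not borrowed by summed co-borrowing counts."""
    seen = by_reader[reader]
    scores = Counter()
    for book in seen:
        for other, count in co[book].items():
            if other not in seen:
                scores[other] += count
    # Sort by score (descending), then title, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:top_n]]

def purchase_list(by_reader, co, top_n=2):
    """Count how many readers each title is recommended to (assumed demand signal)."""
    demand = Counter()
    for reader in by_reader:
        demand.update(recommend(reader, by_reader, co, top_n))
    return sorted(demand.items(), key=lambda kv: (-kv[1], kv[0]))

by_reader, co = build_cooccurrence(loans)
print(recommend("r1", by_reader, co))
print(purchase_list(by_reader, co))
```

In this sketch the purchase list is just a ranking of titles by how many readers they were recommended to; a real acquisitions workflow would presumably also weigh existing holdings, copy counts, and budget, none of which the abstract details.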
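[Degree-granting institution]: Changchun University of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13; TP311.52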

[References]