Compatible Study of Hadoop for Efficient Analyzing and Proce

Published: 2021-01-02 04:02
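  As computers are used, data is continuously generated and accumulated, which raises the question of where to store it. In the past, storage costs made this prohibitively expensive; recent technological advances have reduced those costs. Big data refers to collections of data sets so large and wide-ranging that they are difficult to process with traditional database management tools. Processing such data sets with traditional methods is also very time-consuming, so the Hadoop framework, which is faster and more efficient than traditional methods, has been widely adopted. The main goal is to process continuously generated data more efficiently and in less time, without having to store the data. Data falls into three main categories: structured, unstructured, and semi-structured. Hadoop provides different types of frameworks for processing these huge data sets. We focus on three of them, Pig, Hive, and Impala, and systematically study how to analyze structured data sets efficiently and reduce the time spent processing them. We compare the three Hadoop frameworks experimentally by applying them to two different data sets and examining their data-processing efficiency. Specifically, we run similar tasks on Hive, Pig, and Impala and evaluate the experimental results. The results show that Impala is more efficient than Hive and Pig, because it takes less time to execute the tasks.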

[Source]: Southwest University of Science and Technology, Sichuan Province

[Pages]: 59

[Degree level]: Master's

[Table of contents]:
Abstract (Chinese)
Abstract
Chapter 1 Introduction
    1.1 Introduction
    1.2 Big Data Definitions
    1.3 Research Background
        1.3.1 Big Data Applications
        1.3.2 Challenges of Big Data
        1.3.3 Apache Hadoop
        1.3.4 Hadoop Environment
        1.3.5 Hadoop Architecture and Design
        1.3.6 Hadoop Distributed File System (HDFS)
        1.3.7 MapReduce
        1.3.8 Hadoop Ecosystem
    1.4 Objective of Research
    1.5 Contributions and Significance of Research
Chapter 2 Related Work/Review of Literature
    2.1 Introduction
    2.2 Review of Literature
Chapter 3 Methodology
    3.1 Completely Unstructured Data
    3.2 Semi-Structured Data
    3.3 Structured Data
    3.4 Estimation Technique
    3.5 Apache PIG-based Calculating
    3.6 Apache HIVE-based Data Storage
    3.7 Apache IMPALA-based Data Management
Chapter 4 Experiment and Results
    4.1 Dataset
    4.2 System Requirements
    4.3 Apache Pig
        4.3.1 Contents of our Input File
        4.3.2 Copying the Input File
        4.3.3 Executing the Pig commands on File
        4.3.4 Mapper and Reducer Running Job
        4.3.5 Output
    4.4 Apache Hive
        4.4.1 Create Table and Loading the Data
        4.4.2 Query Execution
        4.4.3 Mapper and Reducer Running Job
    4.5 Apache Impala
        4.5.1 Contents of Input File
        4.5.2 Create Table and Loading the Data
        4.5.3 Query Execution
        4.5.4 Output
    4.6 Comparison of Results (Pig, Hive, Impala)
Chapter 5 Conclusion and Future Work
    5.1 Conclusion
    5.2 Future Work
References
Acknowledgements
Academic Achievements
Dedication



Article ID: 2952612


Article link: https://www.wllwen.com/kejilunwen/shengwushengchang/2952612.html

