Research on Big Data Text Analysis Based on the Hadoop Architecture
Published: 2021-02-15 00:04
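We are living in the era of "big data", and its emergence has brought new opportunities and challenges for processing massive amounts of data. Big data plays an important role in modern society, and finding useful information in large volumes of data requires analysis. That analysis has to extract information from unstructured data found on the web, such as text, images, video, and social media posts. This thesis outlines the advantages and research scope of big data, introduces big data text analysis within the Hadoop architecture and its components, and examines the application of big data in data mining. Text analysis is one of the most complex kinds of data analysis in industrial analytics, because text mining has to deal with unstructured data (e-mail, Facebook, Twitter, and LinkedIn feeds) in which observations and variables (rows and columns) are not clearly defined. Before any kind of data analysis can be carried out, the unstructured data must therefore be converted into a structured data set, after which ordinary modeling frameworks can be applied. This additional conversion step is facilitated by a word dictionary: a dictionary is needed for any kind of information extraction, and sentiment analysis dictionaries can be found online. For certain specific analyses, however, users need to create their own dictionaries. This thesis describes two conceptual parts of text analysis using the Hadoop ecosystem, specifically MapReduce. The first approach collects a large text file (a CSV file) from tweets posted in 2013...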
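The dictionary-based conversion described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it runs in plain Python rather than on a Hadoop cluster, and the `SENTIMENT` dictionary, the two-column CSV layout (`tweet id, text`), the sample rows, and the function names `mapper`, `reducer`, and `score_tweets` are all assumptions made for illustration. The map step turns each unstructured tweet into (key, value) pairs using the dictionary, and the reduce step sums the values per key, which is the general shape of a MapReduce job.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy sentiment dictionary (assumption); a real analysis
# would use a published lexicon or a custom, domain-specific one.
SENTIMENT = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def mapper(row):
    """Map step: emit (tweet_id, score) for each dictionary word in a tweet."""
    tweet_id, text = row
    for token in text.lower().split():
        word = token.strip(".,!?#@\"'")  # drop surrounding punctuation
        if word in SENTIMENT:
            yield tweet_id, SENTIMENT[word]


def reducer(pairs):
    """Reduce step: sum the emitted scores for each tweet id."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def score_tweets(csv_text):
    """Run the map and reduce steps over CSV text with rows of (id, text)."""
    reader = csv.reader(io.StringIO(csv_text))
    pairs = [pair for row in reader if len(row) == 2 for pair in mapper(row)]
    return reducer(pairs)


# Invented sample rows (assumption), standing in for a real tweet export.
SAMPLE = (
    'id1,"I love this, great day!"\n'
    'id2,"awful service, I hate waiting"\n'
    'id3,"nothing to report"\n'
)

if __name__ == "__main__":
    print(score_tweets(SAMPLE))
```

Tweets that contain no dictionary words emit no pairs and so do not appear in the result. On a real cluster the same mapper and reducer logic would typically be split into separate programs that read from standard input, for example under Hadoop Streaming.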
[Source]: Lanzhou University of Technology, Gansu Province
[Pages]: 76
[Degree level]: Master's
[Table of contents]:
Chinese Abstract
Abstract
Chapter 1 Introduction
1.1 Background and Motivation
1.1.1 Background
1.1.2 Motivation
1.2 Research Status at Home and Abroad
1.3 Objective of Research Work
1.4 Methodology
1.5 Required Resources
1.6 Structure of Thesis
Chapter 2 Overview of Big Data Text Analysis Based on Hadoop Architecture
2.1 Big Data Overview
2.2 Big Data Characteristics
2.2.1 Volume
2.2.2 Velocity
2.2.3 Variety
2.2.4 Veracity
2.2.5 Value
2.3 Different Types of Data
2.3.1 Structured Data
2.3.2 Unstructured Data
2.3.3 Semi-Structured Data
2.3.4 Metadata
2.4 Data Analysis
2.5 Big Data Adoption and Planning Considerations
2.5.1 Data Procurement
2.5.2 Privacy
2.5.3 Security
2.5.4 Provenance
2.5.5 Limited Realtime Support
2.5.6 Distinct Performance Challenges
2.6 Hadoop Overview
2.6.1 Hadoop Architecture
2.6.2 MapReduce
2.6.3 Hadoop Distributed File System
Chapter 3 Implementation plan
3.1 Big Data Analysis Techniques
3.2 Quantitative Analysis
3.3 Qualitative Analysis
3.4 Data Mining
3.5 Statistical Analysis
3.5.1 A/B Testing
3.5.2 Correlation
3.5.3 Regression
3.5.4 Conclusions
Chapter 4 Experimental result
4.1 Text Analysis Within Hadoop
4.2 Test Cases
4.3 Dataservices vs Hadoop: Comparing the Results
4.4 Transferring Text Data Processing Libraries To The Hadoop Cluster
4.5 Optimizing Text Data Processing For Use In The Hadoop Framework
4.6 HDFS Source File Formats
4.7 Text Data Processing Pushed Down to Hadoop
4.8 Problem Tracking
4.9 Other Type of Errors to Watch Out
4.10 University Mobile App for Collecting Big Data
4.11 Using GPS Location Coordinate for Text Analysis
4.12 Results Comparison
Chapter 5 Summary and future work
5.1 Summary
5.2 Future Work
References
Acknowledgements
Appendix A. Mobile Apps Developed during the master's degree program
Appendix B. Key Codes used in this thesis
Article ID: 3034074
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/3034074.html