Research on Analyzing Diabetes Tweets Using Text Mining Techniques
Published: 2022-05-08 20:44
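The number of people with diabetes worldwide is growing steadily, and treating diabetes quickly and successfully is a major health challenge. With the rapid development of information technology, researchers have devoted more effort to the functionality and security of computer systems, aiming to develop safer and more convenient care programs for diabetes patients. Earlier studies relied mostly on patient data stored in electronic medical devices or systems, but recent work has found that diabetes-related text on social media has substantial practical value. How to use this large volume of unstructured data effectively to design and develop support systems for diabetes patients is both an active research topic and a difficult problem. This thesis studies diabetes-related discussion on Twitter, Google, and Baidu. It applies several text mining techniques, LDA topic modeling, and the SVM algorithm to mine diabetes-related text, and also provides an effective means for diabetes prediction. The main research contents are as follows:
1. Downloading and quantifying tweet text, and selecting features and labels. Tweets are downloaded from Twitter using Python Twitter API functions and stored locally in CSV format; the spaCy library tokenizes the text; the TF-IDF algorithm computes the weights of feature terms; and principal component analysis (PCA) reduces the dimensionality of the data matrix to lower the complexity of the dataset. To ensure that only diabetes-related tweets are analyzed, ...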
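The TF-IDF weighting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the regex tokenizer is a crude stand-in for spaCy, the sample tweets are invented for illustration, and the weighting uses one common variant (relative term frequency times ln(N / df)); the thesis may use a different variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and extract word-like tokens (assumed stand-in for spaCy)."""
    return re.findall(r"[a-z0-9#']+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical tweets, invented for this example only.
tweets = [
    "Checking my blood sugar before breakfast #diabetes",
    "New insulin pump arrived today #diabetes",
    "Low carb breakfast ideas for stable blood sugar",
]
weights = tfidf(tweets)
```

Terms that occur in only one tweet (such as "checking") receive a higher weight than terms shared across tweets (such as "#diabetes"), which is the property that makes TF-IDF useful for picking out distinctive feature terms before dimensionality reduction.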
[Pages]: 75
[Degree level]: Master's
[Table of contents]:
Abstract
Abstract (in Chinese)
Chapter 1 Introduction
1.1 Background
1.2 Research Significance
1.3 Literature review
1.3.1 The Diabetes Challenge
1.3.2 The Internet as a Source of Health Information
1.4 Motivation and contributions
1.4.1 Motivation
1.4.2 Contributions
1.5 Research contents
1.6 Thesis Structure
Chapter 2 Overview of Related Technology
2.1 Natural Language Processing
2.1.1 Application and Challenges of NLP
2.2 Text Mining Techniques
2.3 Topic Modeling Techniques
2.3.1 Topic Modeling Algorithms
2.4 Twitter as a Data Source
2.5 The Datasets
2.5.1 Twitter Data Set
2.5.2 Google Search Data
2.5.3 Baidu Search Data
2.6 Hashtag Selection
2.7 Text to Vector Transformation
2.8 Data Pre-processing Technique
2.9 Support Vector Machines (SVM)
2.10 Python Programming Language
2.11 Coherence Measures
2.12 Summary
Chapter 3 Tweet Analysis and Identification of Insights
3.1 Analyzing Diabetes Discussion for Depression Related Insights
3.1.1 Data Set
3.1.2 Feature Engineering
3.1.3 Annotation
3.1.4 Experimental Analysis
3.2 Topic Analysis of Food Mentions in Tweets
3.2.1 System design
3.2.2 Experiment setup
3.2.3 Experimental Analysis
3.3 Summary
Chapter 4 Association of Topics of Discussion Topics
4.1 Topic Association between Twitter Communication, Google and Baidu Web Searches
4.1.1 The LDA Algorithm
4.1.2 Word Relevance ratio/Similarity task
4.1.3 Topic Labeling
4.2 Association of Topics with Google and Baidu Web Search Results
4.2.1 Association of Twitter and Google data
4.2.2 Association of Twitter and Baidu data
4.3 Determining significance
4.4 Summary
Conclusion and Future Work
References
Acknowledgement
Publication and Awards
Article ID: 3652344
Article link: https://www.wllwen.com/kejilunwen/shengwushengchang/3652344.html