A Storm-Based System for Real-Time Analysis and Evaluation of Weibo Users' Potential Needs
[Abstract]: With the growth and spread of the Internet, Weibo, an open platform for exchanging and sharing information, generates data on the order of hundreds of millions of items every day. Mining and analyzing users' potential purchase behavior from this mass of data can yield considerable economic value for enterprises. Current research and analysis methods, however, have two shortcomings: Weibo analysis is not sufficiently real-time, so results lag behind the data; and it is not sufficiently targeted, so the value of specific user groups has not been fully exploited. To address these problems, this thesis designs and implements an efficient, real-time system based on Storm for analyzing and evaluating Weibo user behavior. The work has two parts. First, the uneven task distribution produced by Storm's existing scheduling strategies is identified and verified experimentally, and an adaptive scheduling model based on CPU weights is proposed to address the low efficiency caused by latency between internal nodes and by message locality. Second, the real-time analysis system is designed and implemented. It consists of a data source module, a data access module, a data analysis module, and a data display module. The data source module obtains Weibo data through a crawler and the Sina API. The data access module builds a Kafka cluster to address data flow delay. The data analysis module is implemented through Storm's Spout and Bolt interfaces; it segments the text with Chinese word segmentation and analyzes the result with K-means. The data storage module saves the data in HBase, and the data display module is built with SpringMVC and ECharts.
The experimental results show that the improved scheduling strategy clearly outperforms the existing scheduling strategies, with a gain of about 50% on CPU-intensive scheduling tasks. The real-time analysis system can analyze users' potential purchase behavior in real time, and enterprises can conduct related research and marketing based on the behavioral characteristics it identifies.
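The abstract does not give the formula behind the CPU weights, so the following is only a minimal sketch of the general idea, not the thesis's actual model. It assumes a hypothetical weight equal to a node's idle CPU capacity (cores multiplied by the idle fraction) and a hypothetical fixed CPU cost per task, and contrasts a greedy weighted placement with the even, round-robin style of distribution. The names `Node`, `round_robin`, `cpu_weighted`, and `cost_per_task` are illustrative and are not part of Storm's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node; all fields are illustrative assumptions."""
    name: str
    cores: int
    cpu_utilization: float  # fraction of CPU already in use, 0.0 to 1.0
    assigned: list = field(default_factory=list)

    def weight(self) -> float:
        # Assumed weight: idle CPU capacity measured in cores.
        return self.cores * (1.0 - self.cpu_utilization)


def round_robin(tasks, nodes):
    """Spread tasks evenly across nodes, ignoring their CPU load."""
    for i, task in enumerate(tasks):
        nodes[i % len(nodes)].assigned.append(task)


def cpu_weighted(tasks, nodes, cost_per_task=1.0):
    """Greedily place each task on the node with the most remaining weight.

    cost_per_task is an assumed CPU cost (in cores) deducted from the
    chosen node's remaining weight after every placement.
    """
    remaining = {n.name: n.weight() for n in nodes}
    for task in tasks:
        target = max(nodes, key=lambda n: remaining[n.name])
        target.assigned.append(task)
        remaining[target.name] -= cost_per_task
```

Under these assumptions, a lightly loaded node with more cores receives more tasks than a busy, smaller node, whereas round-robin gives both the same share. A real Storm scheduler would also have to account for the inter-node latency and message locality that the abstract mentions, which this sketch leaves out.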
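[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP393.092; TP311.13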