Time Series Prediction Algorithms for Social Information Propagation
Published: 2018-09-12 13:04
[Abstract]: The growing popularity of social networks provides a broad data foundation and many application scenarios for research on information propagation prediction. Information propagation prediction uses the observed part of a propagation process to forecast how a piece of social information will spread over a coming period, so that the whole course of propagation can be understood in advance. With such methods, internet companies can offer users better personalized recommendation services, and government departments can guide and manage public opinion in a timely and effective way. The problem touches several areas, including large-scale parallel data processing, social network topology analysis, and text content analysis, and it has attracted researchers from big data and cloud computing, complex networks, and natural language processing.
Information propagation prediction is an important direction in social network research, and recent methods fall into graph-based and non-graph-based approaches. Most non-graph-based methods rely on epidemic models and classification models and rarely consider the clustering characteristics of social time series. In the clustering-based time series prediction algorithm CTP, each cluster centroid represents one propagation pattern, so prediction reduces to a classification step that finds the propagation pattern nearest to the prediction target: CTP takes the target's nearest-neighbor cluster centroid as its prediction. The prediction performance of CTP therefore depends on how well that centroid fits the target; the better the fit, the better the prediction.
By analyzing the physical meaning of the scaling distance, this thesis observes that the scaling distance is a better measure of similarity between time series. It argues that the target's nearest-neighbor centroid under the scaling distance may fit the target more closely and thus give higher prediction performance, yet the CTP literature has not studied how the scaling distance affects prediction performance. The thesis therefore combines CTP with the scaling distance and proposes S-CTP, a time series prediction algorithm based on scaled clustering. S-CTP takes the scaled nearest-neighbor cluster centroid of the target as the prediction, which improves the fit to the target and in turn the prediction performance. Experiments on the twitter and phrase datasets show that S-CTP improves the generalization performance of CTP.
In CTP, some members of the target's nearest-neighbor cluster are highly similar to the target while others are much less similar, which lowers CTP's prediction performance. To address this, the thesis combines CTP with the segmented nature of time series and proposes D-CTP, a time series prediction algorithm based on segmented clustering. To select the cluster members most similar to the target, D-CTP always treats the target as the cluster centroid, clusters on the segment of known length, and then refines the cluster centroid over the known-length and prediction-length segments. Following the same reasoning as for S-CTP, the thesis also combines D-CTP with the scaling distance to obtain a time series prediction algorithm based on scaled segmented clustering. Experiments on the twitter and phrase datasets show that the algorithm that considers both the scaling distance and segmented clustering improves the generalization performance of CTP further, beyond what S-CTP achieves.
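The difference between the CTP and S-CTP prediction steps can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes the cluster centroids are already available as equal-length series, that the target is known only for its first k points, and that the scaling distance takes the form min over alpha of ||x - alpha*y|| / ||x||, which has a closed-form minimizer. The abstract does not give the exact definition, so that form, along with the names `scaling_fit`, `predict_ctp`, and `predict_sctp`, is an assumption made for illustration.

```python
import numpy as np


def scaling_fit(x, y):
    """Return (alpha, dist) for the assumed scaling distance.

    alpha minimizes ||x - alpha * y||; dist is that residual norm
    divided by ||x|| (or left unnormalized if x is all zeros).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    yy = float(y @ y)
    alpha = float(x @ y) / yy if yy > 0 else 0.0
    x_norm = float(np.linalg.norm(x))
    denom = x_norm if x_norm > 0 else 1.0
    dist = float(np.linalg.norm(x - alpha * y)) / denom
    return alpha, dist


def predict_ctp(prefix, centroids):
    """CTP-style step: pick the centroid whose first k points are closest
    to the known prefix (Euclidean) and return its remaining points."""
    prefix = np.asarray(prefix, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    k = prefix.shape[0]
    dists = np.linalg.norm(centroids[:, :k] - prefix, axis=1)
    best = int(np.argmin(dists))
    return centroids[best, k:]


def predict_sctp(prefix, centroids):
    """S-CTP-style step: pick the centroid with the smallest scaling
    distance to the prefix, then return its tail scaled by the same alpha."""
    prefix = np.asarray(prefix, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    k = prefix.shape[0]
    fits = [scaling_fit(prefix, c[:k]) for c in centroids]
    best = min(range(len(fits)), key=lambda i: fits[i][1])
    alpha = fits[best][0]
    return alpha * centroids[best, k:]


# Two hypothetical propagation patterns: one rising, one falling.
patterns = [[1, 2, 3, 4], [4, 3, 2, 1]]
known = [10, 20]  # a rising series observed at a much larger volume

ctp_forecast = predict_ctp(known, patterns)
sctp_forecast = predict_sctp(known, patterns)
```

In this toy case the plain Euclidean match selects the falling pattern, because its magnitude happens to be closer, and forecasts (2, 1). The scaled match recognizes the rising shape with alpha = 10 and forecasts (30, 40). This mirrors the abstract's argument that a scaled nearest centroid can fit the target better.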
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP393.09
Article ID: 2239087
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2239087.html