
User Fault-Report Prediction System Based on IPTV Set-Top Box KPI Data

Published: 2018-06-24 00:40

  Topic: IPTV + imbalanced dataset; Source: Nanjing University of Posts and Telecommunications, 2017 master's thesis


[Abstract]: With the rapid development of Internet technology, IPTV has become increasingly important and is now an indispensable part of daily life, and IPTV users expect a better service experience. To serve these users better and to make full use of available data, operators want to use IPTV set-top box KPI data to build a user fault-report prediction model that analyzes the KPI data and identifies users who are about to report a fault. Operators can then contact these users proactively so that problems are found and resolved in time, which strengthens user stickiness toward the operator. This thesis carries out its research in three parts: data analysis, predictive modeling, and system deployment. It first preprocesses the IPTV set-top box KPI data through data cleaning and correlation analysis to obtain a dataset suitable for modeling. Because the dataset is imbalanced, the thesis builds the user fault-report prediction model at both the algorithm level and the data level. At the algorithm level, it proposes an unbiased decision tree algorithm based on the traditional decision tree algorithm. The unbiased decision tree modifies the feature selection criterion and the leaf node decision criterion of the traditional algorithm, so it can handle imbalanced datasets directly. At the data level, it proposes a new oversampling algorithm derived from traditional oversampling: an adaptive synthetic oversampling method based on average distance. The new algorithm uses the average distance between a minority-class sample and the majority-class samples around it as a one-dimensional parameter to generate synthetic samples adaptively. Once the dataset has been balanced, the random forest algorithm is used for modeling. Experiments show that the data-level fault-report prediction model performs better. For system deployment, the thesis uses the Spark on YARN deployment mode to process the data and build the model, and presents the results to IPTV operations and maintenance staff through a web-page visualization.
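The abstract only outlines the average-distance oversampling idea, so the following is a minimal sketch of one way such a scheme could work, not the thesis's actual algorithm. It assumes that a minority-class sample whose nearest majority-class samples are closer on average is harder to learn and therefore receives more synthetic samples (weight = 1 / average distance), and it borrows SMOTE-style interpolation between minority samples to create the new points. The function names, the parameter `k`, and the weighting rule are all hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def avg_distance_to_majority(point, majority, k):
    """Mean distance from `point` to its k nearest majority-class samples."""
    nearest = sorted(euclidean(point, m) for m in majority)[:k]
    return sum(nearest) / len(nearest)


def allocate_counts(minority, majority, n_new, k=3):
    """Split n_new synthetic samples across the minority points.

    Assumption (not from the thesis): a smaller average distance to the
    majority class earns a larger share, using weight = 1 / distance.
    """
    weights = [
        1.0 / (avg_distance_to_majority(p, majority, k) + 1e-12)
        for p in minority
    ]
    total = sum(weights)
    counts = [int(n_new * w / total) for w in weights]
    # Rounding down can leave a few samples unassigned; hand them out
    # one at a time, starting with the largest weights.
    leftover = max(0, n_new - sum(counts))
    order = sorted(range(len(minority)), key=lambda i: weights[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def oversample(minority, majority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples as a list of tuples."""
    rng = random.Random(seed)
    counts = allocate_counts(minority, majority, n_new, k)
    synthetic = []
    for point, count in zip(minority, counts):
        others = [q for q in minority if q is not point]
        # Interpolation partners: the k nearest other minority samples.
        neighbours = sorted(others, key=lambda q: euclidean(point, q))[:k]
        for _ in range(count):
            # A lone minority sample has no partner, so it is duplicated.
            partner = rng.choice(neighbours) if neighbours else point
            t = rng.random()
            synthetic.append(
                tuple(a + t * (b - a) for a, b in zip(point, partner))
            )
    return synthetic
```

Calling `oversample(minority, majority, n_new)` with lists of feature tuples returns `n_new` synthetic minority points, which can be appended to the training set before fitting a classifier such as a random forest. The brute-force distance search is quadratic and only suits small illustrative datasets.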
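[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN949.292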



