
Research on Time Series Classification Based on Diversified Top-k Shapelets

Published: 2018-10-08 20:09
[Abstract]: A time series is a sequence formed by arranging the values of a statistical indicator of some phenomenon in chronological order. Because real systems and phenomena are usually influenced internally by many factors, the time series they produce behave in complex ways: high dimensionality, complex structure, noise, and similarity distortions. Traditional time series analysis models the series with statistical methods, but these complex characteristics make it hard for the resulting models to meet the requirements of real systems. Data-mining-based approaches emerged in response, and time series mining has become an active research field. Time series classification is an important topic within time series data mining; its task is to build a classifier that assigns a class label to a given time series. As a classification approach based on local shape features, shapelets can distinguish subtle differences between subsequences and therefore classify well, and they have been applied in fields such as medical diagnosis and posture recognition. Open problems remain, however. This thesis addresses them, and its main contributions are as follows.

(1) To address the redundancy in the optimal shapelet set produced by existing shapelet-based classification methods, a time series classification method based on the diversified top-k shapelets transform (Div Top KShapelet) is proposed. The thesis borrows the diversified top-k query technique from the data retrieval field, introduces the concept of diversified top-k shapelets together with the corresponding diversified top-k shapelets graph, and processes the candidate shapelets to select those that are most discriminative and mutually dissimilar. In addition, SAX is used to reduce the dimensionality of the original time series data set.
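The selection of discriminative yet mutually dissimilar shapelets can be illustrated with a small sketch. This is not the thesis's algorithm: it is a greedy approximation written under assumptions. The similarity rule (minimum sliding-window Euclidean distance below a `threshold`), the greedy order, and every function name here are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math

def subsequence_distance(shapelet, series):
    """Smallest Euclidean distance between `shapelet` and any window of `series`
    that has the same length as the shapelet."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best

def are_similar(a, b, threshold):
    """Assumed similarity rule: the shorter shapelet matches some window of the
    longer one within `threshold`."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    return subsequence_distance(short, long_) <= threshold

def diversified_top_k(candidates, k, threshold):
    """Greedy pick of at most k (score, shapelet) pairs, no two of them similar.
    A higher score means a more discriminative shapelet."""
    n = len(candidates)
    # Graph over candidates: an edge joins two shapelets that are similar.
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if are_similar(candidates[i][1], candidates[j][1], threshold):
                neighbours[i].add(j)
                neighbours[j].add(i)
    chosen = []
    # Visit candidates from best to worst score; skip any that is adjacent
    # to a shapelet already chosen.
    for i in sorted(range(n), key=lambda idx: candidates[idx][0], reverse=True):
        if len(chosen) == k:
            break
        if neighbours[i].isdisjoint(chosen):
            chosen.append(i)
    return [candidates[i] for i in chosen]

def shapelet_transform(series, selected):
    """Represent a series by its distance to each selected shapelet."""
    return [subsequence_distance(shapelet, series) for _, shapelet in selected]

# Toy candidates: the second is a near duplicate of the first.
candidates = [
    (0.9, [0.0, 1.0, 0.0]),
    (0.8, [0.0, 1.1, 0.0]),
    (0.7, [1.0, 0.0, 1.0]),
    (0.4, [5.0, 5.0, 5.0]),
]
selected = diversified_top_k(candidates, k=2, threshold=0.5)
features = shapelet_transform([3.0, 0.0, 1.0, 0.0, 3.0], selected)
```

In this toy run the near duplicate is skipped even though its score is second highest, which is the redundancy the diversification step is meant to remove. Finding the best-scoring set exactly is a harder combinatorial problem, so the greedy pass is only a stand-in.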
Experimental results show that the method is more accurate than traditional classification methods, and that compared with the clustering-based filtering method (Cluster Shapelet) and the shapelet coverage method (Shapelet Selection) it improves classification accuracy by up to 48.43% and 32.61%, respectively. It is also faster on all 15 data sets, with speedups ranging from 1.09 times to 287.8 times.

(2) To address the inability of existing shapelet classification methods to handle imbalanced time series, a classification method based on the diversified top-k shapelets transform (Div IMShapelet+SMOTE) is proposed. AUC, an evaluation metric for imbalanced classification, replaces the traditional information entropy as the measure of shapelet quality; the training set is transformed with the diversified top-k shapelets; and finally SMOTE is used to oversample the transformed training set. Because AUC is insensitive to class imbalance, the selected shapelet features reflect classification quality more accurately, so the method both extracts time series features effectively and balances the data set on the basis of those features. Experiments show that Div IMShapelet+SMOTE performs best when compared with Div Top KShapelet and INOS+SVM: classification accuracy improves by up to 38.8% and 10.2%, AUC by up to 0.37 and 0.08, and F-measure by up to 0.35 and 0.15, respectively, so the method handles imbalanced time series classification effectively.
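The two ingredients of the imbalanced-data method, scoring a shapelet by AUC and oversampling the transformed training set with SMOTE, can be sketched as follows. This is a minimal illustration under assumptions, not the thesis's implementation. The pairwise (Mann-Whitney) form of AUC, the choice to take `max(auc, 1 - auc)`, and the names `auc_quality` and `smote` are all hypothetical. The interpolation step follows the standard SMOTE idea of placing a synthetic point on the segment between a minority sample and one of its nearest minority neighbours.

```python
import random

def auc_quality(distances, labels, positive):
    """AUC obtained when series are ranked by their distance to one shapelet.
    `distances[i]` is the shapelet-to-series distance for series i."""
    pos = [d for d, y in zip(distances, labels) if y == positive]
    neg = [d for d, y in zip(distances, labels) if y != positive]
    # Fraction of (positive, negative) pairs in which the positive series is
    # closer to the shapelet; ties count as half a win.
    wins = sum(1.0 if p < q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))
    # Assumption: a shapelet is equally useful whichever class it is close to.
    return max(auc, 1.0 - auc)

def smote(minority, n_new, k=2, rng=None):
    """Create `n_new` synthetic rows from the minority-class rows of the
    shapelet-transformed training set (needs at least two minority rows)."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest other minority rows, by squared Euclidean distance.
        others = sorted(
            (row for row in minority if row is not x),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(x, row)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# A shapelet that sits close to every positive series scores 1.0.
perfect = auc_quality([0.1, 0.2, 0.9, 1.0], [1, 1, 0, 0], positive=1)
mixed = auc_quality([0.1, 0.9, 0.2, 1.0], [1, 1, 0, 0], positive=1)

# Oversample three minority rows (already in shapelet-distance space).
minority_rows = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
new_rows = smote(minority_rows, n_new=4, k=2, rng=random.Random(1))
```

Unlike an entropy-based score computed from class counts, the pairwise AUC depends only on how positive and negative series are ordered relative to each other, not on how many of each there are. That is the insensitivity to imbalance the abstract refers to.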
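[Degree-granting institution]: China University of Mining and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13; O211.61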



