Learning on Evolving Data Streams

Published: 2023-05-20 06:25
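  In today's digital era, massive volumes of streaming data are continuously and automatically generated across a wide range of real-world applications. Data streams are unbounded in length and evolve over time, so learning algorithms must process them within limited time. Developing efficient data stream learning algorithms has therefore long been a challenge for machine learning, and many data stream learning algorithms that handle concept drift have been proposed over the past decade. However, data stream mining still faces several new problems and challenges.

The first is concept evolution, i.e., the emergence of novel classes. Traditional classifiers usually assume a fixed set of classes, whereas in real-world scenarios new classes may appear over time. The second is label scarcity. Traditional data stream mining usually adopts a supervised learning framework, yet labeling stream instances costs substantial time and resources, and real-world settings often provide only a small number of labeled instances. Designing a reliable semi-supervised learning algorithm is therefore another challenge. A further challenge is high dimensionality, which can severely degrade the performance of learning algorithms.

To address these problems, this thesis proposes several new data stream learning algorithms. Its main contributions are as follows:
1. For concept evolution, this thesis proposes a new data stream classification algorithm that detects and learns novel classes. The algorithm handles concept drift and concept evolution simultaneously, copes with complex class distributions in the stream, and effectively distinguishes concept drift from concept evolution in noisy data. Experiments on synthetic and real-world data show that, compared with state-of-the-art methods, the proposed method...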
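The abstract describes detecting novel classes in a stream that may also drift. The thesis's own algorithm (EMC) is not reproduced here. The following is a minimal, hypothetical Python sketch of one common idea behind novel-class detection. It assumes each known class is summarized by a centroid and a radius. An instance that falls outside every class boundary is buffered as an outlier, and a new class is declared only once at least `q` buffered outliers lie closer to one another than to any known class. The class name `ToyNovelClassDetector`, the parameter `q`, and the simplified cohesion score (loosely inspired by the q-neighborhood silhouette coefficient used in earlier novel-class detectors) are illustrative assumptions. Model updating for concept drift is omitted.

```python
import math
from statistics import fmean


class ToyNovelClassDetector:
    """Hypothetical centroid/radius stream classifier with novel-class detection.

    Illustrative sketch only: not the EMC algorithm from the thesis.
    Concept-drift handling (model update) is deliberately omitted.
    """

    def __init__(self, labeled, q=3):
        # labeled: (features, label) pairs from an initial labeled chunk.
        groups = {}
        for x, y in labeled:
            groups.setdefault(y, []).append(tuple(x))
        self.q = q  # hypothetical minimum size of a cohesive outlier group
        self.model = {y: self._summarize(xs) for y, xs in groups.items()}
        self.buffer = []  # instances outside every known class boundary
        self.n_novel = 0

    @staticmethod
    def _summarize(xs):
        # Describe a class by its centroid and the radius covering all members.
        centroid = tuple(fmean(col) for col in zip(*xs))
        radius = max(math.dist(x, centroid) for x in xs)
        return centroid, radius

    def _score(self, i):
        # Simplified cohesion score in [-1, 1] for buffered outlier i:
        # a = mean distance to its q nearest fellow outliers,
        # b = distance to the nearest known-class centroid.
        x = self.buffer[i]
        others = sorted(math.dist(x, o) for j, o in enumerate(self.buffer) if j != i)
        a = fmean(others[: self.q])
        b = min(math.dist(x, c) for c, _ in self.model.values())
        return (b - a) / max(a, b)

    def process(self, x):
        """Return a known label, a newly created novel label, or 'outlier'."""
        x = tuple(x)
        label, (centroid, radius) = min(
            self.model.items(), key=lambda kv: math.dist(x, kv[1][0])
        )
        if math.dist(x, centroid) <= radius:
            return label
        self.buffer.append(x)
        if len(self.buffer) <= self.q:
            return "outlier"  # too few outliers to judge cohesion yet
        scores = [self._score(i) for i in range(len(self.buffer))]
        cohesive = [o for o, s in zip(self.buffer, scores) if s > 0]
        if len(cohesive) < self.q:
            return "outlier"
        # Enough mutually close outliers far from known classes: add a class.
        self.n_novel += 1
        new_label = f"novel-{self.n_novel}"
        self.model[new_label] = self._summarize(cohesive)
        self.buffer = [o for o, s in zip(self.buffer, scores) if s <= 0]
        return new_label if scores[-1] > 0 else "outlier"


# Usage: two known classes, then a cluster of a previously unseen class.
train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"), ((1, 1), "A"),
         ((10, 0), "B"), ((11, 0), "B"), ((10, 1), "B"), ((11, 1), "B")]
det = ToyNovelClassDetector(train, q=3)
stream = [(0.5, 0.6), (10.4, 0.5), (5, 10), (5.2, 10),
          (5, 10.2), (5.1, 10.1), (5.05, 10.05)]
labels = [det.process(x) for x in stream]
print(labels)
# ['A', 'B', 'outlier', 'outlier', 'outlier', 'novel-1', 'novel-1']
```

The detector waits for a group of at least `q` cohesive outliers instead of reacting to a single distant point. This is what lets it separate a genuinely new class from isolated noise.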

[Pages]: 155

[Degree]: Doctoral (PhD)

[Table of Contents]:
Abstract (Chinese)
ABSTRACT
Chapter 1 Introduction
    1.1 Research Background and Significance
        1.1.1 Data Stream Mining
        1.1.2 Challenges
    1.2 Research Progress (State of the Art) in Data Stream Mining
        1.2.1 Clustering Data Streams
        1.2.2 Data Stream Classification
            1.2.2.1 Stationary Data Stream Classification
            1.2.2.2 Evolving Data Stream Classification
            1.2.2.3 Data Stream Classification with Novel Class Detection
            1.2.2.4 Semi-supervised Data Stream Classification
    1.3 Research Scope and Thesis Contributions
    1.4 Thesis Organization
Chapter 2 Foundation of Concepts
    2.1 Definitions
    2.2 Basis of Stream Clustering Algorithms
    2.3 Taxonomy of Clustering Algorithms
    2.4 Basis of Stream Classification Algorithms
        2.4.1 Learning Structure
        2.4.2 Adaptivity Mechanisms
    2.5 Taxonomy of Classification Algorithms
        2.5.1 Approaches Based on Adaptation Process
            2.5.1.1 Informed or Active Approaches
            2.5.1.2 Blind or Passive Approaches
        2.5.2 Approaches Based on Learning Process
            2.5.2.1 Single Classifier
            2.5.2.2 Ensemble Classifiers
    2.6 Evaluation and Performance Criteria
        2.6.1 Evaluation Metrics
        2.6.2 Estimation Techniques
            2.6.2.1 Prequential Evaluation
            2.6.2.2 Hold-out Evaluation
    2.7 Summary
Chapter 3 Data Stream Classification with Novel Class Detection
    3.1 Introduction
    3.2 Related Work
    3.3 Proposed Algorithm
        3.3.1 Problem Formalization
        3.3.2 Overview
        3.3.3 Main modules of EMC
            3.3.3.1 Initial Model Construction
            3.3.3.2 New Class Detection
            3.3.3.3 Classification
            3.3.3.4 Model Update
    3.4 Experiment
        3.4.1 Data sets
        3.4.2 Classification Performance
            3.4.2.1 Comparison Methods
            3.4.2.2 Prediction Performance Analysis
            3.4.2.3 Parameter Sensitivity on Classification Performance
        3.4.3 Evaluation of New Class Detection
            3.4.3.1 Comparison Methods
            3.4.3.2 Evaluation Metrics
            3.4.3.3 Performance Analysis
            3.4.3.4 Parameter Sensitivity
    3.5 Summary
Chapter 4 Online Reliable Semi-supervised Learning on Evolving Data Streams
    4.1 Introduction
    4.2 Related Work
    4.3 Proposed Algorithm
        4.3.1 Overview
        4.3.2 Main Building Blocks
            4.3.2.1 Initializing Learning Model
            4.3.2.2 Classification
            4.3.2.3 Online Data Maintenance
    4.4 Experiments
        4.4.1 Data sets
            4.4.1.1 Real-world Data sets
            4.4.1.2 Synthetic Data sets
        4.4.2 Comparison Methods
            4.4.2.1 Semi-supervised algorithms
            4.4.2.2 Supervised algorithms
        4.4.3 Results
            4.4.3.1 Comparison with semi-supervised algorithms
            4.4.3.2 Comparison with supervised algorithms
            4.4.3.3 Parameter Sensitivity Analysis
    4.5 Summary
Chapter 5 Learning High Dimensional Evolving Data Streams with Limited Labels
    5.1 Introduction
    5.2 Related Work
        5.2.1 Semi-supervised data stream algorithms
        5.2.2 Synchronization-based data mining
        5.2.3 Denoising autoencoder (DAE) based algorithms
    5.3 Proposed Algorithm
        5.3.1 Notations and symbols
        5.3.2 Overview
        5.3.3 Main parts of the proposed algorithm
            5.3.3.1 Denoising autoencoders (DAE)
            5.3.3.2 Synchronization-based dynamic micro-clusters
            5.3.3.3 Model update
    5.4 Experiments
        5.4.1 Datasets
        5.4.2 Comparison algorithms
        5.4.3 Analysis of results
            5.4.3.1 Performance comparison
            5.4.3.2 Parameter sensitivity analysis
    5.5 Summary
Chapter 6 Conclusion
    6.1 Summary
        6.1.1 Classification with novel class identification
        6.1.2 Online semi-supervised classification
        6.1.3 Learning high dimensional evolving data streams with limited labels
    6.2 Future work
Acknowledgements
References
Research Results Obtained During the Study for Doctoral Degree






