Published: 2023-06-04 05:07
Page count: 71 pages
Chapter 1 Introduction
1.1 Introduction to Mining Software Repositories
1.2 Background research and objectives
1.2.1 Research Objectives and contribution of the thesis
1.2.2 Background research
1.2.3 Motivation
1.3 Literature Review and Analysis
Chapter 2 Theoretical Background
2.1 Version Control Systems
2.1.1 SourceForge
2.1.2 GitHub
2.2 Bug tracking systems
2.2.1 Bugzilla
2.3 Information retrieval
2.3.1 Common terminology
2.4 Commonly used IR models
2.4.1 Vector Space Model (VSM)
2.4.2 Revised Vector Space Model (rVSM)
2.4.3 Latent Semantic Indexing (LSI)
2.4.4 Probabilistic Latent Semantic Indexing (PLSI)
2.4.5 Latent Dirichlet Allocation
2.5 Word embeddings
2.5.1 Vector space model and statistical language model
2.5.2 Representing text with embeddings
2.5.3 Types of word embeddings
2.6 Abstract Syntax Trees
2.7 Summary
Chapter 3 Bridging the Lexical Gap
3.1 Pretrained Word Embedding Models
3.1.1 Word2Vec model trained on Stack Overflow posts
3.1.2 fastText model trained on Common Crawl
3.1.3 GloVe model trained on Common Crawl
3.1.4 fastText model trained on source code files
3.2 Types of similarity
3.2.1 Lexical similarity
3.2.2 Semantic similarity
3.3 Similarity measures
3.3.1 Cosine similarity
3.3.2 Word Mover's Distance
3.4 Objective Function and Optimization
3.4.1 Differential evolution
3.5 Structure of the model
3.6 Summary
Chapter 4 Experimental Setup and Results
4.1 Data collection
4.2 Parsing and preprocessing
4.2.1 Tokenization and linguistic preprocessing of tokens
4.3 Experiments with different preprocessing techniques
4.3.1 Embedding whole content of source files
4.3.2 Parsing ASTs of source code files
4.4 Experiments with different pretrained vectors
4.5 Evaluation
4.6 Results
4.6.1 fastText vectors trained on Common Crawl data
4.6.2 GloVe vectors trained on Common Crawl data
4.6.3 Word2Vec vectors trained on Stack Overflow data
4.7 Comparison with other models
4.7.1 Comparison with the base rVSM model
4.7.2 Comparison of the proposed model with BugLocator
4.8 Summary