Research on Methods for Improving the Lexical and Transformation Rules of a Shona Part-of-Speech Tagger
Published: 2023-03-10 19:17
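Natural language processing (NLP), the processing of human language, is a discipline within artificial intelligence. Its ultimate goal is to parse and understand language, a goal that has not yet been achieved. For this reason, much NLP research concentrates on intermediate tasks: methods that capture some of the inherent structure of language without requiring full understanding of it. One of the main such tasks is part-of-speech (POS) tagging, or simply tagging. The lack of a standard POS tagger for Shona is a major obstacle for researchers working on machine translation, spell checking, lexicography, and automatic syntactic analysis and construction for the language. To date there has been no dedicated research on POS tagging for Shona, and tagger performance has not been sufficiently improved. The aim of this thesis is therefore to use a sufficiently large training corpus to improve the lexical and transformation rules of the Brill POS tagger for Shona. To that end, we reviewed the literature on Shona grammar and morphology to understand the nature of the language and to identify a possible tag set. From this reading we settled on a tag set of 26 broad tags, and extracted 17,473 tagged words from 1,100 sentences containing 6,750 distinct words for training and testing. Of these, 258 sentences came from previous work. Because only a few standard corpora are readily available, and building a corpus by manual annotation is an arduous task...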
[Pages]: 66
[Degree level]: Master's
[Table of contents]:
Abstract (Chinese)
Abstract
Chapter 1 Introduction
1.1 Background
1.2 Statement of the Problem
1.3 Objective
1.3.1 Specific Objectives
1.4 Methodology
1.4.1 Data Collection
1.4.2 Modeling
1.4.3 Testing and Validation
1.5 Tools and Techniques
1.6 Application of Results
1.7 Organization of the Paper
Chapter 2 Literature
2.1 Literature Review
2.1.1 Statistical Approach
2.1.2 Hidden Markov Model
2.1.3 Maximum Entropy Model
2.2 Rule-Based Approach
2.2.1 Transformation-Based Approach
2.2.2 Artificial Neural Network Approach
2.2.3 Hybrid Approach
2.3 Related Works
Chapter 3 Tag-set preparation
3.1 Introduction
3.2 The Shona Language Phonetics
3.3 The Shona Language Sentence Structure
3.4 Shona Language Word Classes
3.4.1 Shona Noun (Zita)
3.4.2 Shona Pronoun
3.4.3 Shona Adjective
3.4.4 Afaan Oromo Verb (Xumura)
3.4.5 Shona Adverbs
3.4.6 Shona Conjunction
3.4.7 Shona Pre-position
3.4.8 Shona Interjections
3.4.9 Shona Numeral
3.5 Shona Tags and Tag sets
Chapter 4 Design of the POS tagger
4.1 Introduction
4.2 Approaches and techniques
4.3 Designing Transformation-based error-driven learning
4.2.1 Rules
4.2.2 Learning Phase
4.2.3 The Lexical Rule Learner
4.2.4 The Contextual Rule Learner
4.2.5 Brill Tagger Architecture
Chapter 5 Implementation
5.1 Introduction
5.2 Corpus Preparation
5.3 Implementation of the Brill's Tagger
5.3.1 Implementation of the Initial State Tagger (HMM Tagger)
5.3.2 Implementation of the Brill's tagger Learning phase
Chapter 6 Experiment and performance analysis
6.1 Introduction
6.2 Experiments
6.2.1 Brill's Tagger Versus Corpus Size
6.3 Performance Analysis
6.4 Discussion
Chapter 7 Conclusion and Recommendation
7.1 Conclusion
7.2 Recommendations
References
Acknowledgements
Article ID: 3758406
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/3758406.html