Building an Integrated System: Rule-Based and Statistics-Based Machine Translation

Published: 2018-07-08 07:43

Topic: machine translation + artificial intelligence; Reference: Guangdong University of Business Studies (广东商学院), 2012 master's thesis

[Abstract]: More than sixty years have passed since machine translation was first proposed as a theoretical idea. Today's mainstream machine translation algorithms fall into two camps: rule-based and statistics-based. Rule-based machine translation relies on manually predefined grammar-rule modules as the basis for syntactic analysis. Statistics-based machine translation centers on a web crawler's file scanning and classification mechanism and on the dynamic reference database that this mechanism builds. In other words, rule-based machine translation is a modular system, while statistics-based machine translation is a process-oriented system. From the perspective of Chomsky's grammar, this thesis sets out the modular processing advantages of rule-based systems and their shortcomings in handling concrete natural language. From the perspective of Nida's theory of the translation process, it analyzes the process advantages of statistics-based systems and the weakness of their unstable grammatical analysis. The thesis then combines the "library" and "parser" of the rule-based system with the "crawler" (also called "roaming") mechanism of the statistics-based system to build a system that integrates the modular and process advantages. By incorporating the library's grammatical mechanism into Nida's translation steps, it addresses the uncertainty of grammatical analysis in statistics-based systems, offsetting the former's shortcomings in natural language processing and the latter's weakness in grammatical analysis. Finally, the thesis outlines a future trend in machine translation: drawing on the library and parser, using crawlers to build backup data, and integrating at the level of systems and interface hardware.
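The integration described in the abstract can be illustrated with a toy sketch. It is not the thesis's implementation: the crawled word pairs, the part-of-speech tags, the accepted tag patterns, and every function name below are hypothetical stand-ins chosen for illustration. The idea shown is that counts gathered by a crawler rank the candidate translations, and a rule-based grammar check then rejects candidates that the "library" does not accept.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical stand-in for the crawler-built reference database:
# (source word, target word) pairs, with romanized source tokens.
CRAWLED_PAIRS = [
    ("wo", "I"), ("wo", "I"), ("wo", "me"),
    ("ai", "love"), ("ai", "love"), ("ai", "loves"),
    ("fanyi", "translation"), ("fanyi", "translate"), ("fanyi", "translate"),
]

# Hypothetical stand-in for the rule-based "library": a tag for each
# target word, plus the tag sequences that the "parser" accepts.
POS = {
    "I": "PRON_SUBJ", "me": "PRON_OBJ",
    "love": "VERB", "loves": "VERB_3SG",
    "translation": "NOUN", "translate": "VERB",
}
ALLOWED_PATTERNS = {("PRON_SUBJ", "VERB", "NOUN")}


def build_table(pairs):
    """Count how often each target word was seen for each source word."""
    table = defaultdict(Counter)
    for src, tgt in pairs:
        table[src][tgt] += 1
    return table


def is_grammatical(words):
    """Rule-based check: accept only tag sequences listed in the library."""
    return tuple(POS[w] for w in words) in ALLOWED_PATTERNS


def translate(source_words, table):
    """Rank candidates by crawled counts; return the best one the parser accepts."""
    options = [list(table[w].items()) for w in source_words]
    scored = []
    for combo in product(*options):
        words = [w for w, _ in combo]
        score = sum(count for _, count in combo)
        scored.append((score, words))
    # Highest total count first; the sort is stable, so ties keep product order.
    scored.sort(key=lambda item: item[0], reverse=True)
    for _, words in scored:
        if is_grammatical(words):
            return words
    return None  # no candidate satisfies the grammar rules
```

With this toy data, the counts alone favor "I love translate", which the grammar check rejects, so `translate(["wo", "ai", "fanyi"], build_table(CRAWLED_PAIRS))` returns `["I", "love", "translation"]`. A real system would replace the exhaustive enumeration and the fixed tag patterns with a proper decoder and parser.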
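[Degree-granting institution]: Guangdong University of Business Studies (广东商学院)
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: H085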
