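Research and Implementation of Methods for Acquiring Technical Term Translations from Web Pages and Online Chinese-English Dictionaries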
Published: 2018-10-05 18:38
[Abstract]: Technical terms in a specific domain are that domain's core concepts and carry rich domain information. Because technical terms keep growing in number and changing, translating them has become one of the hard problems in machine translation and information retrieval. Both statistical and rule-based methods run into difficulties when translating terms. This thesis treats the Web as a corpus and combines Web mining with knowledge acquisition to study the English translation of Chinese technical terms. The work not only helps solve the term translation problem and enables semi-automatic construction of bilingual term dictionaries, but should also benefit cross-language information retrieval and cross-language knowledge acquisition. The research covers the following aspects.
1) The main problems, difficulties, and state of research in Web-based translation acquisition are analyzed, and the shortcomings of earlier work are discussed. The basic workflow and approach for acquiring term translations from web pages are then presented.
2) Using Web-based information extraction techniques and a semantic prediction principle, query items are constructed that incorporate partial translations of the term, and web pages relevant to the term's translation are retrieved from a search engine. This addresses the difficulty of obtaining bilingual co-occurrence corpora for term translation. Acquiring high-quality translation-related corpora lays a solid foundation for the subsequent translation extraction.
3) Using knowledge acquisition techniques together with semi-structured text analysis and information extraction that combines statistics and rules, term translations are extracted from web pages. An extraction approach that combines three methods (template-based, dictionary-pattern, and position-pattern) is proposed, which maximizes the precision of the returned results while maintaining recall.
4) To filter noise out of the translation results, three verification methods are proposed on the basis of a manually compiled bilingual aligned term corpus: end analogy alignment, bilingual alignment degree, and word formation. Each acts as a sufficient but not necessary check on candidate translations. The verification step ensures the accuracy of the term translations and makes the system more practical and reliable.
5) For commonly used terms, online Chinese-English dictionaries assist translation, preserving translation precision while improving the efficiency of the acquisition system.
Experiments on acquiring terms from different domains show that the proposed method and system for acquiring term translations from web pages and online dictionaries achieve good accuracy, a significant improvement over earlier methods, with low running time and strong practicality.
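Item 3) of the abstract mentions template-based extraction but does not list the templates themselves. The following is only a minimal sketch of what such a step might look like, assuming three hand-written patterns in which an English phrase appears next to the Chinese term. The patterns, the function names `build_patterns` and `extract_candidates`, and the sample snippets are illustrative assumptions, not the thesis's actual method.

```python
import re
from collections import Counter

# An English phrase: letters, hyphens and spaces, starting and ending with a letter.
ENGLISH = r"([A-Za-z][A-Za-z\- ]*[A-Za-z])"


def build_patterns(term):
    """Build hypothetical extraction templates for one Chinese term."""
    t = re.escape(term)
    return [
        # Chinese term followed by a bracketed English phrase: 术语(English)
        re.compile(t + r"\s*[((]\s*" + ENGLISH + r"\s*[))]"),
        # English phrase followed by the bracketed Chinese term: English(术语)
        re.compile(ENGLISH + r"\s*[((]\s*" + t + r"\s*[))]"),
        # Cue-word template: 术语的英文是 English
        re.compile(t + r"的?(?:英文|英语)(?:名称|翻译|名)?(?:是|为)\s*" + ENGLISH),
    ]


def extract_candidates(term, snippets):
    """Count how many snippets support each candidate translation."""
    patterns = build_patterns(term)
    counts = Counter()
    for snippet in snippets:
        found = set()  # count each candidate at most once per snippet
        for pattern in patterns:
            for match in pattern.finditer(snippet):
                # Normalize case and whitespace so variants are merged.
                found.add(" ".join(match.group(1).lower().split()))
        counts.update(found)
    return counts


if __name__ == "__main__":
    # Made-up snippets standing in for search-engine results.
    snippets = [
        "支持向量机(Support Vector Machine)是一种监督学习方法。",
        "Support vector machine(支持向量机)常用于分类。",
        "支持向量机的英文是 support vector machine,简称SVM。",
        "这段文字与术语无关 (unrelated text)。",
    ]
    for candidate, support in extract_candidates("支持向量机", snippets).most_common():
        print(candidate, support)
```

Ranking by the number of supporting snippets is a simple stand-in for a real scoring step. The second pattern can also capture English words that precede the translation in a longer sentence, which is one reason a later verification stage such as the one in item 4) would be needed.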
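[Degree-granting institution]: Jiangsu University of Science and Technology (江苏科技大学)
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP391.1
Article ID: 2254447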
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2254447.html