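Design and Implementation of a Multilingual News Translation and Information Extraction System
Published: 2018-04-16 15:39
Topics: online news + web crawler; Source: Master's thesis, Harbin Institute of Technology, 2017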
[Abstract]: As economic globalization deepens, exchanges and cooperation between China and other countries have become increasingly frequent. China has therefore proposed, and is actively promoting, the Belt and Road Initiative to strengthen communication and cooperation with countries in the region. Driven by this policy, governments and enterprises across these countries are actively pursuing exchange and cooperation, deepening mutual understanding and promoting common development. Contact between their peoples is also growing. Online news is a new medium for recording and spreading information. Because it is timely, authentic, and wide in coverage, more and more people use it as a window onto foreign affairs. Yet the language barrier has become the biggest obstacle to communication among the countries and peoples of the region. At this critical stage of the Belt and Road Initiative, demand has risen sharply for translating news reports, government documents, and other texts written in these countries' languages. Manual translation cannot meet demand on this scale.
Meanwhile, neural machine translation (NMT) is developing rapidly and has achieved very good results for languages such as English, German, Russian, and Chinese. This makes building translation for online news particularly important. At the same time, the Internet is flooded with news of every kind, and readers want a tool that helps them learn the most useful news in the least time. Two further components are therefore essential: news acquisition through a web crawler, and information extraction from the content of the translated articles. Together they let users quickly grasp news reports from different countries, read them more easily, and judge whether a given article is worth reading. This thesis presents the design and implementation of a multilingual news information extraction and translation system.
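The abstract names three stages: crawler-based news acquisition, translation, and information extraction performed on the translated text. The sketch below illustrates only that ordering, under assumptions that do not come from the thesis. The `NewsArticle` record, the stage signatures, `run_pipeline`, and every stub are hypothetical. The word-by-word dictionary stands in for an NMT model, keyword counting stands in for whatever extraction methods the system actually uses, and the canned pages replace real HTTP crawling.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical record for one news item as it moves through the pipeline.
@dataclass
class NewsArticle:
    url: str
    source_lang: str
    text: str
    translation: str = ""
    keywords: List[str] = field(default_factory=list)

# Assumed stage interfaces (illustrative only, not taken from the thesis).
Fetcher = Callable[[str], Tuple[str, str]]   # url -> (language code, plain text)
Translator = Callable[[str, str, str], str]  # (text, source lang, target lang) -> text
Extractor = Callable[[str], List[str]]       # translated text -> keywords


def run_pipeline(urls: List[str], fetch: Fetcher, translate: Translator,
                 extract: Extractor, target_lang: str = "en") -> List[NewsArticle]:
    """Crawl, then translate, then extract; extraction runs on the translation."""
    results = []
    for url in urls:
        lang, text = fetch(url)                                  # 1. acquisition
        article = NewsArticle(url=url, source_lang=lang, text=text)
        if lang == target_lang:                                  # 2. translation
            article.translation = text                           #    (skipped if already in target)
        else:
            article.translation = translate(text, lang, target_lang)
        article.keywords = extract(article.translation)          # 3. extraction
        results.append(article)
    return results


# Toy stand-ins (hypothetical): a real system would use HTTP crawling,
# HTML parsing, an NMT model, and proper extraction methods.
PAGES: Dict[str, Tuple[str, str]] = {
    "https://example.org/de/1": ("de", "Regierung unterzeichnet Abkommen mit der Regierung"),
    "https://example.org/en/2": ("en", "Ports expand trade and trade routes"),
}
DE_EN = {"regierung": "government", "unterzeichnet": "signs",
         "abkommen": "agreement", "mit": "with", "der": "the"}
STOPWORDS = {"the", "and", "with", "a", "of"}


def fetch_stub(url: str) -> Tuple[str, str]:
    # Offline stand-in for a crawler: returns canned (language, text).
    return PAGES[url]


def translate_stub(text: str, src: str, tgt: str) -> str:
    # Word-by-word dictionary lookup standing in for an NMT model (de -> en only).
    if (src, tgt) != ("de", "en"):
        raise ValueError(f"no stub dictionary for {src}->{tgt}")
    return " ".join(DE_EN.get(w.lower(), w) for w in text.split())


def extract_keywords(text: str, k: int = 3) -> List[str]:
    # Most frequent non-stopword tokens; ties keep first-seen order.
    words = (w.strip(".,!?").lower() for w in text.split())
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]


articles = run_pipeline(list(PAGES), fetch_stub, translate_stub, extract_keywords)
for a in articles:
    print(a.source_lang, "|", a.translation, "|", a.keywords)
```

Running extraction on the translation rather than on the source text lets a single extractor serve every source language, which matches the abstract's order of translating first and then extracting. The trade-off is that translation errors propagate into the extracted fields.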
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.2
Article ID: 1759593
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1759593.html