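Research on Information Extraction and Computer-Aided Translation Methods for SWIFT Message Generation
Published: 2018-07-11 12:07
Topic: SWIFT + messages; Source: Harbin Institute of Technology, master's thesis, 2016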
【Abstract】:SWIFT (Society for Worldwide Interbank Financial Telecommunication) operates the information exchange network used by most of the world's financial institutions and supports fast, accurate, and reliable financial transactions worldwide. Shenzhen Securities Information Co., Ltd. joined SWIFT in 2008 and provides Corporate Action (CA) message services to SWIFT and its members. Generating a SWIFT message involves three steps: extracting corporate action data from the announcement texts of listed companies, translating those data from Chinese into English, and filling a SWIFT template to produce the message file. At present, both extraction and translation are done manually, which is slow and makes data consistency hard to guarantee. To address these problems, this thesis studies information extraction and computer-aided translation methods for automatic SWIFT message generation. Its main contributions are as follows.
First, based on an in-depth analysis of the characteristics of shareholders' general meeting notices, the thesis designs a method based on a random forest classifier for extracting the basic information of shareholders' general meetings. The method first removes irrelevant paragraphs, then uses text classification to locate the paragraphs that contain the meeting's basic information, and finally applies regular-expression matching to those paragraphs to obtain the basic-information attributes and their values. On a dataset built from 1,000 shareholders' general meeting notices published in 2014-2015, the method extracts these attributes and attribute values with an F-value of 0.92.
Second, because the basic-information texts of shareholders' general meetings are standardized in structure and wording, the thesis designs a computer-aided translation method based on named entity recognition and text similarity. The method first uses a conditional random field (CRF) model to recognize organization names, person names, and numbers in the text and abstracts the recognized entities. It then applies a text similarity model to retrieve, from a parallel corpus, the translation example most similar to the input motion text, and finally replaces the person names and numbers in the example's English side to produce the translation. With a parallel corpus of 660,251 shareholders' meeting motion translation pairs from 2010-2015, the method was tested on 10,000 SWIFT motion texts and scored 0.83 under the BLEU* evaluation method and 0.69 on exact match.
Building on these techniques, the thesis constructs an information extraction and computer-aided translation system for SWIFT message generation. The system is already in production use and automatically generates and visualizes SWIFT messages for shareholders' general meetings. It speeds up SWIFT message production, reduces manual effort and cost, and improves the consistency of the generated messages.
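The three-step extraction procedure (discard irrelevant paragraphs, classify the rest, then run regular expressions only on the paragraphs classified as meeting information) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the attribute names, regular expressions, relevance heuristic, and sample notice are all hypothetical, and `keyword_classifier` is a simple keyword rule standing in for the trained random forest the thesis uses. Any callable that returns True for meeting-information paragraphs (for example, a wrapper around scikit-learn's `RandomForestClassifier`) could be passed in its place. The `f_value` helper assumes exact string matching of (attribute, value) pairs, since the thesis's scoring protocol is not given here.

```python
import re
from typing import Callable, Dict, List

# Hypothetical attribute patterns for a Chinese shareholders' meeting notice.
# The thesis does not publish its attribute list or regular expressions.
DATE = r"\d{4}年\d{1,2}月\d{1,2}日"
PATTERNS = {
    "meeting_date": re.compile(rf"(?:现场)?会议(?:召开)?时间[:：]\s*({DATE})"),
    "record_date": re.compile(rf"股权登记日[:：为]?\s*({DATE})"),
    "venue": re.compile(r"(?:现场)?会议地点[:：]\s*([^。;；\n]+)"),
}


def is_relevant(paragraph: str) -> bool:
    """Step 1 (heuristic assumption): drop very short or closing paragraphs."""
    text = paragraph.strip()
    return len(text) >= 8 and "特此公告" not in text


def keyword_classifier(paragraph: str) -> bool:
    """Step 2 stand-in: the thesis uses a trained random forest here."""
    cues = ("会议时间", "会议地点", "股权登记日")
    return any(cue in paragraph for cue in cues)


def extract_meeting_info(
    paragraphs: List[str],
    classify: Callable[[str], bool] = keyword_classifier,
) -> Dict[str, str]:
    info: Dict[str, str] = {}
    for paragraph in filter(is_relevant, paragraphs):
        if not classify(paragraph):
            continue  # not a basic-information paragraph
        # Step 3: regular-expression matching on selected paragraphs only;
        # the first match found for each attribute is kept.
        for attr, pattern in PATTERNS.items():
            match = pattern.search(paragraph)
            if match and attr not in info:
                info[attr] = match.group(1).strip()
    return info


def f_value(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Micro-averaged F1 over exactly matching (attribute, value) pairs."""
    pred, ref = set(predicted.items()), set(gold.items())
    correct = len(pred & ref)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(pred), correct / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up sample notice, one string per paragraph.
notice = [
    "证券代码：000001    证券简称：示例股份    公告编号：2015-001",
    "一、召开会议的基本情况",
    "现场会议时间：2015年5月20日 14:30；会议地点：深圳市福田区示例路1号会议室。",
    "股权登记日：2015年5月13日",
    "特此公告。",
]
print(extract_meeting_info(notice))
```

Keeping the classifier as a parameter mirrors the point of classifying before matching: the regular expressions only ever see paragraphs the model has selected, so a pattern such as "会议地点" cannot fire on an unrelated paragraph elsewhere in the announcement.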
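The retrieval-and-substitution translation step (abstract the entities, find the most similar example in the parallel corpus, then substitute names and numbers into its English side) can be sketched as follows. This is a hedged, minimal illustration rather than the thesis's method: a hypothetical name lexicon plus a digit regex stand in for the CRF recognizer, `difflib.SequenceMatcher` from the Python standard library stands in for the unspecified text similarity model, and the three-entry corpus is invented for the example. Only the exact-match metric is shown, because the definition of BLEU* is not given here.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical name lexicon (Chinese -> English). The thesis recognizes person
# and organization names with a CRF model; a lookup table stands in for it.
NAME_EN: Dict[str, str] = {"张三": "Zhang San", "李四": "Li Si"}
# Known names first, then integers or decimals.
ENTITY = re.compile("|".join(map(re.escape, NAME_EN)) + r"|\d+(?:\.\d+)?")

# Hypothetical parallel corpus whose entries are already entity-abstracted.
CORPUS: List[Tuple[str, str]] = [
    ("选举<PER0>为公司董事", "Election of <PER0> as Director of the Company"),
    ("关于<NUM0>年度利润分配的议案", "Proposal on the Profit Distribution Plan for <NUM0>"),
    ("关于续聘会计师事务所的议案", "Proposal on the Reappointment of the Accounting Firm"),
]


def abstract(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each entity with a typed slot such as <PER0> or <NUM0>.

    Returns the abstracted text and a map from each slot to its English filler.
    """
    slots: Dict[str, str] = {}
    counts = {"PER": 0, "NUM": 0}

    def to_slot(match: "re.Match") -> str:
        token = match.group(0)
        kind = "PER" if token in NAME_EN else "NUM"
        slot = f"<{kind}{counts[kind]}>"
        counts[kind] += 1
        slots[slot] = NAME_EN.get(token, token)  # numbers are copied verbatim
        return slot

    return ENTITY.sub(to_slot, text), slots


def translate(text: str) -> Tuple[str, float]:
    """Retrieve the most similar abstracted example and fill its English slots.

    Returns the draft translation and its similarity score, so that a human
    translator can prioritize low-scoring drafts for review.
    """
    template, slots = abstract(text)
    scored = [
        (SequenceMatcher(None, template, source).ratio(), target)
        for source, target in CORPUS
    ]
    score, draft = max(scored, key=lambda item: item[0])
    for slot, filler in slots.items():
        draft = draft.replace(slot, filler)
    return draft, score


def exact_match_rate(hypotheses: List[str], references: List[str]) -> float:
    """Share of outputs identical to their reference (assumes non-empty input)."""
    hits = sum(h == r for h, r in zip(hypotheses, references))
    return hits / len(references)


print(translate("选举李四为公司董事"))
print(translate("关于2015年度利润分配的议案"))
```

Abstracting entities before comparison is what lets one stored example cover every director election regardless of the candidate's name. If the retrieved example has a slot that the input does not supply (for instance, because a name such as 王五 is missing from the lexicon and was not recognized), a placeholder like `<PER0>` survives in the draft, which is a cheap signal that a human translator should step in.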
【Degree-granting institution】:Harbin Institute of Technology
【Degree level】:Master's
【Year degree awarded】:2016
【Classification number】:TP391.1
Article number: 2115145
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2115145.html