Research on Entity Linking of Terms in Scientific and Technological Reports Based on Multiple Knowledge Bases
[Abstract]: Scientific and technological (S&T) reports are an important documentary resource, and mining and analyzing them is of considerable value. Current research on S&T reports, however, still concentrates on their basic concepts, the definition of their attributes, and the construction of an S&T report system. S&T reports contain a large number of technical terminology entities. These terms are the core subjects of the reports and reflect the current state and future direction of science and technology in China. Mining report content and identifying its terminology entities is therefore an important task. Entity recognition is a key natural language processing technology that automatically identifies entities such as person names, place names, and organization names in text. This thesis takes S&T reports as its research object. It first applies new term discovery to find potential new terms in the reports. It then builds a specialized terminology knowledge base to serve as corpus support for term entity recognition and linking. Finally, it uses the Stanford NER entity recognition framework to recognize terminology entities in the reports automatically, and it disambiguates them by linking against multiple knowledge bases.
The main research work is as follows:
(1) To address problems in Chinese word segmentation and the characteristics of terms in S&T reports, a new word discovery method based on part-of-speech (POS) combinations is proposed. POS combination rules for professional terms are defined and used to extract candidate strings that match them. New words are then confirmed according to string support (frequency) and internal and external word features such as length and mutual information. The method effectively discovers new professional terms, improves the accuracy of Chinese word segmentation to some extent, and lays a foundation for term entity recognition.
(2) A specialized terminology knowledge base is constructed. Entity recognition requires a large training corpus from which entity features are learned for automatic recognition. Because no public terminology data for S&T reports exists, this thesis uses the technical terminology published by the China Standard Terminology Network as its data source. It then designs and builds the term knowledge base with web crawling, databases, and related information technologies.
(3) The main entity recognition methods are reviewed in detail. The mature open-source Stanford NER framework, which is based on the conditional random field (CRF) model, is selected to train a term entity model. The trained model recognizes term entities in technical reports automatically. Link disambiguation of the recognized term entities is then performed by combining multiple knowledge bases with semantic similarity calculation.
(4) S&T reports published by the National Science and Technology Report Service System are used as experimental data. On this basis, the thesis designs and develops a prototype system for multi-knowledge-base entity linking of S&T report terms. The system integrates S&T report data preprocessing, new word discovery, entity recognition, and entity linking. It recognizes and disambiguates S&T report term entities automatically, which verifies the correctness and validity of the proposed method.
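The candidate filtering in work (1) can be illustrated with a minimal Python sketch. It assumes the text has already been segmented and POS-tagged, and it only joins pairs of adjacent words. The ICTCLAS-style tags (`n`, `vn`, `a`, `v`), the rule set `POS_RULES`, and the thresholds are hypothetical placeholders, not the thesis's actual rules. Support is taken as the pair's frequency. Mutual information is computed as pointwise mutual information, PMI(w1, w2) = log2(P(w1 w2) / (P(w1) P(w2))), with every probability estimated as a count divided by the total number of tokens.

```python
import math
from collections import Counter

# Hypothetical POS-combination rules for two-word terms (ICTCLAS-style tags):
# n = noun, vn = verbal noun, a = adjective, v = verb.
POS_RULES = {("n", "n"), ("a", "n"), ("vn", "n"), ("n", "vn"), ("v", "n")}


def discover_new_terms(sentences, min_support=2, min_len=2, max_len=8, min_pmi=1.0):
    """Return {candidate_term: pmi} for adjacent word pairs that pass all filters.

    sentences: list of sentences, each a list of (word, pos) pairs.
    """
    unigram = Counter()
    pair_freq = Counter()
    total = 0
    for sent in sentences:
        for word, _ in sent:
            unigram[word] += 1
            total += 1
        # Keep only adjacent pairs whose POS pattern matches a rule.
        for (w1, p1), (w2, p2) in zip(sent, sent[1:]):
            if (p1, p2) in POS_RULES:
                pair_freq[(w1, w2)] += 1

    found = {}
    for (w1, w2), freq in pair_freq.items():
        candidate = w1 + w2
        # Support (frequency) and character-length filters.
        if freq < min_support or not (min_len <= len(candidate) <= max_len):
            continue
        # PMI = log2(P(w1 w2) / (P(w1) * P(w2))), all probabilities as count / total.
        pmi = math.log2(freq * total / (unigram[w1] * unigram[w2]))
        if pmi >= min_pmi:
            found[candidate] = round(pmi, 3)
    return found


sentences = [
    [("神经", "n"), ("网络", "n"), ("模型", "n")],
    [("神经", "n"), ("网络", "n"), ("训练", "v")],
    [("数据", "n"), ("模型", "n")],
]
print(discover_new_terms(sentences))  # {'神经网络': 2.0}
```

In this toy corpus only 神经网络 survives. The pairs 网络模型 and 数据模型 match the n+n rule but occur only once, which is below `min_support`. The pair 网络训练 (n+v) matches no rule.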
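The crawler used in work (2) is site-specific and is not shown here. The sketch below only illustrates one possible storage layout for crawled term entries, using Python's built-in `sqlite3` module. The table name `term`, its columns, and the record keys are hypothetical. The `UNIQUE (term, field)` constraint lets one surface form hold one sense per subject field, and such multiple senses are what later make disambiguation necessary.

```python
import sqlite3


def build_term_kb(records):
    """Create an in-memory term knowledge base and load crawled records.

    records: iterable of dicts with hypothetical keys
    'term', 'english', 'definition', 'field'.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE term (
               id INTEGER PRIMARY KEY,
               term TEXT NOT NULL,
               english TEXT,
               definition TEXT,
               field TEXT,
               UNIQUE (term, field)
           )"""
    )
    # Duplicate (term, field) pairs from repeated crawls are silently skipped.
    conn.executemany(
        "INSERT OR IGNORE INTO term (term, english, definition, field) "
        "VALUES (:term, :english, :definition, :field)",
        records,
    )
    conn.commit()
    return conn


def lookup(conn, term):
    """Return every (field, definition) sense stored for a surface form."""
    cur = conn.execute(
        "SELECT field, definition FROM term WHERE term = ? ORDER BY field", (term,)
    )
    return cur.fetchall()


records = [
    {"term": "带宽", "english": "bandwidth", "definition": "信道可传输的频率范围", "field": "通信"},
    {"term": "带宽", "english": "bandwidth", "definition": "单位时间内传输的数据量", "field": "计算机"},
    {"term": "带宽", "english": "bandwidth", "definition": "重复抓取的条目", "field": "通信"},
]
conn = build_term_kb(records)
print(lookup(conn, "带宽"))  # [('计算机', '单位时间内传输的数据量'), ('通信', '信道可传输的频率范围')]
```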
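Training the CRF model in work (3) requires labeled tokens. One common way to bootstrap them, assumed here rather than taken from the thesis, is to label segmented sentences by greedy longest match against the term knowledge base. The labeled tokens are then written in the tab-separated token/label layout that Stanford NER's CRFClassifier reads when its properties file sets `map = word=0,answer=1`. The `TERM`/`O` label names and both helper functions are hypothetical.

```python
def label_tokens(tokens, term_dict, max_words=4):
    """Label each token TERM or O by greedy longest match against term_dict.

    tokens: list of segmented words; term_dict: set of term strings.
    A term may span several tokens (e.g. "神经" + "网络").
    """
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest token window first, then shrink it.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            if "".join(tokens[i:i + n]) in term_dict:
                matched = n
                break
        if matched:
            for k in range(i, i + matched):
                labels[k] = "TERM"
            i += matched
        else:
            i += 1
    return labels


def to_stanford_tsv(sentences, term_dict):
    """Render labeled sentences as token<TAB>label lines, blank line after each."""
    lines = []
    for tokens in sentences:
        for tok, lab in zip(tokens, label_tokens(tokens, term_dict)):
            lines.append(f"{tok}\t{lab}")
        lines.append("")
    return "\n".join(lines)


print(label_tokens(["采用", "神经", "网络", "模型"], {"神经网络"}))  # ['O', 'TERM', 'TERM', 'O']
```

Because this simple IO scheme gives every term token the same label, two adjacent terms merge into one span. A BIO scheme (`B-TERM`/`I-TERM`) would keep them apart at the cost of more label classes.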
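The link disambiguation described in work (3) can be sketched as follows. Candidate entries for a mention are gathered from several knowledge bases, and the candidate whose description is most similar to the mention's context is selected. The sketch uses bag-of-words cosine similarity as a stand-in for whatever semantic similarity measure the thesis actually uses. The nested-dictionary layout of `knowledge_bases` and the knowledge-base names in the example are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_entity(mention, context_words, knowledge_bases):
    """Pick the KB entry whose description best matches the mention context.

    knowledge_bases: dict kb_name -> dict surface_form -> list of
    (entry_id, description_words). Returns (kb_name, entry_id, score) or None.
    """
    ctx = Counter(context_words)
    best = None
    for kb_name, kb in knowledge_bases.items():
        for entry_id, desc_words in kb.get(mention, []):
            score = cosine(ctx, Counter(desc_words))
            if best is None or score > best[2]:
                best = (kb_name, entry_id, score)
    return best


kbs = {
    "term_kb": {"带宽": [("comm-1", ["信道", "频率", "范围"]),
                         ("cs-1", ["数据", "传输", "速率"])]},
    "encyclopedia": {"带宽": [("wiki-1", ["网络", "数据", "传输", "速率", "单位"])]},
}
kb_name, entry_id, score = link_entity("带宽", ["网络", "数据", "传输"], kbs)
print(kb_name, entry_id, round(score, 3))  # encyclopedia wiki-1 0.775
```

Searching all knowledge bases in one loop lets an entry from a general source outrank a narrower terminology entry when the context fits it better. It also returns `None` when no knowledge base contains the mention.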
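[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year of degree conferral]: 2017
[Classification number]: G353.1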
[Author]: Chen Guiqiang (陈桂强)