Text Mining Research on Science and Technology Roadmaps: An Information Extraction Method

Published: 2018-06-07 23:39

Topics: science and technology roadmap + text mining; Reference: 《情报理论与实践》 (Information Studies: Theory & Application), 2017, Issue 05

[Abstract]: [Purpose/significance] To strengthen intelligence research on science and technology roadmaps, this paper explores a method for automatically extracting core information from science and technology roadmap reports. [Method/process] By analyzing the content organization and expression characteristics of 166 science and technology roadmaps issued by 21 countries or organizations, the paper summarizes the core information that such roadmaps contain and proposes a new information extraction approach, "extraction-synchronization-classification", to extract their core content. [Result/conclusion] The method was validated on 45 science and technology roadmap reports as test cases, yielding 26,736 valid data records. Visualized as a time series, these records basically reflect the main content of the roadmaps. This indicates that the method design is feasible, that it can quickly obtain the core information in science and technology roadmaps, and that it improves the efficiency of intelligence gathering on them. [Limitations] Manual intervention is still required in steps such as text cleaning and keyword screening, and the technical methods chosen are rather scattered and need further integration and refinement.
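[Author affiliations]: National Science Library, Chinese Academy of Sciences; School of Health Management and Education, Capital Medical University; Capital Medical University Library
[Funding]: An outcome of the Chinese Academy of Sciences Planning and Decision-Making Science and Technology Support System construction project "Science and Technology Decision Knowledge Service Platform" (project no. 院1405) and the National Natural Science Foundation of China project "Analysis Methods and Applied Research on Scientific Structure Characteristics and Their Evolutionary Dynamics" (project no. 71173211)
[Classification number]: TP391.1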

[Similar literature]