Automatic Information Extraction and Similarity Retrieval Mechanism for Chinese Text

Published: 2018-09-11 09:10

[Abstract]: Information extraction has become an important means of providing high-quality information services. This paper proposes a mechanism for automatic extraction and similarity retrieval of Chinese text information. The basic idea is to represent user interests as semantic templates, expand keywords with related concepts, and obtain an initial set of candidate texts through a search engine. Building on a concept-triggering mechanism and partial parsing, a mapping from semantic relations to template slots is then used to fill the text semantic templates, forming a structured text database. To account for the fuzziness of textual data, a similarity relation between user queries and text semantic templates is defined and similarity retrieval is implemented, so that users' information needs can be met more fully.
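The similarity-retrieval idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method. Everything here is an assumption made for illustration: the concept groups, the slot names ("event", "place"), the Jaccard overlap used as the slot similarity, the unweighted average across slots, and the function names expand, slot_similarity, template_similarity and similar_search. The abstract does not specify the actual similarity relation.

```python
# Hypothetical concept groups standing in for the keyword concept expansion
# described in the abstract; a real system would use a proper lexicon.
CONCEPT_GROUPS = [
    {"earthquake", "quake", "tremor"},
    {"flood", "inundation"},
]


def expand(term: str) -> set[str]:
    """Return the term together with every concept that shares a group with it."""
    expanded = {term}
    for group in CONCEPT_GROUPS:
        if term in group:
            expanded |= group
    return expanded


def slot_similarity(query_value: str, text_value: str) -> float:
    """Jaccard overlap of the two expanded concept sets (an assumed measure)."""
    q, t = expand(query_value), expand(text_value)
    return len(q & t) / len(q | t)


def template_similarity(query: dict, template: dict) -> float:
    """Average slot similarity over the slots named in the query.

    A slot missing from the filled template contributes 0.
    """
    if not query:
        return 0.0
    scores = [
        slot_similarity(value, template[slot]) if slot in template else 0.0
        for slot, value in query.items()
    ]
    return sum(scores) / len(scores)


def similar_search(query: dict, database: list, threshold: float = 0.5) -> list:
    """Return (score, template) pairs at or above the threshold, best first."""
    scored = [(template_similarity(query, t), t) for t in database]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


# Filled templates, as would be produced by the extraction stage.
database = [
    {"event": "quake", "place": "Dalian"},
    {"event": "flood", "place": "Dalian"},
    {"event": "tremor", "place": "Beijing"},
]
query = {"event": "earthquake", "place": "Dalian"}
results = similar_search(query, database, threshold=0.6)
```

Because matching is graded rather than exact, a query for "earthquake" still retrieves a template whose event slot holds "quake", which is the kind of broader coverage the abstract attributes to similarity retrieval.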
[Author Affiliation]: Department of Computer Science, Dalian University of Technology
[Funding]: Supported by the National Natural Science Foundation of China (60373095, 60673039).
[CLC Number]: TP391.1
