Research on Chinese Time Keyword Recognition
Published: 2018-04-30 20:39
Topics: time keywords + time keyword recognition; Source: 《计算机应用研究》 (Application Research of Computers), 2017, Issue 04
[Abstract]: Temporal information is widely used in natural language processing, information retrieval, and related fields. Time keyword recognition is key to these applications because it directly affects how temporal information can be used. Time expressions in a text or sentence vary in form, can appear at arbitrary positions, and have uncertain boundaries, and these properties make time keyword recognition difficult. To address the recognition of Chinese time keywords, the paper first proposes a phrase segmentation method that analyzes sentence structure in combination with syntactic parse trees, converting the text into a set of phrases and thereby fixing the phrase boundaries. On this basis it proposes a vectorized representation of phrases, which is used to construct a vector space. Finally, it introduces spectral clustering, recasting the recognition problem as a clustering problem. Experiments show that the method performs well on Chinese time keyword recognition.
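The abstract's final two stages (phrase vectors, then spectral clustering) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the actual features, affinity function, or cluster count, so the character sets, the two-dimensional `phrase_vector` features, the Gaussian affinity with `sigma=0.5`, and the two-way split are all assumptions made here for illustration. The phrase list is also supplied by hand; the parse-tree-based phrase segmentation step is not reproduced.

```python
import math

# Hypothetical character sets for this sketch; the paper's real features are not stated.
NUMERALS = set("0123456789零一二三四五六七八九十两")
TIME_MARKERS = set("年月日时点分秒天周午晚昨今明钟")


def phrase_vector(phrase):
    """Map a phrase to [time-like character ratio, time-marker ratio] (assumed features)."""
    n = len(phrase)
    markers = sum(ch in TIME_MARKERS for ch in phrase)
    numerals = sum(ch in NUMERALS for ch in phrase)
    return [(markers + numerals) / n, markers / n]


def affinity_matrix(vectors, sigma=0.5):
    """Gaussian (RBF) affinity between every pair of phrase vectors."""
    n = len(vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def spectral_bipartition(w, iterations=200):
    """Split items into two clusters by the sign of the second eigenvector
    of the normalised affinity D^{-1/2} W D^{-1/2}, found by power iteration."""
    n = len(w)
    degree = [sum(row) for row in w]
    m = [[w[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector (eigenvalue 1) is proportional to sqrt(degree).
    total = math.sqrt(sum(degree))
    top = [math.sqrt(d) / total for d in degree]
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iterations):
        # Remove the leading-eigenvector component, then multiply and renormalise.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # Label 0 = same side as the first item; the eigenvector's sign is arbitrary.
    return [0 if (x > 0) == (v[0] > 0) else 1 for x in v]


# Hand-picked phrases standing in for the output of phrase segmentation.
phrases = ["2017年4月", "昨天下午", "三点钟", "广东工业大学", "提出方法", "一种方法"]
vectors = [phrase_vector(p) for p in phrases]
labels = spectral_bipartition(affinity_matrix(vectors))
for phrase, label in zip(phrases, labels):
    print(label, phrase)
```

Clustering only groups similar phrases; deciding which group holds the time keywords needs a further rule or a few labeled examples, which the abstract does not describe. For more than two clusters, the usual approach is to keep several eigenvectors and run k-means on them.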
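[Affiliation]: School of Computers, Guangdong University of Technology
[Funding]: Natural Science Foundation of Guangdong Province (S2011040004281, S2013010014457)
[Classification number]: TP391.1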
Article ID: 1826134
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1826134.html