Research on a Topic Phrase Mining Method Based on the PhraseLDA Model
Posted: 2019-03-17 20:46
[Abstract]: [Purpose/Significance] Focusing on topic phrase identification, this paper proposes a topic phrase mining method based on the PhraseLDA model, offering an approach for quickly understanding text content and accurately extracting text topics. [Method/Process] The method gives a quantitative definition of low-frequency words, proposes a method for computing phrase importance, and finally uses the PhraseLDA topic model to infer topic phrases. [Results/Conclusions] Experiments show that, across multiple datasets, the topic phrases mined by this method are of high quality and exhibit strong topic coherence.
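The abstract names the three stages but does not give the paper's definition of low-frequency words or its phrase-importance formula, so those stages are not reproduced here. What can be illustrated is the core constraint of PhraseLDA itself, as introduced in the ToPMine framework: all words inside one phrase share a single topic. The following is a minimal collapsed Gibbs sampling sketch of that idea, assuming documents have already been segmented into phrases (for example by an importance-based filtering step like the one the paper describes). The function names `phrase_lda` and `topic_phrases`, the hyperparameter defaults and the toy corpus are hypothetical and are not the authors' implementation.

```python
import random
from collections import defaultdict


def phrase_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical PhraseLDA-style collapsed Gibbs sampler (illustrative sketch).

    docs: list of documents; each document is a list of phrases and each
    phrase is a list of word strings (phrase segmentation is assumed done).
    Returns (z, n_kw): z[d][g] is the topic of phrase g in document d and
    n_kw[k][w] counts how often word w is assigned to topic k.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for phrase in doc for w in phrase})
    n_dk = [[0] * num_topics for _ in docs]  # words of doc d in topic k
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # word w in topic k
    n_k = [0] * num_topics  # all words in topic k
    z = []  # one topic per phrase, shared by all of its words

    def update(d, phrase, k, delta):
        # Add (delta=+1) or remove (delta=-1) every word of a phrase from topic k.
        for w in phrase:
            n_dk[d][k] += delta
            n_kw[k][w] += delta
            n_k[k] += delta

    # Random initialisation: draw one topic per phrase.
    for d, doc in enumerate(docs):
        z.append([])
        for phrase in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            update(d, phrase, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for g, phrase in enumerate(doc):
                update(d, phrase, z[d][g], -1)  # exclude the current phrase
                weights = []
                for k in range(num_topics):
                    p = 1.0
                    seen = defaultdict(int)  # repeats of a word inside the phrase
                    # Add the phrase's words to topic k one at a time.
                    for j, w in enumerate(phrase):
                        p *= alpha + n_dk[d][k] + j
                        p *= (beta + n_kw[k][w] + seen[w]) / (V * beta + n_k[k] + j)
                        seen[w] += 1
                    weights.append(p)
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][g] = k
                update(d, phrase, k, +1)
    return z, n_kw


def topic_phrases(docs, z, num_topics, top_n=3):
    """Rank phrases within each topic by how many occurrences were assigned to it."""
    counts = [defaultdict(int) for _ in range(num_topics)]
    for doc, topics in zip(docs, z):
        for phrase, k in zip(doc, topics):
            counts[k][" ".join(phrase)] += 1
    return [sorted(c, key=c.get, reverse=True)[:top_n] for c in counts]


if __name__ == "__main__":
    toy_docs = [  # hypothetical, pre-segmented toy corpus
        [["topic", "model"], ["phrase", "mining"], ["text"]],
        [["gene", "expression"], ["protein"], ["gene", "expression"]],
    ]
    z, n_kw = phrase_lda(toy_docs, num_topics=2, iters=50)
    print(topic_phrases(toy_docs, z, num_topics=2))
```

Because a phrase's topic is sampled from the product of its per-word factors, with the words added to topic k one at a time, the whole phrase moves between topics as a unit. Ranking phrases by how often they receive each topic then gives a simple list of topic phrases; the paper's own ranking, which relies on its phrase-importance measure, may differ.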
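[Affiliations]: National Science Library, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Wuhan Documentation and Information Center, Chinese Academy of Sciences
[Funding]: One of the research outputs of the Chinese Academy of Sciences project "Construction of the Academy-Wide Science and Technology Information Monitoring Center" (Project No. 院1628-4)
[Classification number]: TP391.1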
Article ID: 2442686
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2442686.html