A Method for Constructing Topic Text Networks Based on the PL-LDA Model
Published: 2018-06-18 02:08
Topics: topic model + text mining; Reference: Complex Systems and Complexity Science (《复杂系统与复杂性科学》), 2017, Issue 01
[Abstract]: Labeled LDA can mine the probability distribution of words under a given topic, but it cannot analyze the associations among the topic words. PMI can measure the relationship between two words, but it loses the connection to the given topic. Inspired by the way PMI counts word-pair co-occurrence frequencies within a window, the authors propose PL-LDA (Pointwise Labeled LDA), a topic model that computes the joint probability distribution of word pairs under a given topic. Experiments on an aviation safety report data set show that the results produced by PL-LDA are highly interpretable. PL-LDA is then used to construct a topic text network, which reflects not only the distribution of topic words but also the complex associations among them.
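For readers unfamiliar with the window-based statistic that the abstract says inspired PL-LDA, the following is a minimal sketch of plain PMI estimated from sliding-window co-occurrence counts. It is not the PL-LDA model itself: the function name `window_pmi`, the `window` parameter, and the toy token list are assumptions made for illustration, and the estimate carries no topic information, which is exactly the limitation the abstract points out.

```python
import math
from collections import Counter


def window_pmi(tokens, window=3):
    """Estimate PMI for unordered word pairs from sliding windows.

    Illustrative sketch only: probabilities are the fraction of windows
    that contain a word (or both words of a pair).
    """
    # Every contiguous run of `window` tokens is one observation.
    windows = [tokens[i:i + window] for i in range(len(tokens) - window + 1)]
    n = len(windows)

    word_counts = Counter()  # number of windows containing each word
    pair_counts = Counter()  # number of windows containing both words

    for w in windows:
        uniq = sorted(set(w))  # count each word at most once per window
        word_counts.update(uniq)
        for i, a in enumerate(uniq):
            for b in uniq[i + 1:]:
                pair_counts[(a, b)] += 1

    pmi = {}
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) )
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi


if __name__ == "__main__":
    # Hypothetical tokens, not taken from the aviation safety data set.
    toy = ["engine", "failure", "runway", "engine", "failure"]
    for pair, score in sorted(window_pmi(toy, window=2).items()):
        print(pair, round(score, 3))
```

Pair keys are stored in alphabetical order, so `("engine", "failure")` and `("failure", "engine")` map to the same entry. Scores like these could serve as edge weights in a word network, but tying them to a given topic requires a model such as the one the paper proposes.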
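[Affiliations]: College of Computer Science and Technology, Civil Aviation University of China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
[Funding]: National Natural Science Foundation of China (61201414, 61301245, U1233113)
[Classification number]: TP391.1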
Article ID: 2033531
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2033531.html