Extracting Historical-Evolution Information by Combining PAM with Topic-Preference TextRank

Published: 2018-05-20 12:20

Topics: information extraction + historical evolution; Source: Application Research of Computers (《计算机应用研究》), 2017, Issue 01

[Abstract]: To address the problem of extracting historical-evolution topic information from naturally annotated big data, this paper proposes a method that combines the PAM topic model with a topic-preference TextRank algorithm. The method uses the PAM topic model to obtain the distribution of the historical-evolution topic over other related topics and the distribution of each topic over words. Based on these topic and word distributions, the topic-preference TextRank algorithm favors, during its random walk, nodes that are highly relevant to the historical-evolution topic, which makes it better suited to extracting historical-evolution information. The historical-evolution topic has complex features and is strongly correlated with other topics, and it is often unclear whether a given term expresses historical-evolution information at all. Because PAM yields both word-level and topic-level distributions, it is well suited to this kind of problem. Using the topic information obtained, the topic-preference TextRank algorithm biases its random walk toward nodes related to the historical-evolution topic, so that the extraction results tend toward that topic and extraction accuracy improves. Experimental results show that the method is more effective at extracting historical-evolution topic information.
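The abstract does not give the exact update rule, so the following is only a minimal sketch of one common way to bias a TextRank random walk toward a target topic: the uniform teleport term is replaced by a preference vector built from per-word topic relevance. The function name `topic_biased_textrank`, the `topic_word_prob` input (standing in for word probabilities under the historical-evolution topic, such as a topic model might supply), and the window and damping values are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def topic_biased_textrank(sentences, topic_word_prob, window=2,
                          damping=0.85, iters=50):
    """Rank words with a random walk whose teleport step prefers topic words.

    sentences: list of token lists.
    topic_word_prob: hypothetical dict word -> relevance to the target topic.
    """
    # Undirected co-occurrence graph: link words at most `window` apart.
    weights = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                if v != w:
                    weights[w][v] += 1.0
                    weights[v][w] += 1.0

    nodes = sorted(weights)
    if not nodes:
        return {}

    # Preference vector: normalized topic relevance, with a small floor so
    # that words unknown to the topic model can still be reached.
    eps = 1e-6
    raw = {w: topic_word_prob.get(w, 0.0) + eps for w in nodes}
    total = sum(raw.values())
    pref = {w: raw[w] / total for w in nodes}

    out_sum = {w: sum(weights[w].values()) for w in nodes}
    score = {w: 1.0 / len(nodes) for w in nodes}

    for _ in range(iters):
        new_score = {}
        for w in nodes:
            # Mass arriving from neighbors, split by edge weight.
            incoming = sum(weights[v][w] / out_sum[v] * score[v]
                           for v in weights[w])
            # Teleport goes to topic-relevant words instead of uniformly.
            new_score[w] = (1 - damping) * pref[w] + damping * incoming
        score = new_score
    return score


if __name__ == "__main__":
    docs = [["school", "founded", "1952"],
            ["school", "renamed", "1978"],
            ["school", "campus", "library"]]
    relevance = {"founded": 0.4, "renamed": 0.4}  # made-up values
    ranked = topic_biased_textrank(docs, relevance)
    for word, s in sorted(ranked.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{s:.4f}")
```

In this toy input, "founded" and "campus" occupy structurally identical positions in the graph, so plain TextRank would score them equally; with the biased teleport, the word that the topic distribution marks as relevant ranks higher.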
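[Author affiliation]: College of Computer and Information Engineering, Inner Mongolia Normal University
[Funding]: National Natural Science Foundation of China (61562068); Natural Science Foundation of Inner Mongolia (2013MS0912); Inner Mongolia Normal University Graduate Research Innovation Fund (CXJJS14078)
[CLC number]: TP391.1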

[Similar literature]