Research on a Weibo Hot Topic Mining Model Based on Sentence Components

Published: 2018-01-07 15:08

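Keywords: Research on a Weibo Hot Topic Mining Model Based on Sentence Components　Source: Information Science (《情报科学》), 2015, No. 11　Paper type: journal article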

[Abstract]: Because the text similarity measures used in traditional cluster analysis are poorly suited to short texts, this paper adopts a sentence-component-based method to compute the similarity between Weibo posts. Each post is first split into sentences. Syntactic parsing then identifies the sentence components of the post, and the words that make up those components are selected as feature words. HowNet is used to compute the semantic similarity between words that fill the same component in two posts, and these values are weighted by component type and summed to give the similarity between the two posts. From these similarities a text similarity matrix is built, and cluster analysis is performed on it to find hot Weibo topics. Finally, experiments demonstrate the feasibility of the proposed method.
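The pipeline described in the abstract (component-wise word similarity, a weighted sum per pair of posts, a similarity matrix, then clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the component names and weights in `WEIGHTS`, the exact-match `word_sim` that stands in for a HowNet-based semantic similarity, the best-match averaging in `component_sim`, and the threshold-based single-link grouping in `cluster`. Posts are assumed to be already parsed into sentence components.

```python
from typing import Callable, Dict, List

# Hypothetical component weights; the abstract does not state the actual values.
WEIGHTS: Dict[str, float] = {"subject": 0.3, "predicate": 0.4, "object": 0.3}

# A parsed post: sentence component name -> words that fill that component.
Post = Dict[str, List[str]]


def word_sim(a: str, b: str) -> float:
    """Stand-in for a HowNet-based semantic similarity (assumption): exact match."""
    return 1.0 if a == b else 0.0


def component_sim(ws1: List[str], ws2: List[str],
                  sim: Callable[[str, str], float] = word_sim) -> float:
    """Symmetric average of best-match word similarities between two word lists."""
    if not ws1 or not ws2:
        return 0.0
    forward = sum(max(sim(a, b) for b in ws2) for a in ws1) / len(ws1)
    backward = sum(max(sim(a, b) for a in ws1) for b in ws2) / len(ws2)
    return (forward + backward) / 2


def post_sim(p1: Post, p2: Post, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the similarities of matching sentence components."""
    return sum(w * component_sim(p1.get(c, []), p2.get(c, []))
               for c, w in weights.items())


def similarity_matrix(posts: List[Post]) -> List[List[float]]:
    """Pairwise post similarities, with 1.0 on the diagonal."""
    n = len(posts)
    return [[1.0 if i == j else post_sim(posts[i], posts[j]) for j in range(n)]
            for i in range(n)]


def cluster(matrix: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Single-link grouping: posts joined by a chain of similarities >= threshold."""
    n = len(matrix)
    seen = [False] * n
    clusters: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if not seen[j] and matrix[i][j] >= threshold:
                    seen[j] = True
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


# Toy posts, already split into components (a parser would produce these).
posts = [
    {"subject": ["earthquake"], "predicate": ["hits"], "object": ["Sichuan"]},
    {"subject": ["earthquake"], "predicate": ["hits"], "object": ["Yunnan"]},
    {"subject": ["film"], "predicate": ["opens"], "object": ["cinemas"]},
]
matrix = similarity_matrix(posts)
clusters = cluster(matrix)
hot_topic = max(clusters, key=len)  # the largest cluster as the "hot" topic
```

With these toy posts the two earthquake posts share a subject and a predicate, so they fall into one cluster, while the film post stays on its own. Taking the largest cluster as the hot topic is one simple reading of "finding hot topics"; the paper may rank clusters differently.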
[Author affiliations]: School of Information Management, Nanjing University; School of Information Management, Wuhan University
[Funding]: National Natural Science Foundation of China (71273194)
[Classification number]: TP391.1; TP393.092
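[Body snapshot]: 1 Introduction. With the advance of Internet technology, social media has spread rapidly, and Weibo has grown especially fast. The Society Blue Book (《社会蓝皮书》), released by the Chinese Academy of Social Sciences in 2011, points out [1] that the traditional structure of public opinion is being changed by online platforms such as Weibo, and that Weibo topics have become one of the most influential forms [2]. Research on hot topic mining from Weibo data is therefore of great significance. Because Weibo data have a text length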

[References]