Research on Algorithms for Mining Associated Information of Specific Entities on Microblogs
Published: 2019-04-28 17:42
[Abstract]: As an Internet social application that emerged with Web 2.0 technology, Weibo (microblogging) has gradually become an indispensable part of daily life. Its popularity has brought explosive growth in the volume of microblog data. How to exploit this huge body of data, how to extract the information that meets a given need, and how to mine the information associated with a specified entity have become key research directions in academia.
By analyzing the characteristics of microblogs, this thesis proposes an information mining system for specific entities on microblogs, the 微邮 (Weiyou) system, and studies three topics in increasing depth: information retrieval in the microblog environment, information mining for specific entities, and a recommendation system based on inter-entity association. The main innovations and contributions are as follows.
First, a query expansion method based on a resistor network model is proposed. It uses the resistor network model from circuit theory to model the network of relations between words in the text space, and uses effective resistance to characterize the degree of association between words. This simplifies computation over complex word-relation networks. Results on the TREC Microblog Track evaluation show that the method yields expansion terms consistent with the user's original query intent and improves the retrieval metrics.
Second, building on query expansion, an algorithm for mining associations among expansion terms is proposed, based on the word activation force model. Using the inter-word affinity defined in that model, it computes the association between expansion terms to obtain expansion term pairs, which are then used to reconstruct the query. The experimental data show that expansion term pairs effectively reduce the information drift caused by expansion terms and give good results in mining information about entities.
Finally, a personalized recommendation system based on the word activation force model is designed and implemented, in which recommendations are shaped jointly by user interests and contextual information. The system achieved excellent results in the TREC Contextual Suggestion Track evaluation, which demonstrates the effectiveness of the word activation force model for mining associations between entities.
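The resistor-network idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual graph construction or weighting, so everything here is an assumption made for illustration: words are nodes, each edge carries a conductance taken to be a co-occurrence strength, and candidate expansion terms are ranked by ascending effective resistance to the query term. The function names `effective_resistance` and `rank_expansion_terms` and the toy graph are hypothetical.

```python
def effective_resistance(edges, a, b):
    """Effective resistance between nodes a and b of a connected weighted graph.

    edges: iterable of (u, v, conductance) with conductance > 0.
    Assumption for this sketch: conductance is a word co-occurrence strength.
    """
    edges = list(edges)
    nodes = {n for u, v, _ in edges for n in (u, v)}
    if a not in nodes or b not in nodes:
        raise KeyError("both terms must appear in the graph")
    if a == b:
        return 0.0
    # Ground node b (potential 0) and solve the reduced Laplacian system.
    idx = {n: i for i, n in enumerate(sorted(nodes - {b}))}
    n = len(idx)
    m = [[0.0] * (n + 1) for _ in range(n)]  # augmented matrix [L_reduced | rhs]
    for u, v, g in edges:
        for p, q in ((u, v), (v, u)):
            if p in idx:
                m[idx[p]][idx[p]] += g
                if q in idx:
                    m[idx[p]][idx[q]] -= g
    m[idx[a]][n] = 1.0  # unit current injected at a, drawn out at grounded b
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("graph must be connected")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    # Potential at a under unit current equals the effective resistance.
    return m[idx[a]][n] / m[idx[a]][idx[a]]


def rank_expansion_terms(edges, query_term, candidates):
    """Rank candidates by ascending effective resistance to the query term."""
    scored = [(c, effective_resistance(edges, query_term, c)) for c in candidates]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical toy co-occurrence graph (weights are made up).
toy_graph = [
    ("earthquake", "tsunami", 3.0),
    ("earthquake", "japan", 2.0),
    ("tsunami", "japan", 2.0),
    ("japan", "concert", 0.5),
]

if __name__ == "__main__":
    for term, r in rank_expansion_terms(toy_graph, "earthquake", ["tsunami", "japan", "concert"]):
        print(term, round(r, 4))
```

Unlike a direct co-occurrence count, effective resistance accounts for every path between two words, so a term weakly linked to the query directly but strongly linked through shared neighbours can still rank well.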
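[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092; TP391.1
Article ID: 2467832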
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2467832.html