Microblog Semantic Retrieval Based on Latent Semantics and Graph Structure
Published: 2019-02-13 20:13
[Abstract]: Microblog posts are short and feature-sparse, and a semantic gap separates them from user queries; these properties reduce the efficiency of semantic retrieval. To address this problem, the paper combines text features with knowledge-base semantics to build a semantic retrieval model based on latent semantics and graph structure. The model computes a hashtag-based feature relevance with the Tversky algorithm. It trains a topic model on a Wikipedia corpus with latent Dirichlet allocation (LDA) and uses the Jensen-Shannon divergence (JSD) distance to compute the topic relevance of texts mapped onto that model. It also extracts entities and their relationship graph from DBpedia and computes the relevance between entities in the graph with the SimRank algorithm. The three results are combined to obtain the final relevance. Experiments on a Twitter subset with short-text and long-text retrieval show that, compared with a method based on Linked Open Data and graph theory, the model improves MAP, P@30, and R-Prec by 2.98%, 6.40%, and 5.16%, respectively, indicating good retrieval performance.
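The three relevance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the parameters `alpha`, `beta`, `c`, and `iterations`, the toy inputs, and especially the equal-weight linear combination in `combined_relevance` are assumptions made for illustration, since the abstract does not state how the three scores are merged. In the paper the topic distributions come from an LDA model trained on Wikipedia and the graph comes from DBpedia; here both are replaced by small hand-written values.

```python
import math
from itertools import product


def tversky(a, b, alpha=0.5, beta=0.5):
    """Tversky index between two sets (here: hashtag sets)."""
    a, b = set(a), set(b)
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def simrank(in_neighbors, c=0.8, iterations=10):
    """Basic iterative SimRank; in_neighbors maps each node to its in-neighbors."""
    nodes = list(in_neighbors)
    sim = {(u, v): 1.0 if u == v else 0.0 for u, v in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for u, v in product(nodes, nodes):
            iu, iv = in_neighbors[u], in_neighbors[v]
            if u == v:
                new[(u, v)] = 1.0
            elif not iu or not iv:
                new[(u, v)] = 0.0
            else:
                total = sum(sim[(x, y)] for x in iu for y in iv)
                new[(u, v)] = c * total / (len(iu) * len(iv))
        sim = new
    return sim


def combined_relevance(hashtag_sim, topic_div, entity_sim,
                       weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical linear merge; 1 - JSD turns the divergence into a similarity."""
    w1, w2, w3 = weights
    return w1 * hashtag_sim + w2 * (1.0 - topic_div) + w3 * entity_sim


# Toy inputs (made up for illustration, not data from the paper).
query_tags, post_tags = {"worldcup", "football"}, {"worldcup", "fifa"}
query_topics, post_topics = [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]
entity_graph = {"A": [], "B": ["A"], "C": ["A"]}  # A links to both B and C

hashtag_score = tversky(query_tags, post_tags)
topic_divergence = js_divergence(query_topics, post_topics)
entity_score = simrank(entity_graph)[("B", "C")]
print(combined_relevance(hashtag_score, topic_divergence, entity_score))
```

With `alpha = beta = 0.5` the Tversky index reduces to the Dice coefficient, and with `alpha = beta = 1` it reduces to the Jaccard index, so these two parameters control how strongly hashtags present on only one side are penalized.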
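[Affiliations]: School of Electronic and Information Engineering, Qinzhou University; School of Computer Science, South China Normal University; School of Software, Zhengzhou University of Light Industry
[Funding]: National Natural Science Foundation of China (61272066); Basic Ability Improvement Project for Young and Middle-aged Teachers in Guangxi Universities (KY2016LX431); Guangzhou Science and Technology Program Project (2014J4100031); Qinzhou Scientific Research and Technology Development Program Project (20164407)
[Classification]: TP391.1; TP393.092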
Article ID: 2421826
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2421826.html