Keyword Extraction Based on TextRank Weighted by Word Vector Clustering

Published: 2019-06-27 18:25

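[Abstract] [Objective] To incorporate the world knowledge contained in Wikipedia into the TextRank model in the form of word vectors, and so improve keyword extraction from single documents. [Methods] A word vector model is trained with Word2Vec on Chinese Wikipedia data. The word vectors of the nodes in the TextRank word graph are clustered to adjust the voting importance of nodes within each cluster. This cluster factor is combined with each node's coverage and position factors to compute the random-jump probabilities between nodes, which form the transition matrix. Node importance scores are then obtained by iterative computation, and the top TopN words are selected as keywords. [Results] For TopN ≤ 7, the word-vector-clustering-weighted method outperforms all comparison methods; at TopN = 3 the F-measure reaches its maximum, 3.374% higher than the previous best result; for TopN > 7, its results are similar to those of the position-weighted method. [Limitations] The cluster analysis increases computational cost. [Conclusions] Word-vector clustering weighting improves keyword extraction.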
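The pipeline described in the abstract (cluster the word-graph nodes' vectors, mix the cluster, coverage and position factors into jump probabilities, build the transition matrix, iterate, keep the top N words) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: toy 2-D vectors stand in for the Word2Vec model trained on Chinese Wikipedia; k-means is assumed as the clustering algorithm; the cluster weight (relative cluster size times closeness to the centroid), the coverage share (1 divided by the node's neighbor count), the proportional position share, the mixing weights `alpha`, `beta` and `gamma`, the co-occurrence window of 3, and all function names are hypothetical choices. Only the damping factor `d = 0.85` follows the usual TextRank default.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (assumed clustering method); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c, v=v: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
                  for v in vectors]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def build_graph(tokens, window=3):
    """Undirected word graph: link words that co-occur within `window` tokens."""
    neighbors = {t: set() for t in tokens}
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + window]:
            if u != w:
                neighbors[w].add(u)
                neighbors[u].add(w)
    return neighbors

def cluster_weighted_textrank(tokens, vectors, position, k=2, alpha=1/3, beta=1/3,
                              gamma=1/3, d=0.85, iters=100, top_n=3):
    neighbors = build_graph(tokens)
    nodes = sorted(neighbors)
    n = len(nodes)
    labels, centroids = kmeans([vectors[w] for w in nodes], min(k, n))
    sizes = {c: labels.count(c) for c in set(labels)}
    # Hypothetical cluster weight: relative cluster size times closeness to the centroid.
    cluster_w = {w: sizes[lab] / n * (1 + cosine(vectors[w], centroids[lab])) / 2
                 for w, lab in zip(nodes, labels)}

    def share(i, j, weight):
        # Fraction of node i's vote given to neighbor j in proportion to `weight`.
        total = sum(weight[m] for m in neighbors[i])
        return weight[j] / total if total else 0.0

    # Transition matrix (sparse): coverage, position and cluster shares mixed linearly.
    P = {(i, j): alpha / len(neighbors[i]) + beta * share(i, j, position)
         + gamma * share(i, j, cluster_w)
         for i in nodes for j in neighbors[i]}

    # PageRank-style iteration over the transition matrix.
    score = {w: 1.0 / n for w in nodes}
    for _ in range(iters):
        score = {j: (1 - d) / n + d * sum(score[i] * P[(i, j)] for i in neighbors[j])
                 for j in nodes}
    return sorted(nodes, key=lambda w: (-score[w], w))[:top_n]

# Usage with hypothetical toy data.
tokens = ["keyword", "extraction", "textrank", "graph", "keyword",
          "word", "vector", "cluster", "textrank", "keyword"]
vectors = {  # toy 2-D stand-ins for Word2Vec embeddings
    "keyword": [0.9, 0.1], "extraction": [0.8, 0.2], "textrank": [0.85, 0.15],
    "graph": [0.7, 0.3], "word": [0.2, 0.9], "vector": [0.1, 0.95], "cluster": [0.15, 0.8]}
title = {"keyword", "textrank"}  # hypothetical position factor: title words weigh 2.0
position = {w: 2.0 if w in title else 1.0 for w in tokens}
top = cluster_weighted_textrank(tokens, vectors, position)
print(top)
```

Because each of the three shares sums to 1 over a node's neighbors (as long as the position and cluster weights are positive) and `alpha + beta + gamma = 1`, every row of the transition matrix is stochastic, so the loop is an ordinary PageRank-style power iteration.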

Article No.: 2507032

Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2507032.html