A Biterm Topic Model Incorporating Word Embedding Features

Published: 2018-11-01 13:02
[Abstract]: Short texts inherently suffer from sparse content and a lack of contextual information. To address these problems, this paper builds on the biterm topic model (BTM) and proposes LF-BTM, a biterm topic model that incorporates word embedding features. The model introduces a latent feature model so that rich word-vector information can compensate for content sparsity: in the modified generative process, each word of a biterm is generated under the joint influence of the topic-word multinomial distribution and the latent feature model. The model parameters are estimated with Gibbs sampling. Experiments on real-world short-text datasets show that, by drawing on word vectors pre-trained on large, general-purpose external corpora, the model discovers topics with markedly improved semantic coherence.
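The abstract does not state LF-BTM's sampling equations, so the following is only a sketch under stated assumptions. It borrows the mixture form used by latent-feature topic models such as LF-LDA, where the probability of word w under topic k is λ·CatE(w | τ_k·ω_w) + (1 - λ)·φ_{k,w}, and plugs that likelihood into the standard BTM collapsed Gibbs update for a biterm's topic. The function name `lf_btm_gibbs`, the mixing weight `lam`, and the decision to hold the topic vectors `topic_vecs` fixed instead of re-estimating them are hypothetical simplifications, not the paper's method.

```python
import numpy as np


def softmax_rows(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def lf_btm_gibbs(biterms, V, K, word_vecs, topic_vecs,
                 alpha=0.1, beta=0.01, lam=0.6, iters=50, seed=0):
    """Illustrative collapsed Gibbs sampler for biterm topics with an
    LF-style mixture word likelihood (hypothetical sketch, not the paper's code).

    biterms    : list of (w_i, w_j) word-id pairs
    word_vecs  : (V, D) pre-trained word vectors, assumed given
    topic_vecs : (K, D) topic vectors, held FIXED here (assumed simplification;
                 latent-feature models normally re-estimate them)
    lam        : assumed weight of the latent-feature component
    """
    rng = np.random.default_rng(seed)
    # Latent-feature component: one softmax distribution over words per topic.
    cat_e = softmax_rows(topic_vecs @ word_vecs.T)      # shape (K, V)

    n_k = np.zeros(K)          # biterms assigned to each topic
    n_wk = np.zeros((K, V))    # word counts per topic
    z = rng.integers(K, size=len(biterms))
    for b, (wi, wj) in enumerate(biterms):
        n_k[z[b]] += 1
        n_wk[z[b], wi] += 1
        n_wk[z[b], wj] += 1

    for _ in range(iters):
        for b, (wi, wj) in enumerate(biterms):
            # Remove biterm b from the counts.
            k = z[b]
            n_k[k] -= 1
            n_wk[k, wi] -= 1
            n_wk[k, wj] -= 1

            # Each topic holds 2 * n_k word tokens; as in the BTM update,
            # the same denominator is used for both words of the biterm.
            denom = 2 * n_k + V * beta
            p_wi = (1 - lam) * (n_wk[:, wi] + beta) / denom + lam * cat_e[:, wi]
            p_wj = (1 - lam) * (n_wk[:, wj] + beta) / denom + lam * cat_e[:, wj]
            p = (n_k + alpha) * p_wi * p_wj

            # Resample the topic and add the biterm back.
            k = rng.choice(K, p=p / p.sum())
            z[b] = k
            n_k[k] += 1
            n_wk[k, wi] += 1
            n_wk[k, wj] += 1

    theta = (n_k + alpha) / (n_k.sum() + K * alpha)    # corpus-level topic proportions
    return z, theta, n_wk
```

Setting `lam=0` reduces the word likelihood to the plain BTM term (n_{w|k} + β) / (2·n_k + Vβ), which makes it easy to compare the two likelihoods on the same biterms.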
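[Affiliation]: School of Information Science and Technology, Hainan University
[Funding]: National Natural Science Foundation of China (Grant No. 61462022)
[CLC Number]: TP391.1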



Source: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2304001.html

