Research on a Bibliographic Information Classification Method Based on a Compound-Weighted LDA Model
Posted: 2018-06-16 13:11
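Topics: text classification + LDA model; Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2017, No. 04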
[Abstract]: Research on the automatic classification of bibliographic records is important for organizing information resources. This paper uses the probabilistic topic model LDA as the text representation model for bibliographic information, to overcome the feature sparsity caused by the shortness of these texts. Two different feature-weighting strategies are implemented: one based on the structural layout of bibliographic records, and one based on how well a term discriminates between the categories in which it appears. On this basis, a compound weighting strategy is constructed so that the resulting feature-word set is not biased toward high-frequency words and better represents the category to which a bibliographic record belongs. Integrating the compound weighting strategy into LDA yields a bibliographic information classification method based on compound-weighted LDA. Comparative experiments on public and self-built bibliographic corpora verify and analyze the effectiveness of the compound weighting strategy; they show that the proposed compound-weighted LDA method achieves better classification performance than LDA classification methods that use only one of the two feature-weighting strategies.
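The abstract does not specify how the two weights are computed or how they are fed into LDA. As a rough illustration only, the following Python sketch makes several assumptions that the paper does not confirm: (a) the structure-based weight is a per-field multiplier (the field names and values in the hypothetical `FIELD_WEIGHTS` are invented); (b) category discrimination is measured as one minus the normalized entropy of a term's document frequency across categories, a stand-in that may differ from the paper's measure; (c) the compound weight is the product of the two; and (d) the weights reach an ordinary count-based LDA by repeating each term in proportion to its weight.

```python
import math
from collections import Counter, defaultdict

# Hypothetical field weights (assumption): title terms are treated as more
# indicative of a record's category than abstract terms.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}


def structure_weights(record):
    """Structure-based weight: sum of field weights over all occurrences of a term."""
    weights = Counter()
    for field, tokens in record.items():
        w = FIELD_WEIGHTS.get(field, 1.0)
        for tok in tokens:
            weights[tok] += w
    return weights


def discrimination_weights(corpus, labels):
    """Category-discrimination weight in [0, 1] (assumed measure):
    1 - normalized entropy of the term's document frequency over categories."""
    df = defaultdict(Counter)  # term -> category -> number of records containing it
    for record, label in zip(corpus, labels):
        for tok in {t for tokens in record.values() for t in tokens}:
            df[tok][label] += 1
    n_classes = len(set(labels))
    max_entropy = math.log(n_classes) if n_classes > 1 else 1.0
    result = {}
    for tok, counts in df.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        result[tok] = 1.0 - entropy / max_entropy
    return result


def compound_weights(record, disc):
    """Compound weight = structure weight x discrimination weight (product is an assumption)."""
    return {t: w * disc.get(t, 0.0) for t, w in structure_weights(record).items()}


def weighted_pseudo_document(record, disc, scale=2):
    """Repeat each term round(scale * weight) times so that an unmodified,
    count-based LDA implementation sees the compound weights."""
    tokens = []
    for term, w in sorted(compound_weights(record, disc).items()):
        tokens.extend([term] * round(scale * w))
    return tokens


# Toy corpus: each record maps a field name to its tokens.
corpus = [
    {"title": ["lda", "classification"], "abstract": ["topic", "model", "library"]},
    {"title": ["library", "catalog"], "abstract": ["metadata", "library"]},
]
labels = ["TP", "G"]
disc = discrimination_weights(corpus, labels)
print(Counter(weighted_pseudo_document(corpus[0], disc)))
print(Counter(weighted_pseudo_document(corpus[1], disc)))
```

In this toy corpus, "library" occurs in both categories, so its discrimination weight is 0 and it drops out of the pseudo-documents, even though it is the most frequent term in the second record. Title terms such as "lda" are amplified. The pseudo-documents could then be passed to any standard LDA implementation, and the inferred document-topic vectors used as features for a classifier.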
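[Author affiliation]: School of Information Management, Wuhan University
[Funding]: National Social Science Fund of China project "Research on Automatic Classification of Multiple Types of Textual Digital Resources" (15BTQ066)
[CLC number]: TP391.1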
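[Similar literature]
Related journal articles (top 1)
1 中本, 内藤, 张希轩 (Zhang Xixuan); Japanese Standards for Bibliographic Information Exchange [J]; New Technology of Library and Information Service (《现代图书情报技术》); 1985, No. 04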
Article ID: 2026819
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2026819.html