Multi-Label Classification of Chinese Books Based on the LSTM Model
Published: 2018-04-25 20:44
Topics: LSTM model + deep learning; Source: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery), 2017, Issue 07
[Abstract]: [Objective] To build a classification system using an LSTM model and character embeddings, and to propose a solution to multi-label classification in Chinese book classification. [Methods] A deep learning approach is adopted: the classification system is built with character embeddings and an LSTM model, which is trained on strings composed of fields such as the title and subject headings. Multi-label classification is handled by constructing multiple binary classifiers. Bibliographic data covering 5 categories from 3 universities were used in the experiments. [Results] Evaluated by overall accuracy and by per-category precision, recall, and F1 score, the proposed model performs well on every metric and has strong practical value. [Limitations] The data cover only 5 categories of the Chinese Library Classification, and the classification granularity considered is coarse. [Conclusions] The LSTM-based Chinese book classification system offers simple preprocessing, incremental learning, and high transferability, and is both feasible and practical.
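The abstract names three building blocks: character-level input, one binary classifier per category, and per-category precision, recall, and F1. The sketch below illustrates only the data handling around those blocks in plain Python. It is not the authors' implementation and it contains no LSTM. The category codes in `CATEGORIES` and all function names are hypothetical, since the abstract does not say which five classes were used or how the system was coded.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical category codes: the abstract does not list the five classes used.
CATEGORIES = ["A", "B", "C", "D", "E"]


def build_char_vocab(texts: Sequence[str]) -> Dict[str, int]:
    """Map each distinct character to an integer id (0 is reserved for padding/unknown)."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 1
    return vocab


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Turn a string into a fixed-length list of character ids, truncating or zero-padding."""
    ids = [vocab.get(ch, 0) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))


def binary_targets(label_sets: Sequence[Set[str]], category: str) -> List[int]:
    """Build the 0/1 target vector that one per-category binary classifier would train on."""
    return [1 if category in labels else 0 for labels in label_sets]


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one category from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up toy records: (title + subject-heading string, set of category codes).
records = [
    ("深度学习导论 神经网络", {"A", "B"}),
    ("图书馆学概论 文献分类", {"C"}),
]
texts = [text for text, _ in records]
label_sets = [labels for _, labels in records]

vocab = build_char_vocab(texts)
encoded = [encode(text, vocab, max_len=16) for text in texts]
targets = {c: binary_targets(label_sets, c) for c in CATEGORIES}
```

In a full system, each vector in `targets` would be the training target of one binary classifier fed with the id sequences in `encoded`. A record is then assigned every category whose classifier predicts 1, which is how a set of binary classifiers yields multi-label output.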
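[Author affiliations]: School of Information Management, Nanjing University; Jiangsu Key Laboratory of Data Engineering and Knowledge Service (Nanjing University)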
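[Funding]: This work is one of the research outcomes of the National Natural Science Foundation of China project "Measurement and Analysis of TSD and TDC for Academic Resources" (Grant No. 71503121) and the Fundamental Research Funds for the Central Universities key project "Knowledge Structure and Evolution Dynamics of Library and Information Science in China" (Grant No. 20620140645).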
[CLC numbers]: TP181; TP391.1
Article ID: 1802892
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1802892.html