
Research on Content-Based E-book and Author Recommendation Methods

Published: 2018-06-20 06:55

  Topics: content-based recommendation + e-book recommendation; Source: Harbin Institute of Technology, master's thesis, 2016


[Abstract]: With the rapid spread of the Internet, e-books and author resources have become increasingly abundant, yet readers find it ever harder to discover what genuinely interests them in this mass of material. Recommender systems can quickly help readers select interesting and valuable information. In practice, readers tend to read books with the same or similar subject matter, or books by authors whose style and content resemble those of an author they like. Personalized e-book recommendation and author recommendation have therefore become a focus of attention for online bookstores, and research on content-based e-book and author recommendation methods has significant practical value. The main research contents of this thesis are as follows. For content-based e-book recommendation, traditional text-processing models have been studied mostly on short texts and much less on long texts, because compared with short texts (such as news), long texts (such as e-books) involve higher-dimensional and more complex preprocessing, and their semantic relationships are harder to measure. This thesis crawls the full text of e-books from an authoritative e-book website to build an experimental long-text dataset. To address the high dimensionality and processing complexity of e-book long texts, it adopts a divide-and-conquer approach: each long text is split into several parts, and a multi-dimensional latent semantic algorithm model is proposed that expresses the semantic relatedness of a text by constructing a word semantic relationship graph matrix. To address the difficulty of measuring semantic relationships in long texts, a similarity distance that fuses global and local semantics is used to measure the content similarity of e-books, and a series of experiments is carried out on the parameters involved.
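The divide-and-conquer segmentation and the fused global/local similarity described above can be illustrated with a minimal sketch. It is not the thesis's model: raw bag-of-words counts stand in for the latent semantic representation, and the function names, the regex tokenizer, the segment count `n_segments`, and the fusion weight `alpha` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_into_segments(text, n_segments):
    """Divide a long text into roughly equal runs of tokens."""
    # Assumption: a regex tokenizer; Chinese text would need a word segmenter.
    tokens = re.findall(r"\w+", text.lower())
    size = max(1, math.ceil(len(tokens) / n_segments))
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fused_similarity(text_a, text_b, n_segments=3, alpha=0.5):
    """Weighted mix of whole-book and best-matching-segment similarity."""
    segs_a = [Counter(s) for s in split_into_segments(text_a, n_segments)]
    segs_b = [Counter(s) for s in split_into_segments(text_b, n_segments)]
    if not segs_a or not segs_b:
        return 0.0
    # Global view: compare the two books as single term-count vectors.
    global_sim = cosine(sum(segs_a, Counter()), sum(segs_b, Counter()))
    # Local view: each segment of A is matched to its closest segment of B.
    local_sim = sum(max(cosine(sa, sb) for sb in segs_b)
                    for sa in segs_a) / len(segs_a)
    return alpha * global_sim + (1 - alpha) * local_sim


book_a = "dragon castle knight sword dragon quest magic knight"
book_b = "knight sword dragon castle magic quest dragon knight"
book_c = "stock market bond yield interest rate inflation bank"
print(fused_similarity(book_a, book_b), fused_similarity(book_a, book_c))
```

Setting `alpha` closer to 1 makes the score depend mostly on overall vocabulary, while lower values reward books whose individual parts line up; the abstract does not state how the two terms are actually weighted.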
The experimental results show that, measured by five quantitative evaluation metrics, the multi-dimensional latent semantic algorithm model outperforms the other traditional text-processing models. For content-based e-book author recommendation, existing research has mostly concentrated on expert recommendation, and the features used for recommendation are limited in variety. To address these problems, this thesis uses a crawler to collect three kinds of heterogeneous author-related features from e-commerce websites: the author's biography, the abstracts of the author's books, and reader reviews. Using these three heterogeneous features, it proposes a tree-structured representation of authors and applies a multi-layer self-organizing map (SOM) model to recommend e-book authors. Two groups of experiments are designed according to whether the author-node features in the author tree are fused with the other two kinds of feature information, and the parameters involved are studied. The experimental results show that, measured by five quantitative metrics, the multi-layer self-organizing map model based on the author tree outperforms traditional text-processing models.
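One way to picture the author tree and the multi-layer self-organizing map is the sketch below, written under stated assumptions rather than as the thesis's implementation. Each author is a root node holding biography features, with book-abstract and review vectors as leaves. Leaf-layer maps compress the leaves into average grid positions, which are concatenated with the biography features (the "fused" setting) and fed to a top-layer map. The `SOM` class, the helper names, the grid size, the decay schedule, and the ranking rule in `recommend` are all hypothetical, and the feature vectors are toy numbers.

```python
import math
import random


class SOM:
    """A tiny self-organizing map on a rows x cols grid (pure Python)."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        self.weights = [[rng.random() for _ in range(dim)]
                        for _ in range(rows * cols)]

    def bmu(self, x):
        """Index of the best-matching unit: the closest weight vector."""
        return min(range(len(self.weights)),
                   key=lambda i: math.dist(self.weights[i], x))

    def position(self, x):
        """Grid coordinates of the BMU, scaled to [0, 1]."""
        r, c = divmod(self.bmu(x), self.cols)
        return [r / max(1, self.rows - 1), c / max(1, self.cols - 1)]

    def train(self, data, epochs=30, lr0=0.5, seed=0):
        """Online training with a linearly decaying rate and neighbourhood."""
        rng = random.Random(seed)
        sigma0 = max(self.rows, self.cols) / 2
        for epoch in range(epochs):
            frac = 1 - epoch / epochs
            lr, sigma = lr0 * frac, max(sigma0 * frac, 0.5)
            for x in rng.sample(data, len(data)):
                br, bc = divmod(self.bmu(x), self.cols)
                for i, w in enumerate(self.weights):
                    r, c = divmod(i, self.cols)
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2)
                                 / (2 * sigma ** 2))
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])


def mean_position(som, vectors):
    """Compress a set of leaf vectors into their average BMU position."""
    if not vectors:
        return [0.5, 0.5]  # neutral placeholder for an author with no leaves
    pts = [som.position(v) for v in vectors]
    return [sum(p[k] for p in pts) / len(pts) for k in range(2)]


def build_model(authors, grid=3, seed=0):
    """Train leaf-layer maps, fuse their output into the root, train the top map."""
    books = [v for a in authors.values() for v in a["books"]]
    reviews = [v for a in authors.values() for v in a["reviews"]]
    book_som = SOM(grid, grid, len(books[0]), seed)
    review_som = SOM(grid, grid, len(reviews[0]), seed + 1)
    book_som.train(books)
    review_som.train(reviews)
    fused = {name: a["bio"] + mean_position(book_som, a["books"])
                   + mean_position(review_som, a["reviews"])
             for name, a in authors.items()}
    top_som = SOM(grid, grid, len(next(iter(fused.values()))), seed + 2)
    top_som.train(list(fused.values()))
    return top_som, fused


def recommend(query, top_som, fused, k=2):
    """Rank other authors by top-layer grid distance, then by vector distance."""
    q_pos = top_som.position(fused[query])

    def key(name):
        return (math.dist(q_pos, top_som.position(fused[name])),
                math.dist(fused[query], fused[name]))

    return sorted((n for n in fused if n != query), key=key)[:k]


authors = {  # toy feature vectors, not real data
    "author_a": {"bio": [0.9, 0.1], "books": [[0.8, 0.2], [0.9, 0.1]], "reviews": [[0.7, 0.3]]},
    "author_b": {"bio": [0.8, 0.2], "books": [[0.9, 0.2]], "reviews": [[0.8, 0.3], [0.7, 0.2]]},
    "author_c": {"bio": [0.1, 0.9], "books": [[0.2, 0.8]], "reviews": [[0.1, 0.9]]},
}
top_som, fused = build_model(authors)
print(recommend("author_a", top_som, fused, k=2))
```

Dropping the two `mean_position` terms from `fused` would give a biography-only representation, which is one reading of the abstract's non-fused experimental group.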
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year of degree]: 2016
[Classification number]: TP391.1


Article ID: 2043387


Article link: https://www.wllwen.com/jingjilunwen/dianzishangwulunwen/2043387.html

