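Research and Implementation of an Ontology-Based User Interest Mining System
Published: 2018-05-16 05:40
Topics: interest mining + ontology; Source: South China University of Technology, 2013 master's thesis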
[Abstract]: As business models continue to evolve, companies want a strategy that fits their own business while offering personalized services to different users. The key to such personalization is a model of each user's interests.

Traditional data mining techniques use the associations between users and items to capture users' interest features, recommend products they may be interested in, and thereby prompt purchases. Non-e-commerce businesses such as search engine vendors and network service providers have no direct purchase records, so common data mining techniques struggle to build an effective user interest model for them. These businesses do, however, often hold another valuable resource: users' browsing records. The user interest mining system discussed in this thesis works on the URL data in users' browsing records. Building on an interest ontology, it proposes a novel user interest modeling process, and experiments on real user data demonstrate the system's feasibility and practicality.

The main research contributions are as follows:

1. A complete and effective training method for the concepts of the interest ontology. Interest keywords are taken from a pre-built interest reference ontology and used to construct search URLs for a specific search engine; the system crawls the returned results as the training document set for each ontology concept. The core content of each training document is distilled by combining XPath-based web information extraction with an improved web page body-text extraction algorithm based on a line-block length function. Finally, Lucene is used to build an inverted index over the document set for fast retrieval, from which the TF-IDF feature vector of every concept in the interest ontology is computed efficiently and accurately.

2. An interest modeling method that incorporates user browsing behavior. A user interest model is essentially a scored instance of the interest reference ontology. The thesis proposes a spreading activation algorithm that integrates the user's browsing patterns to initialize and update the user's interest scores. Because the method takes the relationships between ontology concepts into account, it captures the interests a user expresses explicitly and can also, to some extent, uncover the user's latent interests. The method also overcomes the cold-start problem that general interest mining algorithms face.
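The second contribution can be illustrated with a minimal sketch of spreading activation over a small concept graph. It is not the thesis's algorithm: the abstract does not give the update rule or say how browsing patterns enter the scores, so the decay factor, threshold, hop limit, edge weights, concept names, and term weights below are all assumptions made for illustration. Initial activation is taken here as the cosine similarity between a visited page's term vector and each concept's TF-IDF vector.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spread_activation(initial, edges, decay=0.5, threshold=0.05, max_hops=3):
    """Propagate activation from seed concepts along weighted edges.

    initial: {concept: activation} for concepts matched by a visited page
    edges:   {concept: [(neighbor, weight), ...]} (hypothetical ontology links)
    decay, threshold, max_hops: assumed parameters, not taken from the thesis
    """
    scores = defaultdict(float, initial)
    frontier = dict(initial)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for concept, activation in frontier.items():
            for neighbor, weight in edges.get(concept, []):
                passed = activation * weight * decay
                # Drop contributions too weak to matter (checked per edge).
                if passed >= threshold:
                    next_frontier[neighbor] += passed
        if not next_frontier:
            break
        for concept, activation in next_frontier.items():
            scores[concept] += activation
        frontier = next_frontier
    return dict(scores)


# Hypothetical concept vectors standing in for the TF-IDF vectors of contribution 1.
concept_vectors = {
    "football": {"goal": 0.8, "league": 0.6},
    "basketball": {"dunk": 0.9, "league": 0.4},
    "sports": {"match": 0.7, "team": 0.7},
}
# Hypothetical ontology edges: child-to-parent links are stronger than parent-to-child.
edges = {
    "football": [("sports", 1.0)],
    "basketball": [("sports", 1.0)],
    "sports": [("football", 0.5), ("basketball", 0.5)],
}
# Term vector of one page from the user's browsing record (made-up weights).
page_vector = {"goal": 1.0, "league": 1.0}

initial = {}
for name, vector in concept_vectors.items():
    similarity = cosine(page_vector, vector)
    if similarity > 0:
        initial[name] = similarity

scores = spread_activation(initial, edges)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

In this toy run the page shares no terms with "sports", yet that concept still receives a score through its links to "football" and "basketball". This is the sense in which propagation over ontology relations can surface latent interests, and why a user with very few page visits still ends up with scores on related concepts.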
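[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP311.13; TP391.3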
Article ID: 1895676
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1895676.html