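Tag Extraction and Integration Based on APP Data
Published: 2018-01-20 10:46
Keywords: APP data; tag extraction; tag integration; tag system. Source: Zhejiang University, master's thesis, 2017. Document type: degree thesis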
[Abstract]: In recent years, with the rapid development of the mobile Internet, the number of mobile applications (APPs) has grown explosively, and helping users find a suitable APP among this massive supply is a problem that every download platform urgently needs to solve. The traditional popularity-based ranking approach has obvious drawbacks: popular APPs compete for a limited number of exposure slots while a large number of long-tail APPs never get shown, which is highly detrimental to building an APP ecosystem. A tag system is an effective way, established in the Web 2.0 era, to address the long tail of resource objects and to help users manage and retrieve those objects. Tag systems have rarely been studied in the APP domain, and representative foreign download platforms such as the App Store and Google Play do not yet offer tag functionality. As the number of APPs continues to grow explosively, however, solving the APP long-tail problem through a tag system is imperative. This thesis explores this new area and studies how to build a tag system for the APP domain efficiently, automatically, and accurately. The work covers four aspects:
· Data preprocessing: new word discovery and stop-word list construction were carried out specifically for APP data. Experiments on the 360 APP dataset show that preprocessing significantly improves tag quality.
· Tag extraction: commonly used keyword extraction and tag recommendation algorithms are surveyed. Based on the fact that APP data contains text along multiple dimensions, three improved tag extraction algorithms are proposed (SemanticRank, RankScore1, and RankScore2). They are better suited to the characteristics of APP datasets and yield better tags.
· Tag integration: borrowing ideas from knowledge graphs, multiple external data sources are introduced to build synonym relations and partial-order relations for integrating tags. This substantially improves APP coverage and recall and resolves the problem of messy, non-standardized tags. In addition, the thesis proposes an APP tag tree for managing APP tags, which effectively addresses the multi-dimensional nature of APP tags.
· Tag management system: an APP tag management system was implemented. It incorporates the preprocessing, tag extraction, and tag integration results and provides a user-friendly interactive interface and a visualization interface, so that tag administrators can easily maintain and manage tags.
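The tag integration idea in the abstract (synonym relations plus partial-order relations) can be illustrated with a minimal sketch. This is not the thesis's implementation: the tables `SYNONYMS` and `PARENT`, the function names, and all tag values are hypothetical, chosen only to show how mapping variants to one canonical tag and propagating tags to their ancestors could raise the number of APPs a tag covers.

```python
from collections import defaultdict

# Hypothetical synonym relation: variant spelling -> canonical tag.
SYNONYMS = {"pics": "photo", "photos": "photo", "photograph": "photo"}

# Hypothetical partial-order relation: child tag -> parent tag.
PARENT = {"photo editing": "photo", "photo": "media", "video": "media"}


def canonical(tag):
    """Normalize case and whitespace, then map a variant to its canonical tag."""
    key = tag.strip().lower()
    return SYNONYMS.get(key, key)


def ancestors(tag):
    """Walk the child -> parent relation upward; stop if a cycle is detected."""
    seen = []
    while tag in PARENT and PARENT[tag] not in seen:
        tag = PARENT[tag]
        seen.append(tag)
    return seen


def integrate(raw_tags):
    """Return canonical tags plus their ancestors, de-duplicated in first-seen order."""
    result = []
    for raw in raw_tags:
        tag = canonical(raw)
        for t in [tag] + ancestors(tag):
            if t not in result:
                result.append(t)
    return result


def tag_index(app_tags):
    """Build tag -> set of APPs after integration (a simple coverage view)."""
    index = defaultdict(set)
    for app, tags in app_tags.items():
        for t in integrate(tags):
            index[t].add(app)
    return index


if __name__ == "__main__":
    apps = {"app1": ["Pics"], "app2": ["video"], "app3": ["photo editing"]}
    for tag, covered in sorted(tag_index(apps).items()):
        print(tag, sorted(covered))
```

In this toy data, none of the raw tags is shared between APPs, yet after integration the tag `media` covers all three APPs and `photo` covers two, which is the kind of coverage gain the abstract attributes to integration.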
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2017
[CLC number]: TP391.1; TP311.56
Article ID: 1447908
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1447908.html