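Design and Implementation of a Machine-Learning-Based User Feedback Data Center
Published: 2018-06-05 16:37
Topics: user feedback + text classification; Source: master's thesis, Beijing Jiaotong University, 2017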
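【Abstract】: This project comes from a real-world project in Baidu's Duer (度秘) product line and belongs to the field of Internet artificial intelligence. Duer is a leading representative of the new generation of intelligent operating systems. It is built on NLP (Natural Language Processing) technology, identifies user needs and provides the corresponding services. The product line receives user comments and feedback on the order of 100,000 items per day, which is a very large volume of data.

Classifying and filtering this feedback surfaces users' problems with the current product experience and their suggestions for it. It directly reveals the issues in the current release and the parts most in need of optimization, which guides iteration requirements. It also gives quality assurance staff a basis for tracking problems in production. The key problem is therefore the text classification and filtering of this large volume of feedback. The current approach, however, is to manually export part of the data from the online database and then manually classify it to pick out the useful feedback.

Using machine learning, this thesis designs and implements a user feedback data center platform. Once user feedback data is imported into the platform, the large volume of feedback text can be classified and filtered efficiently and accurately. It is then presented and counted by category, so that relevant staff can review it, investigate the causes of reported problems and resolve them.

The platform is divided into three parts: user feedback data pulling, feedback data classification and filtering, and the user feedback data center.
- Data pulling uses polling APIs (Application Programming Interfaces) written in Python. They pull all of the product line's feedback from the company's unified user feedback platform, reorganize the data format as needed and store it in HBase.
- Classification and filtering use the genetic algorithm and related machine learning algorithms to extract feature words and optimize the classification. The data is then classified and filtered according to those feature words.
- The data center is built on PHP and MySQL. It provides categorized presentation, conditional queries, feedback issue tracking and related functions.

The thesis covers the requirements analysis, overall design, detailed design, and testing and verification of the platform. The author took part in designing and developing three areas: user feedback data pulling, machine-learning-based feedback classification and filtering, and related functions of the data platform. The platform is now in production, and its data classification pass rate exceeds 91%. The user feedback data center has greatly improved the efficiency of feedback processing and freed up staff previously devoted to manual data work. It has been well received by department leaders and colleagues.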
[Abstract]: The project is derived from a real project in Baidu's Duer product line and belongs to the field of Internet artificial intelligence. Duer is an outstanding representative of the new generation of intelligent operating systems. Built on Natural Language Processing (NLP) technology, it identifies users' needs and provides the corresponding services. This product line receives on the order of 100,000 user comments and feedback items every day, so the volume of data is very large.

By classifying and filtering the feedback, we can obtain users' problems and suggestions concerning their experience with the current product. These directly reveal the problems in the current version and the parts that need optimization, and so guide iteration requirements. The feedback also gives quality assurance staff a basis for tracking problems in production. The key challenge is classifying and filtering this large volume of feedback text. The current solution, however, is to manually export part of the data from the online database and manually classify it to pick out useful feedback.

In this thesis, a user feedback data center platform is designed and implemented using machine learning. User feedback data is imported into the platform, which can classify and filter large volumes of feedback text efficiently and accurately. The platform then presents and counts the feedback by category, making it easy for relevant staff to review it, investigate the causes of reported problems and resolve them.

The platform can be divided into three parts: user feedback data pulling, feedback data classification and filtering, and the user feedback data center.
- Data pulling uses polling APIs (Application Programming Interfaces) written in Python. They pull all of the product line's feedback from the company's unified user feedback platform, reorganize the data format as needed and store it in HBase.
- Classification and filtering use the genetic algorithm and related machine learning algorithms to extract feature words and optimize the classification. The data is then classified and filtered according to those feature words.
- The data center is based on PHP and MySQL. It provides categorized presentation, conditional queries, feedback issue tracking and other functions.

This thesis completes the requirements analysis, overall design, detailed design, and testing and verification of the platform. I participated in designing and developing three areas: user feedback data pulling, machine-learning-based feedback classification and filtering, and related functions of the data platform. The platform is now in use, and its data classification pass rate exceeds 91%. The user feedback data center has greatly improved the efficiency of feedback processing and freed up staff previously devoted to data work. It has been well received by department leaders and colleagues.
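The abstract states that feature words are extracted with a genetic algorithm and that feedback is then classified and filtered by those words. It does not give the chromosome encoding, fitness function or genetic operators. What follows is a minimal sketch of one plausible reading, not the thesis's actual implementation. It rests on these assumptions:
- Each chromosome is a bitmask over candidate (word, category) pairs taken from labeled feedback.
- Fitness is the accuracy of a simple keyword-hit classifier minus a small penalty per selected word.
- The GA uses truncation selection with elitism, one-point crossover and bit-flip mutation.

The function names, the `penalty` and other parameters, and the toy English samples are all hypothetical. Real Chinese feedback would first need word segmentation, which is not shown.

```python
import random

# Hypothetical labeled feedback (text, category). Tokens are space-separated;
# real Chinese feedback would need a word segmenter first (not shown).
SAMPLES = [
    ("app crash on startup", "bug"),
    ("voice reply crash again", "bug"),
    ("music playback freeze", "bug"),
    ("please add weather widget", "suggestion"),
    ("hope to add dark mode", "suggestion"),
    ("please support more dialects", "suggestion"),
    ("great assistant thanks", "praise"),
    ("very useful thanks", "praise"),
]


def classify(text, feature_words):
    """Return the category with the most feature-word hits, or None if no hits.

    feature_words maps category -> set of words. Feedback with no hits is
    treated as filtered out.
    """
    tokens = text.split()
    best, best_hits = None, 0
    for category, words in feature_words.items():
        hits = sum(1 for t in tokens if t in words)
        if hits > best_hits:  # ties keep the earlier category
            best, best_hits = category, hits
    return best


def candidate_pairs(samples):
    """All distinct (word, category) pairs seen in the labeled samples."""
    return sorted({(w, label) for text, label in samples for w in text.split()})


def decode(mask, pairs):
    """Turn a bitmask chromosome into a category -> feature-word-set mapping."""
    feature_words = {}
    for bit, (word, category) in zip(mask, pairs):
        if bit:
            feature_words.setdefault(category, set()).add(word)
    return feature_words


def fitness(mask, pairs, samples, penalty=0.01):
    """Classification accuracy minus a small cost per selected feature word."""
    fw = decode(mask, pairs)
    correct = sum(1 for text, label in samples if classify(text, fw) == label)
    return correct / len(samples) - penalty * sum(mask)


def evolve(samples, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    """Search for a compact feature-word set; returns (feature_words, score)."""
    rng = random.Random(seed)
    pairs = candidate_pairs(samples)
    n = len(pairs)

    def score(mask):
        return fitness(mask, pairs, samples)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]  # truncation selection, kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = elite + children
    best = max(population, key=score)
    return decode(best, pairs), score(best)


if __name__ == "__main__":
    feature_words, best_score = evolve(SAMPLES)
    print(best_score, {c: sorted(ws) for c, ws in feature_words.items()})
    print(classify("app crash again", feature_words))
```

The elite is kept unchanged, so the best score never decreases between generations. The `penalty` term pushes the search toward a small set of discriminative words. In this toy data every word occurs in only one category. The ideal is therefore five words covering all eight samples, for example `crash`, one of `music`/`playback`/`freeze`, `add`, `please` and `thanks`, for a score of 0.95. The sketch does not guarantee reaching that optimum.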
【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Master's
【Year of degree】: 2017
【Classification number (CLC)】: TP311.52; TP181