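Research on Quality Evaluation of Mobile Internet Video UGC Based on Classification Algorithms
Published: 2018-05-10 14:28
Topics: user-generated content + principal component analysis; Source: Beijing University of Posts and Telecommunications, 2017 master's thesis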
[Abstract]: With the rapid development of Internet technology, the Web 2.0 era, in which users lead content creation, has gradually matured, and the recent development of mobile Internet technology has brought great changes to people's lives. User-generated content (UGC), as a new form of content generation and organization in the Web 2.0 environment, has attracted wide attention. Video-sharing websites, Weibo, blogs, and Q&A communities are currently the mainstream forms of mobile Internet UGC services. The mobile Internet has injected new momentum into the development of UGC: the volume of user-generated content has grown rapidly in recent years, but more and more quality problems have also been exposed, and overall quality still needs to improve. Only by evaluating UGC quality with a scientific method can the environment of UGC-based platforms be improved, and reasonable incentive measures built on such quality evaluation encourage users to produce more high-quality content. This thesis takes mobile Internet video UGC as its research object, mines indicator data related to the videos, and uses classification algorithms to evaluate their quality. Building on previous research and the characteristics of video UGC, it constructs a quality evaluation framework consisting of an object layer, a dimension layer, and a measure layer. The object layer covers four dimensions: video production level, the video content itself, video viewing experience, and video content utility, so that video content is evaluated comprehensively and accurately. On top of the dimension-layer indicators, the study also designs quantified dimension-layer indicators. Videos are scored manually against these quantified indicators, principal component analysis (PCA) is used to determine the indicator weights, and the manual scores and weights together yield a PCA-based video UGC quality score, which is then used to classify each video as high or low quality. The thesis also constructs a model of video UGC content and user interaction relationships, from which all measure-layer indicators are drawn. The PCA-based quality classification results and the measure-layer indicators together form the classification-based video quality evaluation model. The sample data for this model are divided into a training set and a test set; the model is trained on the training set and then used to predict the quality class of the test set, and the results show that the model is highly practical and scientifically sound. For the empirical analysis, the thesis selects user-generated videos from self-channels (自频道) in the Youku app. Measure-layer indicator data were first crawled for 892 videos, and the quality classes of these videos were then obtained through a questionnaire survey and PCA. The measure-layer indicator data and the PCA-based quality classes together make up the sample data, which were balanced and then divided into a training set and a test set. On the test set, the quality evaluation model based on the C5.0 classification algorithm reaches a classification accuracy of 94.62%. Finally, the thesis compares the prediction error and gain of four classification algorithms, and the results show that C5.0 achieves the best prediction accuracy and the best gain.
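The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the thesis's implementation. Several choices are assumptions that the abstract does not specify: weights are taken from the first principal component only, the high/low cutoff of 3.5 is hypothetical, balancing is done by random oversampling, the two features stand in for unnamed measure-layer indicators, and a one-level decision stump stands in for C5.0.

```python
import random

def pca_weights(scores, iterations=200):
    """Indicator weights from the first principal component (assumption)."""
    n, d = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in scores]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):  # power iteration for the leading eigenvector
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    total = sum(abs(x) for x in vec)
    return [abs(x) / total for x in vec]  # normalized to sum to 1

def label_quality(scores, weights, cutoff=3.5):
    """1 = high quality, 0 = low quality; the cutoff is hypothetical."""
    return [1 if sum(w * s for w, s in zip(weights, row)) >= cutoff else 0
            for row in scores]

def oversample(features, labels, rng):
    """Random oversampling of the minority class (assumes both classes exist)."""
    by_class = {0: [], 1: []}
    for x, y in zip(features, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    rows = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        rows += [(x, y) for x in xs + extra]
    rng.shuffle(rows)
    return rows

def train_stump(rows):
    """One-level decision tree: a simple stand-in for C5.0, not C5.0 itself."""
    best = (-1, 0, 0.0, 1)  # (correct count, feature, threshold, label if >=)
    for j in range(len(rows[0][0])):
        for t in sorted({x[j] for x, _ in rows}):
            for high in (0, 1):
                correct = sum((high if x[j] >= t else 1 - high) == y
                              for x, y in rows)
                if correct > best[0]:
                    best = (correct, j, t, high)
    return best[1:]

def accuracy(stump, rows):
    j, t, high = stump
    return sum((high if x[j] >= t else 1 - high) == y for x, y in rows) / len(rows)

def demo(seed=0):
    rng = random.Random(seed)
    scores, features = [], []
    for _ in range(200):
        q = rng.random()  # hidden quality driving both scores and features
        # Manual scores (1 to 5) on the four dimensions.
        scores.append([1 + 4 * min(1.0, max(0.0, q + rng.gauss(0, 0.1)))
                       for _ in range(4)])
        # Two hypothetical measure-layer indicators: one informative, one noise.
        features.append([1000 * q + rng.gauss(0, 100), rng.random()])
    weights = pca_weights(scores)
    labels = label_quality(scores, weights)
    rows = oversample(features, labels, rng)   # balance first, as in the abstract
    split = int(0.7 * len(rows))               # 70/30 split is an assumption
    stump = train_stump(rows[:split])
    return weights, accuracy(stump, rows[split:])

if __name__ == "__main__":
    weights, acc = demo()
    print("weights:", weights)
    print("test accuracy:", acc)
```

The sketch balances before splitting because that is the order the abstract gives. With oversampling, duplicated rows can then land in both the training and the test set, which tends to inflate test accuracy; balancing only the training set after the split avoids this.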
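[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: F49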
Article ID: 1869675
Link to this article: https://www.wllwen.com/jingjilunwen/xxjj/1869675.html