Analysis and Implementation of a Spark-Based Scene Classification System for Live Video

Published: 2018-04-18 01:01

Topics: scene classification + live video; source: Beijing Jiaotong University, 2017 master's thesis

[Abstract]: Live webcasting is a new video culture that followed traditional radio and television, and it has so far passed through three stages of development. From users uploading personal videos to sites such as Youku, to web-based "showroom" live streaming on platforms such as YY, to today's mobile live-video era of "walk, watch, and broadcast anywhere", the whole progression took six years. The second and third stages together span less than three years, yet they account for the mainstream of today's live-streaming market. According to incomplete statistics, China currently has more than 200 online live-streaming platforms. Founder Securities forecasts that the live-webcast market will reach 60 billion yuan in 2020, making live streaming the third major center of mobile Internet traffic after Weibo and WeChat. Because streamers lack high-end equipment and post-production support, picture quality has long been one of the main concerns in live webcasting. Platforms such as Baidu Cloud Live and Alibaba Cloud Live already offer picture-quality optimization schemes tailored to different recording scenes, but a platform generally applies a single optimization algorithm to a given video stream. As a result, once the scene changes during a broadcast, the algorithm applied to the stream is no longer the most suitable one. This project therefore proposes a Spark-based live video scene classification system that classifies video streams in real time, providing the basis for switching the picture-quality optimization scheme dynamically. Drawing on the idea of a task message queue, the project uses the Spark Streaming stream-processing framework on top of the Spark core framework to process and classify multiple video streams in parallel and in real time. Video processing consists of converting video stream data into image frames and then applying grayscale conversion, histogram equalization, HOG feature extraction, and optical-flow feature extraction to the frames. In addition, the project modifies the AlexNet model to build a deep convolutional neural network classifier that accepts multiple inputs, the Multi-Stream AlexNet (MSAN) model. The project uses this model to classify the scene of each video image, groups consecutive image frames by the smallest unit of the pushed video stream, and tallies the classification votes within each group to determine the scene category of that group, thereby achieving real-time classification of live video. So far the project has trained an MSAN model with an average classification accuracy of 98%, deployed the scene classification system on a cluster, and completed unit testing and system testing of the system.
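The group-wise vote described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `classify_groups`, the scene labels, and the fixed `group_size` are all hypothetical, and the sketch assumes that per-frame labels have already been produced by some classifier and that the group wins by simple plurality.

```python
from collections import Counter
from typing import Iterator, Sequence, Tuple


def classify_groups(frame_labels: Sequence[str],
                    group_size: int) -> Iterator[Tuple[int, str]]:
    """Yield (group_index, winning_label) for consecutive groups of frames.

    frame_labels: per-frame scene labels, in stream order (assumed input).
    group_size:   frames per group; stands in for the "smallest unit of the
                  pushed video stream", whose real size is not given here.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for start in range(0, len(frame_labels), group_size):
        group = frame_labels[start:start + group_size]
        # Plurality vote: the label with the most frames wins the group.
        # On a tie, Counter.most_common keeps first-seen order (Python 3.7+).
        winner, _count = Counter(group).most_common(1)[0]
        yield start // group_size, winner


if __name__ == "__main__":
    # Hypothetical labels for nine consecutive frames.
    labels = ["indoor", "indoor", "outdoor", "indoor",
              "outdoor", "outdoor", "outdoor", "indoor",
              "stage"]
    for index, scene in classify_groups(labels, group_size=4):
        print(index, scene)
```

Voting over a group rather than acting on each frame smooths out isolated misclassifications, so a single wrongly labeled frame does not trigger a change of optimization scheme.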
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41
