Research on Web Video Information Extraction
Published: 2018-07-17 16:28
[Abstract]: In the information age, the volume of information on the Web has grown sharply. General-purpose search engines such as Baidu and Google increasingly feel the pressure of slow lookups and high hardware requirements that come with very large databases. They also face considerable difficulties with retrieval accuracy, unified storage, and unified display. In this environment, vertical search engines that focus on a specific domain have flourished: they lack the breadth of a general-purpose engine, but they avoid the shortcomings above.

In recent years video websites have sprung up in large numbers. Because their display styles and video databases differ, returning the videos a user needs conveniently and accurately is a problem that needs solving. In addition, some unscrupulous businesses and users spread videos that distort facts, or pornographic videos, which harms the public, so the relevant regulators need a tool for unified retrieval of online video. Some search engines now cooperate with certain video websites and achieve unified retrieval by passing video-related information between them, but only the larger video websites take part, so retrieval over a wider range requires Web video information extraction.

As the intersection of vertical search and unified Web video retrieval, Web video information extraction has attracted attention and will play a larger role. In practice, however, existing web page classification and page cleaning methods do not fully account for the characteristics of Web video pages, which leads to poor results.

Starting from real Web video websites, this thesis first analyzes the categories of pages on video websites and concludes that extracting information from video playing pages gives good results. Then, based on the characteristics of video playing pages, it describes how to classify pages using templates, visual features, feature scripts, and similar information. Finally, for page cleaning, it divides the noise in video playing pages into three classes: background noise, random noise, and residual noise, which are removed by templates, page structure, and semantic analysis, respectively. Experimental comparison and analysis show that the page classification and page cleaning methods described in the thesis achieve good results in Web video information extraction.
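[Degree-granting institution]: Wuhan University of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3
Article ID: 2130240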
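The abstract says that video playing pages can be recognized from templates, visual features, and feature scripts, but it does not list the concrete features. The following is a minimal sketch of the feature-script idea only, not the thesis's actual method: the tag set `PLAYER_TAGS`, the keyword list `SCRIPT_HINTS`, and the function name `looks_like_play_page` are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markers; the thesis abstract does not name its real features.
PLAYER_TAGS = {"video", "embed", "object"}
SCRIPT_HINTS = ("swfobject", "flashvars", "player")


class PlayerFeatureParser(HTMLParser):
    """Counts player-like tags and scripts that hint at a video player."""

    def __init__(self):
        super().__init__()
        self.player_tags = 0
        self.script_hits = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already lowercased.
        if tag in PLAYER_TAGS:
            self.player_tags += 1
        if tag == "script":
            self._in_script = True
            # The URL of an external script can also carry a hint.
            src = dict(attrs).get("src") or ""
            if any(hint in src.lower() for hint in SCRIPT_HINTS):
                self.script_hits += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only inline script bodies are inspected, not visible page text.
        if self._in_script and any(h in data.lower() for h in SCRIPT_HINTS):
            self.script_hits += 1


def looks_like_play_page(html: str) -> bool:
    """Return True if the page carries at least one assumed player marker."""
    parser = PlayerFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.player_tags > 0 or parser.script_hits > 0
```

A real classifier would presumably combine such script evidence with the template and visual features the abstract mentions; this sketch treats any single marker as sufficient, which is a deliberate simplification.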
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2130240.html