Design and Implementation of a Meta-Search Data Acquisition System for Quality and Safety
[Abstract]: Quality and safety problems occur frequently, and as the Internet has become widespread, the public increasingly discusses them online. User comments and online media reports about quality and safety can serve as text data for quality and safety analysis, so the Internet can act as a data source for acquiring quality and safety information and provide the data basis for that analysis. This thesis designs and implements a data acquisition system based on meta-search that collects web pages related to quality and safety. Here the meta-search engine is not used in the traditional way; instead, it collects data according to query terms set by the user. The system is divided into three functional blocks: meta-search query, web page extraction, and relevance determination. The meta-search query block encapsulates the different engines and manages queries through priority scheduling. The web page extraction block adopts two methods, one based on template analysis and one based on statistical analysis: template analysis is mainly responsible for extracting result links, while statistical analysis serves as a general-purpose text extraction method. The relevance determination block uses a support vector machine classification algorithm to filter for data related to quality and safety and to remove noise. Finally, the thesis tests the performance of web page extraction and classification and presents the results of the system. Because data related to quality and safety are scattered across the Internet and have distinctive characteristics, this thesis does not collect data with a focused crawler and instead attempts data acquisition through a meta-search engine. This work may serve as a reference for data acquisition research in other fields.
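The priority-scheduled query step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the class names `QueryTask` and `QueryScheduler`, the stub engine wrappers, the example URLs, and the convention that a lower number means higher priority are all assumptions made for the example.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(order=True)
class QueryTask:
    priority: int                      # assumption: lower value is dispatched first
    seq: int                           # tie-breaker keeps FIFO order within a priority
    query: str = field(compare=False)  # the user-defined query term


class QueryScheduler:
    """Priority queue of query terms (hypothetical design)."""

    def __init__(self) -> None:
        self._heap: List[QueryTask] = []
        self._counter = itertools.count()

    def submit(self, query: str, priority: int) -> None:
        heapq.heappush(self._heap, QueryTask(priority, next(self._counter), query))

    def run(self, engines: Dict[str, Callable[[str], List[str]]]) -> List[str]:
        """Send each query, in priority order, to every wrapped engine and
        return de-duplicated result links in the order first seen."""
        seen = set()
        links: List[str] = []
        while self._heap:
            task = heapq.heappop(self._heap)
            for search in engines.values():
                for url in search(task.query):
                    if url not in seen:
                        seen.add(url)
                        links.append(url)
        return links


# Stub wrappers standing in for real search-engine adapters (hypothetical).
def engine_a(query: str) -> List[str]:
    return [f"http://a.example/{query}/1", f"http://shared.example/{query}"]


def engine_b(query: str) -> List[str]:
    return [f"http://shared.example/{query}", f"http://b.example/{query}/1"]


scheduler = QueryScheduler()
scheduler.submit("recall", priority=2)
scheduler.submit("food-safety", priority=1)
result_links = scheduler.run({"a": engine_a, "b": engine_b})
```

In this sketch the links returned by `run` would then be passed to the web page extraction block; merging duplicates across engines at this stage is one possible design choice, not something the abstract specifies.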
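[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP274.2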
Article ID: 2474318
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2474318.html