Research and Implementation of a Vertical Search Engine for the Real Estate Domain
[Abstract]: With the rapid development of the Internet, the volume of online information is growing exponentially, and users rely on search engines to locate what they need in it. General-purpose search engines solve the resource-location problem to some extent, but their results are far from ideal for retrieval in specialized fields and rarely satisfy users' search needs there. Vertical search engines emerged to address this weakness: by mining the information of one specific field in depth, they compensate for what general search engines lack in that field. This thesis studies the key technologies of vertical search engines in both theory and practice. It first introduces the research background and significance, the classification of search engines, and the development of vertical search engines in China and abroad. It then describes the basic working principle, system architecture, and key technologies of a vertical search engine. Next, it covers the topic representation of web pages in detail, constructs a topic feature vector, and analyzes the distribution characteristics of topic pages. Content-based and link-structure-based topic relevance judgment are studied in depth, and their shortcomings are analyzed. Building on content-based topic relevance judgment, page importance is introduced to design a topic crawler algorithm that uses both page content and link structure. For the topic-island problem in topic crawling, a tunneling algorithm based on dynamic adjustment of the maximum depth is designed, which alleviates the problem to some extent.
A vertical search engine for the real estate domain is then designed: the system is analyzed, its overall framework is laid out, and the design and implementation of each functional submodule are described in detail. Performance analysis and functional testing of the system are also carried out. Finally, the work of the thesis is summarized and directions for further research are proposed.
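The abstract does not give the formulas behind the topic crawler, so the following is only a minimal sketch of the general idea it describes: score a page against a topic feature vector, blend that score with a link-based importance value, and let the allowed tunneling depth vary with relevance. All names, the weight `alpha`, and the depth rule are hypothetical assumptions for illustration, not the thesis's actual method.

```python
import math
from collections import Counter


def cosine_similarity(page_terms, topic_vector):
    """Cosine similarity between a page's term counts and a topic feature vector."""
    page_vector = Counter(page_terms)
    dot = sum(page_vector[term] * weight for term, weight in topic_vector.items())
    page_norm = math.sqrt(sum(c * c for c in page_vector.values()))
    topic_norm = math.sqrt(sum(w * w for w in topic_vector.values()))
    if page_norm == 0 or topic_norm == 0:
        return 0.0
    return dot / (page_norm * topic_norm)


def crawl_priority(content_score, importance, alpha=0.7):
    """Blend content relevance with link-based importance (e.g. a PageRank-like
    value scaled to [0, 1]). The linear mix and alpha are assumptions."""
    return alpha * content_score + (1 - alpha) * importance


def max_tunnel_depth(parent_score, base_depth=2, extra_depth=3):
    """Assumed dynamic rule: allow deeper tunneling through off-topic pages
    when the last on-topic ancestor was highly relevant."""
    return base_depth + int(extra_depth * parent_score)


def should_follow(off_topic_depth, parent_score):
    """Keep following links while the off-topic run stays within the limit."""
    return off_topic_depth <= max_tunnel_depth(parent_score)
```

In such a scheme, a crawler frontier would be ordered by `crawl_priority`, and a link chain through irrelevant pages would be abandoned once `should_follow` returns `False`.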
[Degree-granting institution]: Nanchang University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3