Research on Web Page Authority Ranking and Classification Algorithms Based on Link Reputation Analysis

Published: 2018-03-27 13:41

Topic: text classification; Focus: link analysis; Source: Jinan University, 2012 master's thesis

[Abstract]: With the spread of the Internet, the number of web pages has grown exponentially, and users face considerable difficulty when searching with existing search engines. There are two reasons. First, the results a search engine returns mix topics: they are not classified by topic, which makes it harder for users to find information of the topic type they need. Second, the returned pages vary widely in quality (they include spam pages and spam advertising), which makes it harder for users to filter out high-quality information. To address these problems, this thesis does the following work. First, to resolve the mixing of topics in returned results, it labels web pages with topic categories, so that users can restrict a search to the category they need and locate the required information faster and more accurately. Second, to improve the accuracy of web page text classification, it proposes a feature weighting method based on feature noise weighting; by reducing the influence of feature noise caused by non-standard word usage, the method improves both the accuracy and the robustness of web page text classification. Third, to address the uneven quality of retrieved pages, it introduces the merchant reputation model from the market economy into the evaluation and ranking of web page authority. By mining historical link reputation evaluations, it builds an evaluation model combined with the PageRank algorithm to adjust the ranking of pages, which raises the quality of the top-ranked search results and gives page producers an incentive to focus on creating high-quality pages. Finally, a system model is built on these ideas to demonstrate that they are usable in practice.
[Degree-granting institution]: Jinan University
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP393.092
