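Research on Web Information Fusion Methods Based on Granular Computing
Published: 2018-03-27 13:19
Topic: Web mining; Approach: granular computing; Source: Wuhan University of Technology, master's thesis, 2013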
[Abstract]: As the Internet has developed and spread, business operations have increasingly moved onto it. The Internet has become the world's largest information and knowledge base, both in the volume of information it holds and in the breadth of knowledge it covers. It is the main channel of global information dissemination and offers people their most valuable information sources. Its rapid growth has also made Web information more diverse. The Internet gives users a vast space from which to draw useful knowledge, but it also confronts them with great challenges. Traditional search engines return results that are redundant, imprecise and fragmented, which leaves users with a heavy information-processing burden.

Information fusion technology is already widely used in military, economic and biomedical applications. Its ability to raise the confidence of information and reduce its redundancy offers a new approach to Web information processing. Existing fusion techniques are promising for structured data. They do not suit Web information, however, which is unstructured, high in volume and constantly changing.

To address these problems, this thesis processes unstructured Web information from two cognitive perspectives, "construction-integration" and "event-indexing", and studies multi-granularity fusion methods for Web information. It draws on existing granular computing theory and Web information fusion theory. Web information extraction is used to obtain Web information as a knowledge source, and Web mining is used to extract and analyze that information in depth. Together these steps convert large volumes of uncertain, unstructured Web information into quantified, structured text. On this basis the thesis develops a multi-granularity fusion model and multi-granularity fusion algorithms for Web information. The main work is as follows:
(1) Web information extraction pulls the title, body text, publication time, source and similar fields from Web pages to serve as the knowledge source. Because text is unstructured, Web content mining is applied to analyze the content in depth and represent the text quantitatively. Web structure mining is applied to recover the structural information in the text. The result is a Web information representation model made up of concepts, content attributes and link-structure attributes.
(2) The thesis examines the construction-integration cognitive model, which reflects the granularity of knowledge. Combined with fuzzy quotient space theory, this model is used to design a Web information granularity space model from the construction-integration perspective. The thesis also examines the event-indexing cognitive model, which reflects the relatedness of knowledge. This model is used to design a Web information granularity association model, tailored to the characteristics of Web information, from the event-indexing perspective.
(3) The thesis studies methods for computing text feature weights. Because Web information is continually updated, it also develops an incremental topic clustering algorithm. Information on the same topic is partitioned and represented at multiple granularities using the granularity space model, which yields an information granularity space generation algorithm. The content and structural attributes of Web information are analyzed and combined with the multi-granularity association model to develop a granularity association fusion algorithm. A case study uses news articles from the Sina.com news center as experimental data, and the results show that the proposed methods are effective.
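Step (1) begins by pulling a fixed set of fields out of each news page. The sketch below shows one way to do this with Python's standard-library `html.parser`. It is not the thesis's extractor. The markup it expects is a hypothetical assumption: an `<h1>` title, `<p>` body paragraphs, and elements with the class names `time` and `source`. The names `NewsFieldExtractor` and `extract_fields` are illustrative, and real Sina.com pages would need their own rules.

```python
from html.parser import HTMLParser


class NewsFieldExtractor(HTMLParser):
    """Collects title, body, publication time and source from one page.

    ASSUMPTION: the markup rules are hypothetical. <h1> holds the title,
    <p> holds body text, and elements with class "time" / "source" hold
    the publication time and the information source.
    """

    FIELD_BY_CLASS = {"time": "pub_time", "source": "source"}

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "body": [], "pub_time": "", "source": ""}
        self._current = None   # field currently receiving text
        self._open_tag = None  # tag whose end tag closes that field

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            return  # tags nested inside a field only contribute their text
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1":
            self._current = "title"
        elif tag == "p":
            self._current = "body"
        else:
            for name in classes:
                if name in self.FIELD_BY_CLASS:
                    self._current = self.FIELD_BY_CLASS[name]
                    break
        if self._current is not None:
            self._open_tag = tag

    def handle_endtag(self, tag):
        # Simplification: a nested tag with the same name closes the field early.
        if tag == self._open_tag:
            self._current = None
            self._open_tag = None

    def handle_data(self, data):
        text = data.strip()
        if self._current is None or not text:
            return
        if self._current == "body":
            self.fields["body"].append(text)
        else:
            joined = self.fields[self._current] + " " + text
            self.fields[self._current] = joined.strip()


def extract_fields(html):
    """Returns the four fields as strings, with body fragments joined by spaces."""
    parser = NewsFieldExtractor()
    parser.feed(html)
    parser.close()
    fields = dict(parser.fields)
    fields["body"] = " ".join(parser.fields["body"])
    return fields


if __name__ == "__main__":
    page = """<html><body>
    <h1>Sample headline</h1>
    <span class="time">2013-05-01 10:00</span>
    <span class="source">Sina</span>
    <p>First paragraph.</p><p>Second paragraph.</p>
    </body></html>"""
    print(extract_fields(page))
```

The resulting dictionary is the kind of structured record the abstract describes as the knowledge source. Later steps would then weight and cluster it.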
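Step (3) mentions computing text feature weights but does not name a scheme here. TF-IDF is the usual baseline, so the sketch below assumes it: tf is the term count divided by document length, and idf is log(N/df). The thesis may use a different weighting. The sketch also assumes that Chinese text has already been segmented into words. The function name `tfidf_vectors` is illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Computes sparse TF-IDF vectors for a list of tokenized documents.

    ASSUMPTION: TF-IDF stands in for the thesis's weighting method, and each
    document is already a list of word tokens (Chinese text pre-segmented).
    tf = count / document length; idf = log(N / df), so a term that occurs
    in every document gets weight 0.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append(
            {term: (c / length) * math.log(n / df[term]) for term, c in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    docs = [["flood", "wuhan", "rain"], ["flood", "rescue"], ["stock", "market"]]
    for vec in tfidf_vectors(docs):
        print(vec)
```

The output uses sparse `{term: weight}` dictionaries rather than dense arrays. News vocabularies are large and each article uses only a small fraction of them.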
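Step (1) mentions link-structure attributes. One well-known Web structure mining technique for these is Kleinberg's HITS algorithm, which scores pages as hubs (pages that link to good sources) and authorities (pages that good hubs link to). Using HITS here is an assumption for illustration only. The function name `hits` and the fixed iteration count are hypothetical choices.

```python
import math


def hits(links, iterations=50):
    """Kleinberg's HITS: returns (hub, authority) score dicts.

    links maps each page to the pages it links to. Scores are L2-normalized
    after every update; 50 iterations is an arbitrary, hypothetical choice.
    """
    nodes = set(links) | {v for targets in links.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of pages pointing to the node.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub: sum of the authority scores of pages the node points to.
        hub = {n: sum(auth[dst] for dst in links.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


if __name__ == "__main__":
    hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
    print("hubs:", hub)
    print("authorities:", auth)
```

A news item's hub and authority scores could then serve as numeric link-structure attributes alongside its content vector.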
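News arrives continuously, so step (3) calls for incremental topic clustering. A common scheme is single-pass clustering, which handles each new document as it arrives:

1. Compare the document with the existing topic centroids.
2. If the best match is similar enough, join that topic; otherwise start a new topic.

The sketch assumes this scheme, sparse dictionary vectors (such as TF-IDF weights) and a cosine threshold of 0.3. The class name `SinglePassClusterer` and the threshold are hypothetical, and the thesis's own algorithm may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class SinglePassClusterer:
    """Incremental (single-pass) topic clustering.

    ASSUMPTION: the single-pass scheme and the default threshold are
    illustrative choices, not taken from the thesis.
    """

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.centroids = []  # one sparse centroid vector per topic
        self.members = []    # document ids assigned to each topic

    def add(self, doc_id, vec):
        """Assigns one new document to a topic and returns the topic index."""
        best, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            # No sufficiently similar topic: start a new one.
            self.centroids.append(dict(vec))
            self.members.append([doc_id])
            return len(self.centroids) - 1
        # Update the centroid as the running mean of its member vectors.
        n = len(self.members[best])
        centroid = self.centroids[best]
        for term in set(centroid) | set(vec):
            centroid[term] = (centroid.get(term, 0.0) * n + vec.get(term, 0.0)) / (n + 1)
        self.members[best].append(doc_id)
        return best


if __name__ == "__main__":
    clusterer = SinglePassClusterer()
    clusterer.add("a", {"flood": 1.0, "rain": 1.0})
    clusterer.add("b", {"flood": 1.0, "rain": 0.8})
    clusterer.add("c", {"stock": 1.0})
    print(clusterer.members)
```

Each document is compared only with the topic centroids, so adding one costs time proportional to the number of topics. This suits a continuously updated news stream. The trade-off is that the clusters can depend on the order in which documents arrive.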
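The granularity space in step (2) rests on fuzzy quotient space theory. In that theory, a fuzzy equivalence relation on a set induces a crisp partition (a quotient space) at each cut level λ. Lowering λ merges blocks, which produces a nested hierarchy from fine to coarse granularity.

The sketch below builds such a hierarchy for the documents of one topic in two steps:

1. Take the max-min transitive closure of a reflexive, symmetric similarity matrix.
2. Cut the closure at every distinct level.

The matrix input format and the function names are assumptions, and the thesis's generation algorithm may differ in detail.

```python
def transitive_closure(r):
    """Max-min transitive closure of a reflexive, symmetric fuzzy matrix.

    Composes the matrix with itself until it stops changing. With a diagonal
    of 1, entries never decrease and only take values already in r, so the
    loop terminates.
    """
    n = len(r)
    t = [list(row) for row in r]
    while True:
        nxt = [[max(min(t[i][k], t[k][j]) for k in range(n)) for j in range(n)]
               for i in range(n)]
        if nxt == t:
            return t
        t = nxt


def cut_partition(t, lam):
    """Crisp partition induced by the lambda-cut of a fuzzy equivalence matrix."""
    blocks, seen = [], set()
    for i in range(len(t)):
        if i not in seen:
            block = [j for j in range(len(t)) if t[i][j] >= lam]
            seen.update(block)
            blocks.append(block)
    return blocks


def granularity_space(r):
    """Nested partitions from finest (highest lambda) to coarsest (lowest)."""
    t = transitive_closure(r)
    levels = sorted({v for row in t for v in row}, reverse=True)
    return [(lam, cut_partition(t, lam)) for lam in levels]


if __name__ == "__main__":
    similarity = [[1.0, 0.8, 0.2],
                  [0.8, 1.0, 0.4],
                  [0.2, 0.4, 1.0]]
    for lam, blocks in granularity_space(similarity):
        print(lam, blocks)
```

In this example, documents 0 and 1 merge at λ = 0.8, and all three merge at λ = 0.4. The direct similarity between documents 0 and 2 is only 0.2, but the closure raises it to 0.4 through document 1.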
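Step (3) ends by fusing information through associations between granules, based on both content and structural attributes. The sketch below is one hypothetical reading of that step, not the thesis's algorithm. It works in three stages:

1. Score each pair of news items as a weighted sum of a given content similarity and the Jaccard overlap of their outgoing links.
2. Associate every pair whose score reaches a threshold.
3. Return the connected groups, found with union-find, as fused units.

The weight `alpha`, the `threshold` and all function names are assumptions.

```python
def jaccard(a, b):
    """Link-set overlap: size of intersection / size of union (0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse_granules(content_sim, outlinks, alpha=0.7, threshold=0.5):
    """Groups news items whose combined content/link association is strong.

    HYPOTHETICAL: the weighted-sum score, alpha and threshold are
    illustrative assumptions, not the thesis's fusion algorithm.
    content_sim maps (id1, id2) pairs to a content similarity in [0, 1];
    outlinks maps each id to the set of URLs the item links to.
    """
    ids = sorted(outlinks)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pos, i in enumerate(ids):
        for j in ids[pos + 1:]:
            c = content_sim.get((i, j), content_sim.get((j, i), 0.0))
            score = alpha * c + (1 - alpha) * jaccard(outlinks[i], outlinks[j])
            if score >= threshold:
                parent[find(i)] = find(j)  # associate the two granules

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    outlinks = {"n1": {"u1", "u2"}, "n2": {"u1", "u2"}, "n3": {"u9"}}
    sims = {("n1", "n2"): 0.6, ("n1", "n3"): 0.1, ("n2", "n3"): 0.1}
    print(fuse_granules(sims, outlinks))
```

Union-find makes association transitive. If A is linked to B and B to C, all three end up in one fused group even when A and C score low against each other.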
[Degree-granting institution]: Wuhan University of Technology
[Degree level]: Master's
[Year awarded]: 2013
[Classification number]: TP202
Article ID: 1671678
Link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1671678.html