
Research on Entity Column Discovery Methods for Web Tables

Published: 2018-04-14 09:18

  Topic: web tables + entity column; Source: Beijing Jiaotong University, 2017 master's thesis


[Abstract]: The Internet contains a large number of highly valuable web tables that machines cannot understand; only by annotating the semantic information of these tables can the structured data be put to better use. The entity column of a web table expresses the table's semantics to a certain extent, and detecting it accurately greatly improves a machine's understanding of the table. Existing knowledge-base-based entity column discovery methods rely solely on matching table headers against knowledge-base information. They fail on tables whose headers are semantically ambiguous or absent from the knowledge base, cannot discover the specific entity-attribute relationships in tables with multiple entity columns, and perform unsatisfactorily in both accuracy and execution time. This thesis proposes an entity column discovery method based on dependencies between attributes. The main contributions are as follows: (1) An entity column discovery method based on dependencies between attributes is proposed. It relies on neither a knowledge base nor header information, which improves the efficiency of entity column discovery and broadens the algorithm's applicability. (2) An approximate functional dependency detection method adapted to the characteristics of web tables is proposed. It takes noise in the tables into account so that the functional dependencies between web table attributes are expressed more accurately. (3) The concept of entity-attribute dependency strength is proposed and used to define the semantic strength of an entity column. Judging an entity column's semantic strength from the dependency strength between the entity and its attributes improves the accuracy of detecting the strongest entity column. (4) The concept of entity-attribute dependency strength is incorporated into the attribute-dependency-based algorithm. This allows entity columns to be discovered in order of semantic strength and specific relationships to be labeled according to entity-attribute dependency strength. Extensive experiments show that the proposed approximate functional dependency detection method has a clear noise-reduction effect, and that the proposed attribute-dependency-based entity column discovery methods perform well in both effectiveness and time efficiency while being more widely applicable.
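The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. It measures how well one column approximately determines another using the widely used g3-style error (the smallest fraction of rows that would have to be removed for the dependency to hold exactly), then ranks columns by a hypothetical score. The names `fd_error`, `rank_entity_columns`, the `max_error` tolerance, and the scoring rule are illustrative assumptions, not the thesis's definitions of dependency strength or semantic strength.

```python
from collections import Counter, defaultdict


def fd_error(rows, lhs, rhs):
    """g3-style error of lhs -> rhs: the minimum fraction of rows to remove
    so that each lhs value maps to exactly one rhs value."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    # Within each lhs group, keep only the rows with the most common rhs value.
    kept = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return 1 - kept / len(rows)


def rank_entity_columns(rows, max_error=0.25):
    """Rank columns by how strongly they approximately determine the others.

    Assumption: the score is the sum of (1 - error) over all dependencies
    whose error is within max_error. This is a stand-in for illustration,
    not the thesis's semantic strength measure.
    """
    columns = list(rows[0].keys())
    scores = {}
    for lhs in columns:
        errors = [fd_error(rows, lhs, rhs) for rhs in columns if rhs != lhs]
        scores[lhs] = sum(1 - e for e in errors if e <= max_error)
    ranking = sorted(columns, key=lambda c: scores[c], reverse=True)
    return ranking, scores


# Toy web table with one noisy cell ("yen" instead of "Yen").
table = [
    {"country": "France", "continent": "Europe", "currency": "Euro"},
    {"country": "Germany", "continent": "Europe", "currency": "Euro"},
    {"country": "Italy", "continent": "Europe", "currency": "Euro"},
    {"country": "Japan", "continent": "Asia", "currency": "Yen"},
    {"country": "Japan", "continent": "Asia", "currency": "yen"},
]

ranking, scores = rank_entity_columns(table)
print(ranking[0], scores)
```

On this toy table the `country` column ranks first: the noisy cell breaks the exact dependency `country -> currency`, but the approximate check tolerates it. Note that a column whose values are all distinct trivially determines every other column under this measure, so a practical method would need additional criteria beyond this sketch.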
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP393.0




Article ID: 1748685


Article link: https://www.wllwen.com/guanlilunwen/ydhl/1748685.html

