Research on a Community-Analysis-Based Method for Discovering Polysemous Tags in Folksonomies

Published: 2019-06-14 20:29
[Abstract]: Folksonomies, which emerge from social tagging systems, are widely used in personalized recommendation and information retrieval. Social tagging systems owe their success largely to the freedom users have to annotate resources with any tags they like. Yet this same uncontrolled tagging has long left social tagging systems and folksonomies prone to semantic ambiguity, which hinders their further development. This thesis studies one form of that ambiguity: polysemous tags in folksonomies. Most existing work concentrates on tags, resources, and the associations between them, and often ignores information that characterizes users. However, users are the principal actors in a social tagging system, and their understanding of a tag directly shapes the meaning the tag carries. The mining of tag semantics should therefore not stop at the level of the whole user population but should also reach the level of individual users. Accordingly, this thesis partitions the folksonomy according to users' interests, analyzes how the context of the same tag differs across user communities, and identifies polysemous tags by comparing those differences. The work has two parts. First, it builds a relationship network based on user interests and applies a community detection algorithm to that network to find user communities. Second, it proposes two metrics, semantic aggregation and semantic dispersion: semantic aggregation measures how semantically similar the tags within a context are, and semantic dispersion measures how much a tag's contexts differ across communities. With these two metrics, the differences in a tag's context between user communities can be compared quantitatively to decide whether the tag is polysemous. Experiments were run on the Delicious and MovieLens datasets, and the method was compared with a polysemy discovery algorithm based on overlapping clustering. The results show that the proposed method outperforms the baseline, most clearly on datasets containing many users with different interests.
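The pipeline outlined in the abstract can be illustrated with a small Python sketch. The abstract does not give the thesis's network construction, community detection algorithm, or metric formulas, so everything below is an assumption made for illustration: users are linked when the Jaccard overlap of their tag sets reaches a threshold, connected components stand in for community detection, semantic dispersion is taken as one minus the mean pairwise cosine similarity of a tag's contexts, and semantic aggregation is taken as the mean pairwise Jaccard similarity of the resource sets of the context tags. The data and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Toy (user, resource, tag) annotations; entirely made-up data.
ANNOTATIONS = [
    ("u1", "r1", "apple"), ("u1", "r1", "fruit"),
    ("u1", "r2", "fruit"), ("u1", "r2", "recipe"),
    ("u2", "r1", "apple"), ("u2", "r1", "recipe"), ("u2", "r3", "fruit"),
    ("u3", "r4", "apple"), ("u3", "r4", "iphone"),
    ("u3", "r5", "iphone"), ("u3", "r5", "mac"),
    ("u4", "r4", "apple"), ("u4", "r4", "mac"), ("u4", "r6", "iphone"),
]


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_communities(annotations, min_jaccard=0.5):
    """Assumed stand-in for community detection: connected components of a
    graph that links users whose tag sets overlap by at least min_jaccard."""
    profiles = defaultdict(set)
    for user, _, tag in annotations:
        profiles[user].add(tag)
    neighbours = {u: set() for u in profiles}
    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= min_jaccard:
            neighbours[a].add(b)
            neighbours[b].add(a)
    communities, seen = [], set()
    for start in profiles:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            user = stack.pop()
            if user not in component:
                component.add(user)
                stack.extend(neighbours[user] - component)
        seen |= component
        communities.append(component)
    return communities


def tag_context(annotations, tag, community):
    """Count the tags that community members assign together with `tag`
    in the same (user, resource) post."""
    posts = defaultdict(set)
    for user, resource, t in annotations:
        if user in community:
            posts[(user, resource)].add(t)
    context = Counter()
    for tags in posts.values():
        if tag in tags:
            context.update(tags - {tag})
    return context


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_dispersion(contexts):
    """Assumed definition: 1 - mean pairwise cosine over non-empty contexts.
    Returns 0.0 when the tag has a context in fewer than two communities."""
    pairs = list(combinations([c for c in contexts if c], 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(cosine(a, b) for a, b in pairs) / len(pairs)


def semantic_aggregation(annotations, context):
    """Assumed definition: mean pairwise Jaccard similarity between the
    resource sets of the tags that appear in one context."""
    resources = defaultdict(set)
    for _, resource, t in annotations:
        resources[t].add(resource)
    pairs = list(combinations(sorted(context), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(resources[a], resources[b]) for a, b in pairs) / len(pairs)


communities = user_communities(ANNOTATIONS)
apple_contexts = [tag_context(ANNOTATIONS, "apple", c) for c in communities]
print("dispersion(apple):", semantic_dispersion(apple_contexts))
print("aggregation per community:",
      [semantic_aggregation(ANNOTATIONS, c) for c in apple_contexts])
```

In this toy data the tag "apple" is used next to food tags in one community and next to device tags in the other, so its contexts share no tags. Under the assumed decision rule, a tag would be flagged as polysemous when its dispersion is high while each of its contexts is internally coherent (high aggregation); the thresholds for both would need to be tuned on real data.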
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP391.1


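[Citation]: Cui Yi (崔一). Research on a Community-Analysis-Based Method for Discovering Polysemous Tags in Folksonomies [D]. Dalian University of Technology, 2016.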




Document ID: 2499669


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2499669.html

