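Research on Transfer Learning Methods for Answer Ranking in Community Question Answering Systems
Published: 2018-02-28 05:22
Keywords: community question answering system; user features; learning to rank; transfer learning; ranking model. Source: Kunming University of Science and Technology, 2017 master's thesis. Document type: degree thesis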
[Abstract]: With the continuing development of Internet technology, acquiring knowledge and solving problems has become increasingly convenient. Traditional search engine companies such as Yahoo and Google give a growing number of Internet users a more convenient way to obtain information: users enter relevant keywords in a search box and quickly get the information they want. However, as the Internet has spread and its content has grown richer, people place higher demands on how easily they can obtain the best answer. Personalized services based on community question answering effectively compensate for the technical shortcomings of traditional search engines and are therefore receiving increasing attention from Internet companies. A community question answering system is an emerging mode of knowledge sharing: as users submit questions and answers, the community accumulates a large number of question-answer pairs. When a user submits a new question, ranking the candidate answers so as to present an accurate answer sequence is a key component of such a system. Traditional ranking algorithms mainly build the ranking model by supervised learning, which requires a large amount of manually labeled data for training. Researchers in China and abroad have proposed many supervised learning-to-rank methods that have been applied successfully in practice. Ranking SVM is a typical representative: a large amount of labeled data is fed to a specified learner, which is then trained automatically to produce a ranking model. Supervised learning-to-rank methods generally need labeled data of considerable scale to ensure the reliability of the trained model, but in real settings labeled data is often insufficient, and the reliability of supervised ranking algorithms drops accordingly. A ranking model trained on one particular domain often performs poorly in a new domain. Moreover, data on the Internet is updated quickly, so previously labeled data becomes unsuitable for training the current model as time passes. To address the shortage of labeled data in practical applications, this thesis improves the traditional learning-to-rank method by drawing on the idea of transfer learning. It uses a transfer learning-to-rank algorithm based on feature selection, which assumes that the source domain and the target domain share a low-dimensional feature representation, and takes users' multiple interests as the features shared by the source and target domains, so that knowledge is transferred to the target domain. Analysis of community question answering systems shows that they contain many labels derived from user behavior. Combining this observation with feature-based transfer learning, these user features are incorporated into the feature space, and the feature-based transfer learning-to-rank algorithm is optimized by selecting user tags and user-behavior tags of specific value in the community. Take the answerer's areas of expertise as an example: an answerer may be good at several areas (for example tennis and badminton). In the feature vector this feature is represented mainly as a Boolean, 1 for "good at" and 0 for "not good at". The feature is then 1 in both the badminton and the tennis category, so it can serve as a common feature of the two different categories, which improves the learning-to-rank method. Experiments confirm that the transfer learning answer-ranking algorithm that incorporates user features effectively improves answer ranking.
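The expertise example in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the names `Answerer`, `expertise_flag`, and `shared_across`, and the users "alice" and "bob", are hypothetical and chosen only to show how a Boolean expertise feature can take the value 1 in two categories at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answerer:
    """Hypothetical user record; not taken from the thesis."""
    name: str
    expertise: frozenset  # categories the user is considered good at


def expertise_flag(answerer, category):
    """Boolean expertise feature: 1 if the answerer is good at `category`, else 0."""
    return 1 if category in answerer.expertise else 0


def shared_across(answerer, source_category, target_category):
    """True when the expertise flag is 1 in both the source and the target category."""
    return (expertise_flag(answerer, source_category) == 1
            and expertise_flag(answerer, target_category) == 1)


alice = Answerer("alice", frozenset({"tennis", "badminton"}))
bob = Answerer("bob", frozenset({"tennis"}))

# One flag per category, in the order (tennis, badminton).
flags = {
    user.name: [expertise_flag(user, c) for c in ("tennis", "badminton")]
    for user in (alice, bob)
}
print(flags)  # {'alice': [1, 1], 'bob': [1, 0]}
```

In this sketch, only alice's flag is 1 in both categories, so only her expertise feature would qualify as a common feature linking the tennis (source) and badminton (target) categories in the sense described above.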
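[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.3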