Research on Chinese Question Classification Methods Based on Transfer Learning

Published: 2019-03-26 15:33
[Abstract]: A question answering system is a new generation of search engine that better satisfies users' queries and retrieves the answers they want more precisely. Question classification is a key component of such a system, and its results directly affect the accuracy of answer extraction. A question classification model is usually trained on a labeled corpus of some size. Building models for several domains, however, requires labeling samples in every domain, which makes labeling expensive. Because different domains may be related to one another, this thesis applies the idea of transfer learning and, taking the characteristics of question classification in each domain into account, studies feature selection and classification model transfer for cross-domain question classification. The main contributions are as follows:

1. Based on the relatedness between domains, question feature spaces for the different domains are constructed from the mutual information of cross-domain features. First, the high-frequency words in each domain's training corpus, together with the interrogative words and the subject, predicate, and object words of the questions, are selected as candidate feature words for that domain. Second, mutual information is used to measure the relatedness between source-domain feature words and target-domain feature words; a threshold is defined, and the highly related words are kept as the feature words of each domain's feature space. Finally, the feature values of each domain's question feature space are obtained with a lexical semantic similarity method.

2. For transferring Chinese question classification across domains, a question classification transfer learning method based on feature mapping is proposed. The method first collects the feature words common to the source and target domains and uses word similarity computation to find similar feature words across the two domains. It then rewrites each source-domain question feature vector so that its feature words become the target domain's co-occurring (shared) or similar feature words. Next, an improved clustering algorithm maps the source-domain question instances onto the categories of the target domain. Finally, a support vector machine is used to train the classification model. A transfer experiment on Chinese question classification was carried out with finance as the source domain and Yunnan tourism as the target domain. The results show that using the labeled source-domain samples substantially improves classification accuracy in the target domain.

3. A prototype system for Chinese question classification based on feature-mapping transfer learning is designed and implemented.
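The threshold-based feature selection in contribution 1 can be illustrated with a small sketch. The abstract does not say how mutual information is estimated, so this example assumes pointwise mutual information computed from question-level co-occurrence over a pooled corpus. The function names (`pmi_table`, `select_features`), the toy English tokens, and the threshold value are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(questions):
    """Pointwise mutual information for word pairs that share a question.

    Assumption: probabilities are estimated as the fraction of questions
    in which a word (or a pair of words) appears.
    """
    n = len(questions)
    word_count = Counter()
    pair_count = Counter()
    for tokens in questions:
        unique = sorted(set(tokens))
        word_count.update(unique)
        # Pairs are stored in sorted order, so (a, b) always has a < b.
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): math.log(c * n / (word_count[a] * word_count[b]))
        for (a, b), c in pair_count.items()
    }


def select_features(source_words, target_words, table, threshold):
    """Keep words whose PMI with some word of the other domain meets the threshold."""

    def score(a, b):
        if a == b:
            return float("inf")  # a word shared by both domains is always kept
        key = (a, b) if a < b else (b, a)
        return table.get(key, float("-inf"))  # never co-occurred: unrelated

    keep_source = {s for s in source_words
                   if any(score(s, t) >= threshold for t in target_words)}
    keep_target = {t for t in target_words
                   if any(score(s, t) >= threshold for s in source_words)}
    return keep_source, keep_target


# Toy pooled corpus of tokenized questions (illustrative data only).
questions = [
    ["how", "much", "ticket", "price"],
    ["how", "much", "fund", "price"],
    ["where", "is", "bank"],
    ["where", "is", "lake"],
]
table = pmi_table(questions)
keep_source, keep_target = select_features(
    {"fund", "price", "bank"}, {"ticket", "price", "lake"}, table, threshold=0.5
)
print(sorted(keep_source), sorted(keep_target))
```

In this toy run, "bank" and "lake" are dropped because neither co-occurs with any feature word of the other domain. A real system would work on segmented Chinese questions and would need a tuned threshold.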
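The feature-mapping step in contribution 2 (rewriting each source-domain question so that its feature words become shared or similar target-domain words) might look roughly like the sketch below. The similarity scores come from a hand-written lookup table standing in for whatever word similarity measure the thesis uses, which the abstract does not name. `build_mapping`, `map_question`, `toy_similarity`, and the `min_sim` cutoff are hypothetical names and values. The improved clustering algorithm and the SVM training are not shown.

```python
from collections import Counter

# Hypothetical similarity scores between source and target feature words.
TOY_SIMILARITY = {
    ("fee", "price"): 0.9,
    ("bank", "lake"): 0.1,
}


def toy_similarity(source_word, target_word):
    """Stand-in for a real word similarity measure (assumption)."""
    return TOY_SIMILARITY.get((source_word, target_word), 0.0)


def build_mapping(source_words, target_words, similarity, min_sim):
    """Map each source feature word to a target feature word, or to None."""
    mapping = {}
    candidates = sorted(target_words)  # sorted for deterministic tie-breaking
    for word in source_words:
        if word in target_words:
            mapping[word] = word  # shared feature word: keep unchanged
            continue
        best = max(candidates, key=lambda t: similarity(word, t), default=None)
        if best is not None and similarity(word, best) >= min_sim:
            mapping[word] = best  # replace with the most similar target word
        else:
            mapping[word] = None  # no sufficiently similar target word: drop
    return mapping


def map_question(tokens, mapping):
    """Rewrite a source-domain question as a bag of target-domain feature words."""
    counts = Counter()
    for token in tokens:
        target = mapping.get(token)
        if target is not None:
            counts[target] += 1
    return counts


mapping = build_mapping(
    {"fee", "bank", "how"}, {"price", "lake", "how"}, toy_similarity, min_sim=0.5
)
vector = map_question(["how", "fee", "bank", "fee"], mapping)
print(mapping, dict(vector))
```

The mapped vectors would then be the input to the clustering and classifier-training stages. One plausible choice for the last stage is an off-the-shelf SVM such as scikit-learn's `sklearn.svm.SVC`, though the abstract does not state which implementation was used.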
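[Degree-granting institution]: Kunming University of Science and Technology (昆明理工大学)
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.1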
