Unsupervised Translation Disambiguation Based on Mining Web Bilingual Word Association
Published: 2019-05-12 17:38
[Abstract]: To alleviate the difficulty of acquiring disambiguation knowledge and the data sparseness problem in translation disambiguation, this paper proposes an unsupervised translation disambiguation method that mines bilingual word relatedness from the Web. The method extends the indirect association of bilingual words in a corpus to the Web and proposes a Web-based indirect association model for bilingual words. On this basis, it further proposes a disambiguation method based on Web-derived bilingual word relatedness: different queries are constructed, the page counts of the pages returned by a search engine are extracted, and pointwise mutual information is then used to compute the relatedness between words, which drives the disambiguation decision. The best performance of the method (P_mar = 0.464) exceeds that of TorMd, the best comparable unsupervised system on Task #5 of the international semantic evaluation SemEval-2007.
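The abstract names the ingredients (search-engine page counts and pointwise mutual information) but gives neither the query templates nor the exact decision rule. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation: `page_count` is a hypothetical callable standing in for a search-engine lookup, the joint query is assumed to be the two words separated by a space, `total_pages` is an assumed estimate of the index size, and the candidate with the highest summed PMI over the context words is chosen. The counts in `TOY_COUNTS` are invented purely for illustration.

```python
import math


def pmi_from_counts(count_x, count_y, count_xy, total_pages):
    """Pointwise mutual information estimated from page counts.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with each probability
    approximated as count / total_pages. Returns -inf when any count is
    zero, since PMI is undefined there (no smoothing is applied).
    """
    if min(count_x, count_y, count_xy) <= 0:
        return float("-inf")
    return math.log2(count_xy * total_pages / (count_x * count_y))


def choose_translation(context_words, candidates, page_count, total_pages):
    """Return the candidate translation with the highest summed PMI
    against the context words (an assumed decision rule)."""
    def score(candidate):
        c_count = page_count(candidate)
        return sum(
            pmi_from_counts(
                page_count(w),
                c_count,
                # Assumption: a joint query is the two words side by side.
                page_count(f"{w} {candidate}"),
                total_pages,
            )
            for w in context_words
        )
    return max(candidates, key=score)


# Invented page counts, for illustration only (not real search results).
TOY_TOTAL_PAGES = 1_000_000
TOY_COUNTS = {
    "存款": 10_000,
    "bank": 5_000,
    "riverside": 1_000,
    "存款 bank": 500,
    "存款 riverside": 5,
}


def toy_page_count(query):
    """Hypothetical stand-in for a search-engine page-count lookup."""
    return TOY_COUNTS.get(query, 0)


best = choose_translation(["存款"], ["bank", "riverside"],
                          toy_page_count, TOY_TOTAL_PAGES)
```

With these invented counts the context word co-occurs with "bank" far more often than chance would predict and with "riverside" less often, so `best` is "bank". A real system would need to handle zero counts more gracefully (for example by smoothing) and to cope with the noise in search-engine count estimates.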
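[Author affiliations]: Institute of Computational Linguistics, School of Information Science and Technology, Peking University; School of Computer Science and Technology, Harbin Institute of Technology
[Funding]: Supported by the 973 Program (2004CB318102), the National Natural Science Foundation of China (60903063), and the China Postdoctoral Science Foundation (20090450007)
[Classification number]: TP391.1
Article ID: 2475567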
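[Similar literature]
Related journal articles
1 刘鹏远 (Liu Pengyuan); 赵铁军 (Zhao Tiejun); Unsupervised Translation Disambiguation Based on Mining Web Bilingual Word Association [J]; 高技术通讯 (High Technology Letters); 2010, No. 04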
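Related doctoral dissertations
1 刘鹏远 (Liu Pengyuan); Research on Unsupervised Translation Disambiguation Methods Based on Automatic Knowledge Acquisition [D]; Harbin Institute of Technology; 2008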
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2475567.html