Abstract
Building on deep representation learning of topics, this paper proposes a topic alignment model (TAM) that incorporates bilingual word embeddings. The model uses bilingual word embeddings to expand the lexicon of semantically aligned words and, on top of a traditional bilingual topic model, introduces an auxiliary distribution that improves semantic sharing across different word distributions, thereby improving topic alignment in cross-lingual and cross-domain settings. Two new metrics, bilingual topic similarity (BTS) and bilingual alignment similarity (BAS), are proposed to evaluate how well the auxiliary distribution aligns topics. Compared with the traditional alignment model MCTA (multilingual cultural-common topic analysis), TAM improves bilingual alignment similarity by about 1.5% on the cross-lingual topic alignment task and F1 by about 10% on the cross-domain topic alignment task. These results are significant for improving cross-lingual and cross-domain information processing.
Authors
余传明
原赛
胡莎莎
安璐
YU Chuanming; YUAN Sai; HU Shasha; AN Lu (School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China; School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China; School of Information Management, Wuhan University, Wuhan 430072, China)
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (北大核心)
2020, Issue 5, pp. 430-439 (10 pages)
Journal of Tsinghua University(Science and Technology)
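Funding
National Natural Science Foundation of China, General Program (71974202)
National Natural Science Foundation of China, Major Program (71790612)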
Keywords
cross-lingual topic alignment (跨语言主题对齐)
cross-domain topic alignment (跨领域主题对齐)
deep learning (深度学习)
bilingual word embedding (双语词嵌入)
knowledge alignment (知识对齐)