Chinese-Thai cross-lingual sensitive information recognition incorporating sensitive dictionary and heterogeneous graph

Abstract: General-purpose cross-lingual text classification models recognize sensitive information, such as content about drugs, violence, and natural disasters, inaccurately. In addition, Chinese and Thai sensitive words have diverse surface forms and are hard to align across the two languages, which weakens a model's ability to aggregate information from both languages. To address these problems, a Chinese-Thai cross-lingual sensitive information recognition method that integrates a sensitive dictionary and a heterogeneous graph is proposed. A Chinese-Thai sensitive dictionary is used to build a cross-lingual heterogeneous graph with both document-level and word-level alignment: documents, together with the keywords and sensitive words they contain, are taken as nodes, and bilingual alignment relations, similarity relations, and part-of-speech relations are taken as edges. Document nodes and word nodes are represented with a multilingual pre-trained model. A multilayer graph convolutional neural network then encodes the input documents, and a sensitive information classifier predicts the class of each document. Experimental results show that the proposed method improves accuracy by 5.83% over the baseline model.
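The abstract describes the pipeline only at a high level, so the following is a minimal pure-Python sketch of how a dictionary-aligned document/word graph might be assembled and passed through a two-layer graph convolution; it is not the authors' implementation. Assumptions: all edge types are collapsed into one unweighted adjacency matrix (the paper distinguishes alignment, similarity, and part-of-speech edges), node features are toy 2-dimensional vectors instead of multilingual pre-trained embeddings, the propagation rule is the standard Kipf-Welling normalization D^(-1/2) (A + I) D^(-1/2), and the dictionary entries, weights, and function names (`build_graph`, `normalize`, `gcn_layer`) are hypothetical.

```python
import math

# Hypothetical toy Chinese-Thai sensitive dictionary: Chinese term -> Thai term.
SENSITIVE_DICT = {"毒品": "ยาเสพติด", "暴力": "ความรุนแรง"}


def build_graph(docs, dictionary):
    """Build a symmetric adjacency matrix over document and word nodes.

    Sketch assumption: every edge has weight 1.0 and edge types are merged.
      - document -- each word it contains
      - Chinese word -- Thai word when the dictionary aligns them
    """
    nodes = [f"doc{i}" for i in range(len(docs))]
    for tokens in docs:
        for tok in tokens:
            if tok not in nodes:
                nodes.append(tok)
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]

    def link(a, b):
        adj[index[a]][index[b]] = adj[index[b]][index[a]] = 1.0

    for i, tokens in enumerate(docs):
        for tok in tokens:
            link(f"doc{i}", tok)
    for zh, th in dictionary.items():
        if zh in index and th in index:  # align only words present in the corpus
            link(zh, th)
    return nodes, adj


def normalize(adj):
    """Return D^(-1/2) (A + I) D^(-1/2), the usual GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always > 0 thanks to the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_norm, h, w, activate=True):
    """One graph convolution: ReLU(A_norm @ H @ W), or the raw product if activate=False."""
    out = matmul(matmul(a_norm, h), w)
    if activate:
        out = [[max(0.0, x) for x in row] for row in out]
    return out


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


if __name__ == "__main__":
    docs = [["毒品", "交易"], ["ยาเสพติด", "ตำรวจ"]]  # one Chinese, one Thai document
    nodes, adj = build_graph(docs, SENSITIVE_DICT)
    a_norm = normalize(adj)
    # Stand-in features; the paper uses a multilingual pre-trained model instead.
    h0 = [[1.0, 0.0] if i % 2 == 0 else [0.0, 1.0] for i in range(len(nodes))]
    w1 = [[0.5, -0.2], [0.3, 0.8]]   # hypothetical layer-1 weights
    w2 = [[1.0, -1.0], [-0.5, 0.5]]  # hypothetical layer-2 weights (2 classes)
    h1 = gcn_layer(a_norm, h0, w1)
    logits = gcn_layer(a_norm, h1, w2, activate=False)
    for i, name in enumerate(nodes[:len(docs)]):
        print(name, softmax(logits[i]))  # class probabilities for document nodes
```

Because the dictionary edge links 毒品 to ยาเสพติด, the Thai sensitive word sits two hops from the Chinese document node, so after two layers that document's representation already depends on a Thai word node. A heterogeneous graph model would typically keep separate weights or normalization per edge type; merging them here only keeps the sketch short.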
Authors: 朱栩冉 (ZHU Xu-ran), 余正涛 (YU Zheng-tao), 张勇丙 (ZHANG Yong-bing) (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China)
Source: Computer Engineering and Design (《计算机工程与设计》; Peking University core journal), 2024, No. 7, pp. 2150-2156 (7 pages)
Funding: National Natural Science Foundation of China (U21B2027, 61972186, 62266028); Yunnan Provincial Major Science and Technology Special Program (202202AD080003).
Keywords: sensitive dictionary; cross-lingual; heterogeneous graph; graph convolutional neural network; sensitive information identification; multilingual pre-trained model; bilingual alignment