Chinese Long-Text Matching Algorithm Based on Graph Classification
(English title: Chinese document matching based on graph classification)


Abstract: Identifying the relationship between two documents is an important natural language processing task, widely applied in Internet services such as news recommendation systems and search engines. However, compared with matching a pair of sentences or a query-document pair in information retrieval, long documents usually carry rich semantic information and complex logical structure, which makes document matching a relatively independent and challenging task. Focusing on these difficulties, this paper proposes a long-text matching algorithm based on a graph classification framework: the matching task is equivalently reformulated as a graph classification problem and solved with the graph representation learning paradigm, which yields the matching result. The pipeline mainly consists of modeling the document pair with graph representation learning, extracting node features with a graph attention network, and graph pooling. Training and test experiments on two large public datasets show that the proposed algorithm achieves high-quality text matching and reaches state-of-the-art results on all evaluation metrics.
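The abstract names the pipeline stages but not their exact formulation. The following is a minimal NumPy sketch of "matching as graph classification", assuming the document pair has already been turned into a single graph with node feature matrix `X` and adjacency matrix `A` (how the paper builds this graph is not described here). It uses one single-head attention layer in the standard GAT form, and a simple mean/max readout stands in for the paper's graph pooling step. The function names `gat_layer`, `readout` and `match_probability` are hypothetical, and the weights are random, so the printed value is not a trained prediction.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(X, A, W, a):
    """Single-head graph attention layer (standard GAT form, assumed).

    X: (n, d_in) node features; A: (n, n) 0/1 adjacency with self-loops;
    W: (d_in, d_out) projection; a: (2 * d_out,) attention vector.
    """
    H = X @ W                                    # projected node features
    d_out = H.shape[1]
    # a^T [h_i || h_j] splits into a per-source and a per-target score
    src = H @ a[:d_out]
    dst = H @ a[d_out:]
    E = leaky_relu(src[:, None] + dst[None, :])  # raw scores e_ij, shape (n, n)
    E = np.where(A > 0, E, -np.inf)              # attend only to neighbours
    E = E - E.max(axis=1, keepdims=True)         # stabilise the softmax
    alpha = np.exp(E)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # each row sums to 1
    out = alpha @ H                              # attention-weighted aggregation
    return np.where(out > 0, out, np.expm1(out))  # ELU activation


def readout(H):
    """Graph-level vector: concatenated mean and max pooling over nodes
    (a simple stand-in for the paper's graph pooling)."""
    return np.concatenate([H.mean(axis=0), H.max(axis=0)])


def match_probability(X, A, params):
    """Binary graph classification: P(the two documents match)."""
    H = gat_layer(X, A, params["W"], params["a"])
    g = readout(H)
    logit = g @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))


# Toy graph: 5 nodes on a chain, with self-loops, random features.
rng = np.random.default_rng(0)
n, d_in, d_out = 5, 8, 4
X = rng.normal(size=(n, d_in))
A = np.eye(n)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

params = {
    "W": rng.normal(scale=0.1, size=(d_in, d_out)),
    "a": rng.normal(scale=0.1, size=2 * d_out),
    "w_out": rng.normal(scale=0.1, size=2 * d_out),
    "b_out": 0.0,
}
p = match_probability(X, A, params)
print(f"P(match) = {p:.3f}")
```

Because every node keeps a self-loop, each softmax row has at least one finite score, so the masking with `-inf` never produces an all-zero row. In a trained model, `W`, `a`, `w_out` and `b_out` would be learned from labeled document pairs, for example with a binary cross-entropy loss.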

Authors: GUO Jiale (郭佳乐), BU Wei (卜巍), WU Xiangqian (邬向前) (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; School of Media Technology and Art, Harbin Institute of Technology, Harbin 150001, China)

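Institutions: School of Computer Science and Technology, Harbin Institute of Technology; School of Media Technology and Art, Harbin Institute of Technology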

Source: Intelligent Computer and Applications (《智能计算机与应用》), 2020, No. 6, pp. 294-299 (6 pages)

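Funding: National Natural Science Foundation of China (61672194); National Key Research and Development Program of China (2018YFC0832304); Heilongjiang Province Science Foundation for Distinguished Young Scholars, China (JC2018021); State Key Laboratory of Robotics and System project (SKLRS-2019-KF-14); ZTE Industry-Academia-Research Cooperation Forum project.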

Keywords: Natural Language Processing; Text Matching; Graph Attention Network; Graph Pooling

CLC classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
