Abstract
Identifying the relationship between a pair of documents is an important natural language processing task with wide application in real-world services such as news systems and search engines. Compared with matching a pair of sentences or a query-document pair in information retrieval, however, long documents usually carry rich semantic information and complex logical structure, which makes long-document matching a relatively independent and challenging task. Focusing on these difficulties, this paper proposes a long-text matching algorithm based on a graph classification framework: the matching task is transformed into an equivalent graph classification task and solved within the graph representation learning paradigm. The model mainly consists of modeling long texts through graph representation learning, extracting node features with a graph attention neural network, and graph pooling. Training and test experiments on two large public datasets show that the proposed algorithm achieves high-quality text matching, with all evaluation metrics reaching state-of-the-art results.
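The pipeline named in the abstract (graph attention over node features, graph pooling, then classification of the pooled graph) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the single-head attention layer, the use of mean pooling as the pooling step, the logistic classifier, and all names (`gat_layer`, `mean_pool`, `match_probability`) and toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, weight, attn):
    """One single-head graph attention layer (illustrative, not the paper's model).

    features:  n node vectors of size d_in
    neighbors: neighbors[i] lists the nodes adjacent to node i
    weight:    d_in x d_out projection matrix
    attn:      attention vector of size 2 * d_out
    """
    d_out = len(weight[0])
    projected = [
        [sum(h[k] * weight[k][j] for k in range(len(h))) for j in range(d_out)]
        for h in features
    ]
    output = []
    for i, adjacent in enumerate(neighbors):
        nodes = sorted(set(adjacent) | {i})  # attend over neighbors plus self
        scores = [
            leaky_relu(sum(a * v for a, v in zip(attn, projected[i] + projected[j])))
            for j in nodes
        ]
        alphas = softmax(scores)
        output.append(
            [sum(a * projected[j][d] for a, j in zip(alphas, nodes)) for d in range(d_out)]
        )
    return output


def mean_pool(node_vectors):
    """Graph pooling stand-in: average all node vectors into one graph vector."""
    count = len(node_vectors)
    return [sum(column) / count for column in zip(*node_vectors)]


def match_probability(features, neighbors, weight, attn, clf_weights, clf_bias):
    """Classify the pooled graph vector with a logistic unit (assumed classifier)."""
    graph_vector = mean_pool(gat_layer(features, neighbors, weight, attn))
    logit = sum(w * g for w, g in zip(clf_weights, graph_vector)) + clf_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy graph standing in for a document pair: 3 nodes with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1], [0, 2], [1]]
weight = [[0.5, -0.2], [0.1, 0.3]]
attn = [0.1, 0.2, -0.1, 0.05]
prob = match_probability(features, neighbors, weight, attn,
                         clf_weights=[0.7, -0.4], clf_bias=0.0)
print(f"match probability: {prob:.3f}")
```

In a trained model the projection, attention, and classifier parameters would be learned from labeled document pairs; here they are fixed toy values chosen only so the sketch runs.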
Authors
GUO Jiale (郭佳乐)
BU Wei (卜巍)
WU Xiangqian (邬向前)
Affiliations: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; School of Media Technology and Art, Harbin Institute of Technology, Harbin 150001, China
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2020, Issue 6, pp. 294-299 (6 pages)
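Funding
National Natural Science Foundation of China (61672194)
National Key Research and Development Program of China (2018YFC0832304)
Heilongjiang Province Science Fund for Distinguished Young Scholars, China (JC2018021)
Project of the State Key Laboratory of Robotics and System (SKLRS-2019-KF-14)
ZTE Industry-Academia-Research Cooperation Forum collaborative project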
Keywords
Natural Language Processing
Text Matching
Graph Attention Network
Graph Pooling