Abstract
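Argumentation mining can be divided into three subtasks: detecting argument component boundaries, identifying argument component types, and extracting relations between argument components. Most existing work models the subtasks separately and ignores the correlation information among them, which leads to poor performance. Some other work uses pipeline models to model the three subtasks jointly; however, a pipeline still treats each subtask independently and trains a separate model for each, so it suffers from error propagation and generates redundant information during training. This paper therefore proposes an argumentation mining method based on multi-task iterative learning. The method learns the three argumentation mining tasks jointly and in parallel. First, a deep convolutional neural network (CNN) and a highway network are used to obtain shallow shared-parameter representations of the text at the character and word levels. These representations are then fed into bidirectional long short-term memory networks (Bi-LSTM), which are trained simultaneously using the correlation information among the three tasks; this both avoids error propagation and prevents the generation of redundant information. Finally, the Bi-LSTM outputs of the three tasks are concatenated and used as the input to the next iteration to improve model performance. Experiments use the student essay dataset released by the UKP Lab in Germany. Compared with the best current baseline method, the proposed method improves accuracy by 2.74% and the "F1 (100%)" and "F1 (50%)" metrics by 1.05% and 1.19%, respectively, which verifies its effectiveness.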
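The abstract names a highway network as one part of the shared encoder but gives no configuration. As a rough illustration only, the following is a minimal sketch of a single highway layer in its standard formulation, y = T(x) * H(x) + (1 - T(x)) * x, using only the Python standard library. The dimension, the random weights, the ReLU choice for H, and the helper names (`affine`, `highway_layer`, `random_matrix`) are assumptions made for this example and are not taken from the paper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, x):
    """Return W @ x + b, where W is a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def highway_layer(x, W_h, b_h, W_t, b_t):
    """Standard highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    H is an affine map followed by ReLU (an assumption for this sketch);
    T is the sigmoid transform gate.
    """
    h = [max(0.0, v) for v in affine(W_h, b_h, x)]   # candidate transform H(x)
    t = [sigmoid(v) for v in affine(W_t, b_t, x)]    # transform gate T(x)
    return [t_i * h_i + (1.0 - t_i) * x_i
            for t_i, h_i, x_i in zip(t, h, x)]

def random_matrix(n, rng):
    """Hypothetical small random square weight matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
dim = 4                                  # illustrative feature size
x = [0.5, -1.0, 2.0, 0.0]                # stand-in for a character/word feature vector
W_h, W_t = random_matrix(dim, rng), random_matrix(dim, rng)
b_h = [0.0] * dim

# Strongly negative gate bias: the gate is nearly closed, so x is carried through.
y_carry = highway_layer(x, W_h, b_h, W_t, [-20.0] * dim)

# Strongly positive gate bias: the gate is nearly open, so the output follows H(x).
y_transform = highway_layer(x, W_h, b_h, W_t, [20.0] * dim)
```

The two calls show the gate's role: with a closed gate the layer behaves like an identity connection, and with an open gate it behaves like an ordinary non-linear layer.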
Argumentation mining has recently become a hot topic in data mining and natural language processing. Its main task is to automatically identify argumentative structures in persuasive essays, which helps people better understand large volumes of text. A persuasive essay usually consists of a series of argument components. Argument components are generally classified as claims or premises, and the relations between argument components are commonly classified as support or attack. Argumentation mining typically contains three consecutive subtasks: (1) argument component boundary detection (ACBD task), which separates argument components from non-argumentative text units and identifies the argument component boundaries; (2) argument component identification (ACI task), which classifies argument components into types such as claims or premises; (3) argument component relation identification (RI task), which identifies the type of relation between argument components, such as support or attack. Researchers have recently proposed a series of argumentation mining models and achieved notable improvements. However, most existing approaches model each subtask separately and ignore the correlation information among the three subtasks, resulting in low performance. In addition, some approaches use pipeline methods to model the three subtasks jointly. Pipeline methods still treat each subtask independently and train a separate model for each subtask, which can lead to error propagation and redundant information during training. More specifically, errors in the argument component boundary detection module degrade the subsequent argument component classification. Similarly, errors in argument component classification degrade the performance of argument component relation identification.
To solve the problems above, we propose a multi-task iterative learning method which a
Authors
廖祥文
陈泽泽
桂林
程学旗
陈国龙
LIAO Xiang-Wen; CHEN Ze-Ze; GUI Lin; CHENG Xue-Qi; CHEN Guo-Long (College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116; Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing (Fuzhou University), Fuzhou 350116; Digital Fujian Institute of Financial Big Data, Fuzhou 350116; CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)
Source
《计算机学报》
EI
CSCD
Peking University Core Journal (北大核心)
2019, Issue 7, pp. 1524-1538 (15 pages)
Chinese Journal of Computers
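Funding
National Natural Science Foundation of China (61772135, U1605251)
Open Fund of the CAS Key Laboratory of Network Data Science and Technology (CASNDST201708, CASNDST201606)
Director's Fund of the Key Laboratory of Trusted Distributed Computing and Service, Ministry of Education (2017KF01)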
Keywords
multi-task learning
argumentation mining
iterative model
deep learning
convolutional neural network