Abstract
With the rise of deep learning, a growing number of researchers have applied deep learning techniques to coreference resolution. However, existing neural coreference resolution models generally attend only to the linear (sequential) features of text and neglect the structural information that has proved highly effective in traditional methods. Building on the neural coreference model proposed by Lee et al., the best-performing model at the time of writing, this paper uses the constituency parse tree to address the problem in two ways. First, a span extraction strategy that enumerates the nodes of the parse tree as candidate phrases replaces the original brute-force enumeration; it avoids the span-length limit of brute-force enumeration and the noise introduced by spans that do not conform to syntactic rules. Second, node sequences are obtained by tree traversal and combined with features such as node height and path to build a context representation of the constituency parse tree directly, which is then integrated into the model; this avoids the loss of structural information that results from using only character and word sequences. Experiments on the CoNLL 2012 Shared Task dataset show that the proposed model achieves an average F1 of 62.35 for Chinese and 67.24 for English coreference resolution, indicating that the proposed strategy for integrating structural information significantly improves coreference resolution performance.
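The node-enumeration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, `parse_tree`, `node_spans`, and `brute_force_spans` are hypothetical names introduced here, the input is assumed to be a Penn-style bracketed parse, and height is assumed to be 0 for part-of-speech nodes. The path features and the neural encoding described in the abstract are not reproduced.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical container for one constituency-tree node.
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on part-of-speech (leaf) nodes
    start: int = 0              # index of the first covered token
    end: int = 0                # index of the last covered token (inclusive)
    height: int = 0             # assumed: 0 for leaves, 1 + max child height otherwise


def parse_tree(text: str) -> Node:
    """Parse a bracketed string such as "(S (NP (DT The) (NN cat)) ...)"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    n_words = 0

    def read() -> Node:
        nonlocal pos, n_words
        assert tokens[pos] == "("
        node = Node(tokens[pos + 1])
        pos += 2
        if tokens[pos] != "(":
            # Leaf of the form (POS word): covers exactly one token.
            node.word = tokens[pos]
            node.start = node.end = n_words
            n_words += 1
            pos += 1
        else:
            while tokens[pos] == "(":
                node.children.append(read())
            node.start = node.children[0].start
            node.end = node.children[-1].end
            node.height = 1 + max(c.height for c in node.children)
        assert tokens[pos] == ")"
        pos += 1
        return node

    return read()


def node_spans(root: Node) -> List[Tuple[str, int, int, int]]:
    """Pre-order traversal returning (label, start, end, height) per node."""
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        out.append((node.label, node.start, node.end, node.height))
        stack.extend(reversed(node.children))
    return out


def brute_force_spans(n_tokens: int, max_width: int) -> List[Tuple[int, int]]:
    """Baseline for comparison: every span of at most max_width tokens."""
    return [(i, j)
            for i in range(n_tokens)
            for j in range(i, min(i + max_width, n_tokens))]


tree = parse_tree("(S (NP (DT The) (NN cat)) (VP (VBD saw) (NP (PRP it))))")
spans = node_spans(tree)
for label, start, end, height in spans:
    print(label, start, end, height)
```

In this toy sentence, the span covering "cat saw" (tokens 1 to 2) appears in the brute-force list but corresponds to no tree node, which is the kind of syntactically invalid candidate the abstract describes as noise. The whole-sentence node is still produced regardless of any width limit.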
Authors
FU Jian (付健)
KONG Fang (孔芳)
School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal
2020, Issue 3, pp. 231-236 (6 pages)
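Funding
National Natural Science Foundation of China (61876118)
Artificial Intelligence Emergency Project (61751206)
Sub-project of the National Key Research and Development Program of China (2017YFB1002101)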
Keywords
Coreference resolution
Constituency parse tree
Structural information
Height features
Embedding