Abstract
This paper proposes an event nugget detection method that combines multiple models with a high-confidence dictionary. Dictionary features are added to a maximum entropy model and to a conditional random fields model separately, and the outputs of the two models are then merged, with the aim of improving the recall and overall performance of trigger detection. For event realis recognition, the method additionally considers features such as the surface (token position) distance and the dependency-path distance between the trigger and a negation or speculation word. Experimental results show that, compared with using the maximum entropy model alone, the proposed method yields a gain of 6.43% in F1 for event nugget (trigger) recognition and a gain of 1.69% in F1 for event realis recognition.
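The abstract does not say how the two models' outputs are merged or how the dictionary feature is encoded. The following is only a minimal sketch under stated assumptions: the lexicon entries, the names `HIGH_CONF_DICT`, `dictionary_feature` and `fuse_labels`, the union-style fusion rule and the tie-break are all hypothetical and are not taken from the paper. It illustrates why merging two label sequences can raise recall: a trigger is kept if either model finds it.

```python
from typing import Dict, List, Optional

# Hypothetical high-confidence dictionary mapping a word to an event type.
# The entries are illustrative only and do not come from the paper.
HIGH_CONF_DICT: Dict[str, str] = {"attack": "Conflict", "married": "Life"}


def dictionary_feature(token: str, lexicon: Dict[str, str]) -> str:
    """Build a feature string that either classifier could consume."""
    event_type = lexicon.get(token.lower())
    return f"dict={event_type}" if event_type else "dict=NONE"


def fuse_labels(
    maxent: List[Optional[str]], crf: List[Optional[str]]
) -> List[Optional[str]]:
    """Union-style fusion (an assumption): keep a trigger if either model
    predicts one. When both models fire, the CRF label wins; this
    tie-break is arbitrary and chosen only for illustration."""
    if len(maxent) != len(crf):
        raise ValueError("label sequences must have the same length")
    return [c if c is not None else m for m, c in zip(maxent, crf)]


# One label per token; None means "not a trigger".
maxent_out = [None, "Conflict", None]
crf_out = ["Life", None, None]
fused = fuse_labels(maxent_out, crf_out)
```

In this toy example each model finds one trigger that the other misses, so the fused sequence contains both. A union rule of this kind trades some precision for recall, which matches the stated aim of the abstract but not necessarily the authors' actual fusion strategy.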
Authors
陈亚东
洪宇
王潇斌
杨雪蓉
姚建民
朱巧明
CHEN Yadong; HONG Yu; WANG Xiaobin; YANG Xuerong; YAO Jianmin; ZHU Qiaoming (Provincial Key Laboratory of Computer Information Processing Technology, Soochow University, Suzhou 215006)
Source
《北京大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2017, No. 3, pp. 412-420 (9 pages)
Acta Scientiarum Naturalium Universitatis Pekinensis
Funding
Supported by the National Natural Science Foundation of China (61373097, 61272259, 61272260)
Keywords
event nugget detection
maximum entropy model
conditional random fields model
high-confidence dictionary