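Military named entities (MNEs) have complex internal nesting and few distinguishing grammatical cues, which degrades entity recognition. To address this problem, a method for recognizing MNEs based on conditional random fields (CRFs) under a small-granularity strategy is proposed. Following the small-granularity strategy, a model is built on a manually constructed MNE-annotated corpus; the CRF model identifies indivisible small-granularity MNEs, which are then combined to obtain complete MNEs. Finally, the method is validated experimentally: in an open test on an operational-document corpus, MNE recognition reaches a recall above 72% and a precision above 85%.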
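The two-stage idea (tag indivisible small-granularity units, then combine them into complete entities) can be illustrated with a minimal sketch. The abstract does not state the combination rule, so the rule used here (merge small-granularity spans that are directly adjacent) is an assumption, as are the BIO tagging scheme, the tag names, and the function names `bio_to_spans` and `combine_adjacent`. The tags are written by hand to stand in for CRF output.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Turn a BIO tag sequence into labeled spans of small-granularity units."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def combine_adjacent(spans: List[Span]) -> List[Tuple[int, int]]:
    """Assumed rule: directly adjacent small units form one complete entity."""
    merged: List[Tuple[int, int]] = []
    for start, end, _ in sorted(spans):
        if merged and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Hypothetical tokens and tags; a trained CRF would produce the tags.
tokens = ["the", "3rd", "Armored", "Division", "attacked", "Hill", "205"]
tags = ["O", "B-ORD", "B-ARM", "B-ECH", "O", "B-TER", "B-NUM"]

entities = [" ".join(tokens[s:e]) for s, e in combine_adjacent(bio_to_spans(tags))]
print(entities)  # ['3rd Armored Division', 'Hill 205']
```

Keeping the tagger's label set small and pushing the nesting into a separate combination step is the design choice the abstract describes; the actual combination procedure in the paper may be more elaborate than simple adjacency.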
Named entity recognition (NER) for domains with few labels is becoming increasingly important. This paper proposes a novel algorithm, positive unlabeled named entity recognition (PUNER) with multi-granularity language information, which combines positive unlabeled (PU) learning and deep learning to obtain multi-granularity language information from a few labeled instances and many unlabeled instances in order to recognize named entities. First, PUNER selects reliable negative instances from unlabeled datasets, uses the positive instances and a corresponding number of negative instances to train the PU learning classifier, and iterates continuously until all unlabeled instances are labeled. Second, the PU learning classifier is implemented with a neural network-based architecture that captures comprehensive text semantics through multi-granularity language information, which helps the classifier recognize named entities correctly. PUNER is evaluated on three multilingual NER datasets: CoNLL 2003, CoNLL 2002, and SIGHAN Bakeoff 2006. Experimental results demonstrate the effectiveness of the proposed PUNER.
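The iterative PU learning loop described above can be sketched roughly as follows. This is not the PUNER implementation: a nearest-centroid scorer stands in for the neural classifier, and the function name `pu_label`, the `batch` parameter, the distance-based choice of reliable negatives, and the toy 2-D feature vectors are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_label(positives: List[Vector], unlabeled: List[Vector], batch: int = 2) -> List[int]:
    """Label every unlabeled instance as 1 (positive) or 0 (negative).

    Assumes `positives` is non-empty. A nearest-centroid scorer replaces
    the neural network classifier used in the paper.
    """
    if not unlabeled:
        return []
    pos, neg = list(positives), []
    labels: Dict[int, int] = {}
    remaining = set(range(len(unlabeled)))

    # Step 1: reliable negatives are the unlabeled points farthest from the
    # positive centroid, roughly as many as there are positives (assumed rule).
    pos_c = centroid(pos)
    k = max(1, min(len(pos), len(unlabeled) // 2))
    seeds = sorted(remaining, key=lambda i: (-sq_dist(unlabeled[i], pos_c), i))[:k]
    for i in seeds:
        labels[i] = 0
        neg.append(unlabeled[i])
        remaining.discard(i)

    # Step 2: refit on the current positives and negatives, label the most
    # confident remaining points, and repeat until nothing is unlabeled.
    while remaining:
        pos_c, neg_c = centroid(pos), centroid(neg)

        def margin(i: int) -> float:
            # Positive margin: closer to the positive centroid than the negative one.
            return sq_dist(unlabeled[i], neg_c) - sq_dist(unlabeled[i], pos_c)

        confident = sorted(remaining, key=lambda i: (-abs(margin(i)), i))[:batch]
        for i in confident:
            labels[i] = 1 if margin(i) > 0 else 0
            (pos if labels[i] else neg).append(unlabeled[i])
            remaining.discard(i)

    return [labels[i] for i in range(len(unlabeled))]


# Toy data: two known positives near (1, 1); unlabeled points near (1, 1) or (5, 5).
positives = [(1.0, 1.0), (1.2, 0.9)]
unlabeled = [(0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (1.1, 1.0), (4.9, 5.1)]
print(pu_label(positives, unlabeled))  # [1, 0, 0, 1, 0]
```

Labeling only the most confident instances in each round limits how far an early mistake can spread, at the cost of more iterations. In the setting the abstract describes, the feature vectors would come from the multi-granularity language representation rather than hand-written coordinates.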
Funding: the National Natural Science Foundation of China (No. 61876144) and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02070600).