Abstract
Research on named entity recognition (NER) for domains with few labeled instances is becoming increasingly important. This paper proposes a novel algorithm, positive-unlabeled named entity recognition (PUNER) with multi-granularity language information. PUNER combines positive-unlabeled (PU) learning and deep learning to obtain multi-granularity language information from a few labeled instances and many unlabeled instances, and uses it to recognize named entities. First, PUNER selects reliable negative instances from the unlabeled dataset, trains the PU learning classifier on the positive instances and a corresponding number of negative instances, and iterates until all unlabeled instances are labeled. Second, the PU learning classifier is implemented with a neural network-based architecture that captures comprehensive text semantics through multi-granularity language information, which helps the classifier recognize named entities correctly. PUNER is evaluated on three multilingual NER datasets: CoNLL 2003, CoNLL 2002 and SIGHAN Bakeoff 2006. Experimental results demonstrate the effectiveness of the proposed PUNER.
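The iterative labeling loop summarized above can be illustrated with a toy sketch. It is not the authors' implementation: the function names (`pu_label`, `centroid`, `sq_dist`), the `batch` parameter, the rule for picking reliable negatives (the unlabeled points farthest from the positive centroid, as many as there are positives) and the nearest-centroid classifier that stands in for the neural network are all hypothetical assumptions chosen to keep the example self-contained.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty sequence of vectors."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_label(positives: List[Vector], unlabeled: List[Vector], batch: int = 2) -> List[int]:
    """Label every unlabeled vector as 1 (entity) or 0 (non-entity).

    Hypothetical sketch: a nearest-centroid classifier replaces the
    neural network classifier; `positives` must be non-empty, batch >= 1.
    """
    pos = list(positives)
    neg: List[Vector] = []
    labels = [-1] * len(unlabeled)
    pool = list(range(len(unlabeled)))

    # Step 1 (assumed rule): reliable negatives are the unlabeled points
    # farthest from the positive centroid, one per positive instance.
    p_c = centroid(pos)
    pool.sort(key=lambda i: sq_dist(unlabeled[i], p_c), reverse=True)
    k = min(len(pos), len(pool))
    for i in pool[:k]:
        neg.append(unlabeled[i])
        labels[i] = 0
    pool = pool[k:]

    # Step 2: retrain on the current positive/negative sets, label the most
    # confident batch of unlabeled points, and repeat until none remain.
    while pool:
        p_c, n_c = centroid(pos), centroid(neg)

        def margin(i: int) -> float:
            # Positive margin means the point is closer to the positive centroid.
            return sq_dist(unlabeled[i], n_c) - sq_dist(unlabeled[i], p_c)

        pool.sort(key=lambda i: abs(margin(i)), reverse=True)
        for i in pool[:batch]:
            is_pos = margin(i) > 0
            (pos if is_pos else neg).append(unlabeled[i])
            labels[i] = int(is_pos)
        pool = pool[batch:]
    return labels


if __name__ == "__main__":
    P = [(1.0,), (0.9,)]
    U = [(0.0,), (0.1,), (0.95,), (1.05,), (0.05,), (0.85,)]
    print(pu_label(P, U))
```

Growing the labeled sets a confident batch at a time, rather than labeling everything after a single training pass, mirrors the "iterates continuously" step: each retraining sees more labeled data before the harder, low-margin instances are decided.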
Authors
Ouyang Xiaoye
Chen Shudong
Wang Rong
Ouyang Xiaoye; Chen Shudong; Wang Rong (Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, P.R. China; University of Chinese Academy of Sciences, Beijing 100049, P.R. China; Key Laboratory of Space Object Measurement Department, Beijing Institute of Tracking and Telecommunications Technology, Beijing 100094, P.R. China)
Funding
the National Natural Science Foundation of China (No. 61876144)
the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02070600)