Weakly Supervised Relation Extraction Based on Tri-training and Noise Filtering

Cited by: 2
Abstract: Weakly supervised relation extraction uses known related entity pairs to obtain training data from a text collection automatically, which effectively addresses the shortage of training data. However, weakly supervised training data suffers from noise, insufficient features, and class imbalance, all of which lower relation extraction performance. This paper proposes NF-Tri-training (Tri-training with Noise Filtering), a weakly supervised relation extraction algorithm. It uses under-sampling to address sample imbalance, learns new samples iteratively from unlabeled data with Tri-training to improve the generalization ability of the classifiers, and applies a data editing technique to identify and remove mislabeled samples both in the initial training data and among the new samples generated at each iteration. Experiments on a dataset collected from Hudong encyclopedia (互动百科) show that NF-Tri-training effectively improves the performance of relation classifiers.
Authors: JIA Zhen (贾真), YE Zhonglin (冶忠林), YIN Hongfeng (尹红风), HE Dake (何大可) (School of Information and Science Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031, China; DOCOMO Innovations Inc., Palo Alto 94304, USA)
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 4, pp. 142-149, 158 (9 pages)
Funding: National Natural Science Foundation of China (61170111, 61202043, 61262058)
Keywords: relation extraction; weakly supervised learning; Tri-training; data editing
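
The three steps named in the abstract (under-sampling, Tri-training on unlabeled data, and data editing) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the k-nearest-neighbour base classifier, the edited-nearest-neighbour style noise filter, the leave-out split that creates three different training sets, and every function name and toy data point below are assumptions made purely for illustration. The error-rate conditions of the original Tri-training algorithm are also omitted.

```python
import random
from collections import Counter

def undersample(data, rng):
    """Randomly shrink every class to the size of the smallest class."""
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append((x, y))
    n = min(len(items) for items in by_label.values())
    return [s for items in by_label.values() for s in rng.sample(items, n)]

def knn_predict(train, x, k=3):
    """Majority label among the k training points nearest to x."""
    def dist(sample):
        return sum((a - b) ** 2 for a, b in zip(sample[0], x))
    nearest = sorted(train, key=dist)[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def edit_noise(data, k=3):
    """Data editing (assumed variant): drop a sample when its label
    disagrees with the majority label of its k nearest neighbours."""
    if len(data) <= k:
        return list(data)
    return [(x, y) for i, (x, y) in enumerate(data)
            if knn_predict(data[:i] + data[i + 1:], x, k) == y]

def nf_tri_training(labeled, unlabeled, rounds=5, seed=0):
    """Simplified Tri-training loop with noise filtering."""
    rng = random.Random(seed)
    # Balance the classes, then filter noise in the initial training data.
    base = edit_noise(undersample(labeled, rng))
    # Assumption: three different training sets are built by leaving out
    # every third sample (a deterministic stand-in for bootstrap sampling).
    sets = [[s for idx, s in enumerate(base) if idx % 3 != i] for i in range(3)]
    for _ in range(rounds):
        new_sets, changed = [], False
        for i in range(3):
            a, b = [m for m in range(3) if m != i]
            added = []
            for x in unlabeled:
                ya, yb = knn_predict(sets[a], x), knn_predict(sets[b], x)
                # The other two classifiers agree: teach classifier i.
                if ya == yb and (x, ya) not in sets[i]:
                    added.append((x, ya))
            changed = changed or bool(added)
            # Filter noise again after each batch of new samples.
            new_sets.append(edit_noise(sets[i] + added))
        sets = new_sets
        if not changed:
            break
    return sets

def predict(sets, x):
    """Majority vote of the three classifiers."""
    return Counter(knn_predict(s, x) for s in sets).most_common(1)[0][0]

# Made-up toy data: class "A" near (0, 0), class "B" near (5, 5),
# plus one deliberately mislabeled sample at (0.4, 0.4).
labeled = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
           ((0.5, 0.5), "A"), ((0.2, 0.8), "A"),
           ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B"),
           ((0.4, 0.4), "B")]
unlabeled = [(0.3, 0.2), (0.9, 0.6), (5.5, 5.5), (5.2, 5.9)]
classifiers = nf_tri_training(labeled, unlabeled)
```

In this toy setting, `predict(classifiers, point)` returns the majority vote of the three classifiers for a new point. A real relation extraction system would replace the 2-D points with feature vectors built from sentences that mention an entity pair.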