结合全局特征的命名实体属性值抽取 (cited by: 5)

Extracting Attribute Values for Named Entities Based on Global Feature
Abstract: Attribute-value extraction is an important and challenging task in information extraction; it aims to automatically discover the values of attributes of named entities. This paper focuses on extracting these values from unstructured Chinese text. To keep models computationally tractable, current mainstream supervised methods use only local features and therefore cannot make full use of global information related to attribute values. We propose an approach that uses global features to improve attribute-value extraction. Two types of global features suited to Chinese text are defined to capture information beyond the local features: a boundary distribution feature and a value-name dependency feature. To our knowledge, this is the first attempt to acquire attribute values using global features. We then propose a perceptron learning algorithm that can readily incorporate all types of global features and learns the parameters of local and global features simultaneously within a single training process, giving the model greater representational power. Experiments on several kinds of attributes across a number of entity categories show that both the precision and the recall of the proposed approach are significantly higher than those of a CRF model and an averaged perceptron that use only local features. The approach is applicable to open-domain attribute-value acquisition and generalizes well.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2016, no. 4, pp. 941-948 (8 pages).
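Funding: National Basic Research Program of China (973 Program) (2012CB316303, 2014CB340401); Key Program of the National Natural Science Foundation of China (61232010); National Key Technology R&D Program of China (2012BAH39B02).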
Keywords: entity attribute; attribute-value extraction; named entity; global feature; averaged perceptron

