
A Definition Extraction Algorithm Combining Hard Pattern Matching and Soft Pattern Matching

Cited by: 4
Abstract: Term definition extraction is an important topic in information extraction research. This paper proposes an automatic term definition extraction method that combines hard pattern matching with soft pattern matching. First, a library of hard patterns is applied to the input text to perform an initial round of definition sentence extraction. Next, a soft pattern matching model based on an N-gram language model computes a matching score between each sentence in the text and the soft patterns. Thresholds on this score are used in two ways: sentences scoring above an upper threshold are recalled as definitions, and sentences recalled by hard matching but scoring below a lower threshold are filtered out as non-definitions. Experimental results show that the proposed method far outperforms both pure hard pattern matching and pure soft pattern matching.
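The three steps described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The hard patterns (`HARD_PATTERNS`), the toy training sentences (`TOY_DEFINITIONS`), and the threshold values are all hypothetical. Sentences are assumed to be whitespace-tokenized, with the candidate term already replaced by a `<TERM>` placeholder. The soft pattern is approximated by an add-one-smoothed bigram language model over surface tokens, whereas the paper's keywords indicate that its soft patterns are built over word classes (a word class lattice). The abstract also does not say how scores are normalized, so averaging log-probabilities per bigram is an assumption.

```python
import math
import re
from collections import Counter

BOS, EOS = "<s>", "</s>"

# Hypothetical hard patterns (not the paper's library). Sentences are assumed to be
# whitespace-tokenized, with the candidate term already replaced by "<TERM>".
HARD_PATTERNS = [
    re.compile(r"^<TERM> is defined as "),
    re.compile(r"^<TERM> refers to "),
]

# Hypothetical toy training set of known definition sentences.
TOY_DEFINITIONS = [
    "<TERM> is defined as a process that converts text into data",
    "<TERM> refers to a model that assigns probabilities to word sequences",
    "<TERM> is a technique that identifies the meaning of a term",
]


def hard_match(sentence: str) -> bool:
    """Step 1: True if any hard pattern matches the sentence."""
    return any(p.search(sentence) for p in HARD_PATTERNS)


class BigramSoftPattern:
    """Step 2 (assumed form): an add-one-smoothed bigram language model trained
    on definition sentences stands in for the soft pattern."""

    def __init__(self, training_sentences: list[str]) -> None:
        self.history = Counter()  # c(prev): times a token occurs as a bigram history
        self.bigrams = Counter()  # c(prev, cur)
        vocab = set()             # tokens that can be predicted (words and EOS)
        for sent in training_sentences:
            tokens = [BOS] + sent.split() + [EOS]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
            vocab.update(tokens[1:])
        self.vocab_size = len(vocab) + 1  # +1 reserves a slot for unseen tokens

    def score(self, sentence: str) -> float:
        """Mean log P(cur | prev) over the sentence; higher = closer to the pattern."""
        tokens = [BOS] + sentence.split() + [EOS]
        pairs = list(zip(tokens, tokens[1:]))
        log_prob = 0.0
        for prev, cur in pairs:
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.history[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        return log_prob / len(pairs)


def extract_definitions(sentences: list[str], model: BigramSoftPattern,
                        upper: float, lower: float) -> list[str]:
    """Step 3: combine hard matching with two thresholds on the soft score."""
    accepted = []
    for sent in sentences:
        score = model.score(sent)
        if hard_match(sent):
            keep = score >= lower  # drop hard matches that score too low
        else:
            keep = score >= upper  # recall high scorers the hard patterns missed
        if keep:
            accepted.append(sent)
    return accepted


if __name__ == "__main__":
    model = BigramSoftPattern(TOY_DEFINITIONS)
    candidates = [
        "<TERM> refers to a model that estimates word probabilities",
        "<TERM> is defined as soon as the user clicks",
        "<TERM> is a technique that converts text into data",
        "yesterday it rained in the city",
    ]
    for sent in candidates:
        print(f"{model.score(sent):7.3f}  hard={hard_match(sent)}  {sent}")
    # Thresholds are hand-picked for this toy data, not values from the paper.
    print(extract_definitions(candidates, model, upper=-2.7, lower=-2.88))
```

With these toy values, the second candidate matches a hard pattern but is filtered out by the lower threshold, while the third matches no hard pattern but is recalled by the upper threshold. Averaging over bigrams keeps scores comparable across sentences of different lengths; on real data the two thresholds would presumably need tuning against labeled examples.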
Authors: Qian Fei (钱菲), Yuan Chunfeng (袁春风)
Source: Computer Technology and Development (计算机技术与发展), 2012, No. 9, pp. 32-36 (5 pages)
Funding: National Natural Science Foundation of China (grants 61072152 and 61021062)
Keywords: definition extraction; hard pattern matching; soft pattern matching; N-gram language model; word class lattice