Chinese combined-word extraction algorithm based on a directed net of word-sequence frequency (cited by: 6)

English title: Chinese combined-word detection based on directed net of word-sequence frequency
Abstract: As the body of human knowledge keeps expanding and deepening, many combined words (words formed from several words or morphemes) are coined to express new concepts. Because such words cannot be added to lexicons promptly, word segmentation systems fail to recognize them, which has made extracting combined words from text an active research direction in intelligent computing. Inspired by a pattern of human cognition, this paper proposes a combined-word extraction algorithm based on a directed net of word-sequence frequency to identify combined words in free text. The algorithm first builds a directed net that encodes how often each word sequence occurs in the text, then extracts the combined words step by step through a set of dedicated matrix operations. Its advantage is that it requires no specialized linguistic knowledge; in the experimental analysis, the algorithm produced good results.
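The abstract does not specify the matrix operations, so the following is only a minimal Python sketch of the general idea under stated assumptions: the input text is already segmented into words; the directed net is a word-by-word adjacency matrix whose entry `matrix[i][j]` counts how often word j immediately follows word i; and extraction is a hypothetical greedy rule that repeatedly merges the most frequent adjacent word pair while its count is at least `min_freq`. This greedy merge (similar in spirit to byte-pair merging) is a substitute technique, not the authors' algorithm, and every function name and parameter below is illustrative.

```python
from typing import List, Optional, Tuple

Sentence = List[str]  # one sentence, already segmented into words (assumed)


def build_directed_net(sentences: List[Sentence]) -> Tuple[List[str], List[List[int]]]:
    """Build a word-level directed net as an adjacency matrix.

    Nodes are the distinct words (sorted); matrix[i][j] counts how often
    vocab[j] immediately follows vocab[i] anywhere in the corpus.
    """
    vocab = sorted({w for s in sentences for w in s})
    index = {w: k for k, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for s in sentences:
        for a, b in zip(s, s[1:]):
            matrix[index[a]][index[b]] += 1
    return vocab, matrix


def merge_pair(sentence: Sentence, a: str, b: str) -> Sentence:
    """Replace each non-overlapping occurrence of the sequence a, b with one token a+b."""
    out: Sentence = []
    i = 0
    while i < len(sentence):
        if i + 1 < len(sentence) and sentence[i] == a and sentence[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out


def extract_combined_words(sentences: List[Sentence], min_freq: int = 2,
                           max_rounds: int = 10) -> List[str]:
    """Hypothetical greedy extraction: repeatedly merge the heaviest edge.

    Each round rebuilds the net, picks the edge with the largest count
    (ties go to the first pair in row-major, sorted-vocabulary order),
    and stops once that count falls below min_freq.
    """
    combined: List[str] = []
    for _ in range(max_rounds):
        vocab, matrix = build_directed_net(sentences)
        best_freq = 0
        best_pair: Optional[Tuple[str, str]] = None
        for i, row in enumerate(matrix):
            for j, freq in enumerate(row):
                if freq > best_freq:
                    best_freq, best_pair = freq, (vocab[i], vocab[j])
        if best_pair is None or best_freq < min_freq:
            break
        a, b = best_pair
        combined.append(a + b)
        # Rewrite the corpus so the merged word becomes a node in the next net.
        sentences = [merge_pair(s, a, b) for s in sentences]
    return combined


if __name__ == "__main__":
    # Toy corpus, already segmented into words (segmentation itself is not shown).
    corpus = [
        ["自然", "语言", "处理", "是", "热门", "方向"],
        ["研究", "自然", "语言", "处理", "技术"],
        ["自然", "语言", "处理", "需要", "词库"],
    ]
    print(extract_combined_words(corpus))  # ['自然语言', '自然语言处理']
```

Because each round rebuilds the net, longer combined words can grow out of shorter ones, so the sketch also reports the intermediate word 自然语言; a real system would need an additional rule, left unspecified here, for deciding which candidates to keep.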
Source: Application Research of Computers (《计算机应用研究》; indexed in CSCD and the Peking University Core Journals list), 2009, No. 10, pp. 3746-3749 (4 pages)
Funding: Natural Science Foundation of Guangdong Province (07006474); Science and Technology Key Project of Guangdong Province (2007B010200044)
Keywords: directed net (directed graph); combined word; word sequence; cognition pattern

References (10)

  • [1] YU Lei, CAO Cungen. Research and implementation of a concept acquisition system based on Web corpora [J]. Computer Science, 2007, 34(2): 161-165.
  • [2] ZHANG Chunxia. Research on knowledge acquisition methods for domain texts and their application in archaeology [D]. Institute of Computing Technology, Chinese Academy of Sciences, 2005.
  • [3] LUO Bei, WU Jie, CAO Cungen, SHAO Zhiqing. Research on methods for acquiring plant knowledge from text [J]. Computer Science, 2005, 32(10): 6-13.
  • [4] LIU Lei, CAO Cungen, WANG Haitao, CHEN Wei. A hyponym acquisition method based on the "is-a" pattern [J]. Computer Science, 2006, 33(9): 146-151.
  • [5] WU Z. LDC Chinese segmenter [EB/OL]. (1999). http://www.ldc.upenn.edu/Projects/Chinese/segmenter/mansegment.perl.
  • [6] TEAHAN W J, WEN Y, McNAB R, et al. A compression-based algorithm for Chinese word segmentation [J]. Computational Linguistics, 2000, 26(3): 375-393.
  • [7] GAO Jianfeng, LI Mu, HUANG Changning. Improved source-channel models for Chinese word segmentation [C]// Proc of the 41st Annual Meeting of the Association for Computational Linguistics (ACL). Morristown, NJ: Association for Computational Linguistics, 2003: 272-279.
  • [8] XUE N. Chinese word segmentation as character tagging [J]. International Journal of Computational Linguistics and Chinese Language Processing, 2003, 8(1): 29-48.
  • [9] ZHANG H P, LIU Q, CHENG X Q, et al. Chinese lexical analysis using hierarchical hidden Markov model [C]// Proc of the 2nd SIGHAN Workshop. Morristown, NJ: Association for Computational Linguistics, 2003: 63-70.
  • [10] PENG F, FENG Fangfang, McCALLUM A, et al. Chinese segmentation and new word detection using conditional random fields [C]// Proc of the 20th International Conference on Computational Linguistics. Morristown, NJ: Association for Computational Linguistics, 2004: 562-566.

