New Word Identification Based on Improved Position Word Probability (times cited: 3)

A new method for Chinese new word identification based on the improved PWP
Abstract: This paper proposes a Chinese new word identification algorithm based on an improved position word probability (PWP). Unlike the traditional PWP, the improved PWP also takes into account features of the internal pattern of a candidate string. New words are then identified by combining the improved PWP with further statistics, namely mutual information (MI) and accessor variety (AV). AV counts the distinct characters that appear next to a string. Experiments on a corpus of novels show that the method extracts new words effectively to a certain extent.
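The abstract names three statistics but gives no formulas. The following is a minimal sketch of commonly used forms of these statistics, under stated assumptions. It is not the authors' implementation.

- **PWP:** a baseline (not improved) position word probability, estimated from an already segmented corpus as P(B|c1) · ∏ P(M|ci) · P(E|cn). The paper's internal-pattern adjustment is not reproduced.
- **MI:** pointwise mutual information, taking the minimum over all two-part splits of the candidate.
- **AV:** accessor variety, taken as the minimum of the left and right distinct-neighbour counts, with text boundaries counted as a neighbour.

All function names and the split/boundary conventions are hypothetical choices for illustration.

```python
import math
from collections import Counter, defaultdict


def char_position_probs(segmented_corpus):
    """Estimate P(position | char) from sentences already split into words.

    Positions: 'B' word-initial, 'M' word-internal, 'E' word-final,
    'S' single-character word. (Hypothetical helper; baseline PWP only.)
    """
    counts = defaultdict(Counter)
    for words in segmented_corpus:
        for w in words:
            if len(w) == 1:
                counts[w]["S"] += 1
                continue
            counts[w[0]]["B"] += 1
            for ch in w[1:-1]:
                counts[ch]["M"] += 1
            counts[w[-1]]["E"] += 1
    probs = {}
    for ch, c in counts.items():
        total = sum(c.values())
        probs[ch] = {pos: c[pos] / total for pos in "BMES"}
    return probs


def position_word_probability(s, probs):
    """Baseline PWP of a multi-character string: P(B|c1) * prod P(M|ci) * P(E|cn)."""
    if len(s) < 2:
        raise ValueError("candidate must have at least two characters")
    p = probs.get(s[0], {}).get("B", 0.0)
    for ch in s[1:-1]:
        p *= probs.get(ch, {}).get("M", 0.0)
    p *= probs.get(s[-1], {}).get("E", 0.0)
    return p


def count_occurrences(text, s):
    """Count possibly overlapping occurrences of s in raw text."""
    n, start = 0, text.find(s)
    while start != -1:
        n += 1
        start = text.find(s, start + 1)
    return n


def min_split_pmi(s, text):
    """Pointwise MI of s, minimised over every split s = x + y (assumed convention).

    Uses P(u) ~= count(u) / len(text), so PMI = log2(f(xy) * N / (f(x) * f(y))).
    """
    if len(s) < 2:
        raise ValueError("candidate must have at least two characters")
    n_pos = len(text)
    f_s = count_occurrences(text, s)
    if f_s == 0:
        return float("-inf")
    best = float("inf")
    for k in range(1, len(s)):
        # Both parts occur at least as often as s, so the divisor is non-zero.
        f_x = count_occurrences(text, s[:k])
        f_y = count_occurrences(text, s[k:])
        best = min(best, math.log2(f_s * n_pos / (f_x * f_y)))
    return best


def accessor_variety(s, text):
    """AV(s) = min(distinct left neighbours, distinct right neighbours)."""
    left, right = set(), set()
    start = text.find(s)
    while start != -1:
        # Text start/end are counted as one boundary neighbour each (assumption).
        left.add(text[start - 1] if start > 0 else "<S>")
        end = start + len(s)
        right.add(text[end] if end < len(text) else "</S>")
        start = text.find(s, start + 1)
    return min(len(left), len(right))


probs = char_position_probs([["新词", "识别"], ["新词", "发现"]])
text = "我爱新词。他找新词。新词很多"
print(position_word_probability("新词", probs))  # 1.0
print(accessor_variety("新词", text))             # 2
print(min_split_pmi("新词", text))
```

In the toy text, "新词" is preceded by 爱, 找 and 。 and followed by 。 and 很, so its AV is 2. In a full pipeline, a candidate would typically be kept only if all scores clear thresholds tuned on held-out data. The abstract does not state the paper's actual combination rule or thresholds.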
Source: Journal of Fuzhou University (Natural Science Edition) (《福州大学学报(自然科学版)》), indexed in CAS, CSCD and the Peking University Core Journals list; 2011, No. 1, pp. 43-48 (6 pages).
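Funding: Fujian Provincial Science and Technology Innovation Platform Program (2009J1007); Fujian Provincial Department of Education Research Project (JA04161); Fujian Provincial Development and Reform Commission Fund (SX2004-29).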
Keywords: Chinese; new word; identification; improved position word probability (PWP)