
Comparing the importance of above-context versus below-context for Chinese word segmentation

Abstract: Context is an essential resource both for acquiring linguistic knowledge in statistical linguistics and for solving a range of practical problems in natural language processing. In recent years, character-based word-position tagging has greatly improved the performance of Chinese word segmentation. This approach recasts segmentation as the task of tagging each character with its position within a word, and the tag of the current character is determined with the help of that character's context. Rather than relying on guesses drawn from subjective experience, this paper uses a four-tag word-position set and conditional random fields (CRFs) to measure how much the above-context (the characters preceding the current one) and the below-context (the characters following it) each contribute to segmentation performance. Closed tests were run on the PKU and MSRA corpora of the Second International Chinese Word Segmentation Bakeoff (Bakeoff-2005), with comparative experiments using feature template sets that represent the above-context and the below-context separately. The results show that the contribution of the below-context to segmentation performance exceeds that of the above-context by more than 13 percentage points.
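The word-position tagging setup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the widely used B/M/E/S four-tag set (the abstract says only "four word-positions"), and the template sets ABOVE_TEMPLATES and BELOW_TEMPLATES, the PAD symbol, and all function names are hypothetical choices made to separate features drawn from the above-context from features drawn from the below-context. The paper's actual templates are not given in this record.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: the common four-tag word-position set.
# B = first char of a multi-char word, M = middle char, E = last char,
# S = a single-character word.

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence

# Hypothetical template sets, written as character offsets from the
# current character C0. The above set uses C0 plus characters to its left;
# the below set uses C0 plus characters to its right.
ABOVE_TEMPLATES: List[Tuple[int, ...]] = [(-2,), (-1,), (0,), (-2, -1), (-1, 0)]
BELOW_TEMPLATES: List[Tuple[int, ...]] = [(0,), (1,), (2,), (0, 1), (1, 2)]


def words_to_tags(words: Sequence[str]) -> List[str]:
    """Convert a segmented sentence into one B/M/E/S tag per character."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["M"] * (len(word) - 2))
            tags.append("E")
    return tags


def tags_to_words(chars: str, tags: Sequence[str]) -> List[str]:
    """Rebuild words from characters and their (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate an unfinished word at the end of the sequence
        words.append(current)
    return words


def char_at(chars: str, i: int) -> str:
    """Return the character at index i, or PAD if i is out of range."""
    return chars[i] if 0 <= i < len(chars) else PAD


def extract_features(chars: str, i: int,
                     templates: Sequence[Tuple[int, ...]]) -> Dict[str, str]:
    """Instantiate each template at position i, e.g. 'C-1,0' -> C-1 C0."""
    feats: Dict[str, str] = {}
    for offsets in templates:
        name = "C" + ",".join(str(o) for o in offsets)
        feats[name] = "".join(char_at(chars, i + o) for o in offsets)
    return feats


def sentence_features(chars: str,
                      templates: Sequence[Tuple[int, ...]]) -> List[Dict[str, str]]:
    """One feature dictionary per character of the sentence."""
    return [extract_features(chars, i, templates) for i in range(len(chars))]


if __name__ == "__main__":
    words = ["条件随机场", "是", "模型"]
    chars = "".join(words)
    tags = words_to_tags(words)
    print(tags)  # ['B', 'M', 'M', 'M', 'E', 'S', 'B', 'E']
    print(tags_to_words(chars, tags))  # ['条件随机场', '是', '模型']
    print(extract_features(chars, 5, ABOVE_TEMPLATES))
    print(extract_features(chars, 5, BELOW_TEMPLATES))
```

Each per-character feature dictionary could then be paired with its gold tag and passed to a CRF toolkit that accepts feature dictionaries (sklearn-crfsuite is one example). Training one model on the above-context templates and another on the below-context templates, then comparing their segmentation scores, mirrors the kind of comparison the abstract reports.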
Source: Computer Engineering and Applications (《计算机工程与应用》; indexed in CSCD and the Peking University core journal list), 2011, No. 4, pp. 117-120 (4 pages)
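Funding: Specialized Research Fund for the Doctoral Program of Higher Education (No. 20050007023); Young Backbone Teacher Program of Higher Education Institutions of Henan Province (No. 2009GGJS-108)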
Keywords: Chinese word segmentation; context; conditional random fields; word-position tagging; feature template