Abstract
This paper sets out our views on standardization, and the approach we took, in drafting the Specification of the Modern Chinese Part-of-Speech Tag Set for Information Processing (《信息处理用现代汉语词类标记集规范》, hereafter "the Specification"). The Specification is not compulsory; it standardizes only the result of processing, not the processing procedure. Its purpose is to give Chinese information processing research a modern Chinese POS tag set that could serve as a national standard, so that different Chinese information processing systems can use one unified tag set. The Specification attempts to solve the problem of unifying POS tags, and it is characterized by continuity, mono-functionality, generality, and extensibility. The paper also discusses the principles followed in developing the Specification and the question of subcategory tags, and finally reports experimental data on the coverage of Specification-based POS tagging on real corpus text.
Source
《语言文字应用》
CSSCI
Peking University Core Journals (北大核心)
2003, No. 4, pp. 16-24 (9 pages)
Applied Linguistics
Funding
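Supported by the State Language Commission "Tenth Five-Year Plan" major project "Construction and Deep Processing of a Modern Chinese Corpus" (project no. ZDA105-44),
the 863 Program project "Intelligent Chinese Information Processing Platform" (no. 2001AA114040),
and the 973 Program project "Chinese Corpus Construction" (no. G199803051A-05).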