Abstract
Rule learning in the traditional transformation-based part-of-speech tagging method is too slow. To address this, an algorithm that dynamically partitions the training corpus is proposed. The algorithm partitions the corpus according to the conflict and dependency relations between rules, which reduces the search space. It increases rule learning speed while maintaining the tagging accuracy for Latin Mongolian. In a comparative open test on a Latin Mongolian corpus of 10,000 sentences, the rule learning time of the new algorithm was only 32% of that of the original method.
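For orientation, the baseline that the abstract describes as slow can be sketched as a greedy transformation-based learner. This is a minimal illustration, not the paper's implementation. The single rule template ("change tag A to B when the previous tag is C"), the function names `apply_rule` and `learn_rules`, and the `min_gain` threshold (assumed to be at least 1) are all assumptions made for this example. The dynamic partitioning itself is not reproduced, because the abstract does not specify it.

```python
from collections import Counter

def apply_rule(tags, rule):
    """Apply 'change from_tag to to_tag when the previous tag is prev_tag'."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    # Conditions are checked against the tags as they were before this rule.
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def learn_rules(current, gold, min_gain=1):
    """Greedy transformation-based learning over parallel tag sequences."""
    current = [list(sentence) for sentence in current]
    learned = []
    while True:
        gain = Counter()
        # Every iteration rescans the whole corpus; this is the search cost
        # that a partition of the corpus would aim to reduce.
        for cur, ref in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] != ref[i]:
                    # Candidate rule that would repair this error.
                    gain[(cur[i], ref[i], cur[i - 1])] += 1
        # Subtract the correct tags that each candidate rule would break.
        for rule in list(gain):
            from_tag, _, prev_tag = rule
            for cur, ref in zip(current, gold):
                for i in range(1, len(cur)):
                    if cur[i] == ref[i] == from_tag and cur[i - 1] == prev_tag:
                        gain[rule] -= 1
        if not gain:
            break
        best, best_gain = max(gain.items(), key=lambda item: item[1])
        if best_gain < min_gain:
            break
        learned.append(best)
        current = [apply_rule(sentence, best) for sentence in current]
    return learned, current
```

Each accepted rule lowers the number of tagging errors by exactly its net gain, so the loop terminates. The cost per iteration is proportional to the corpus size times the number of candidate rules, which is the search space the abstract refers to.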
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2007, No. 4, pp. 963-965 (3 pages)
Journal of Computer Applications
Funding
Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences, Key Direction Project (KGCX2-SW-511)
Keywords
part-of-speech tagging
transformation
rule conflict
rule dependency
dynamic partition