Abstract
This paper first introduces a linguistic feature table (LFT) to structure textual information and to represent long-distance constraints. An individual-oriented data generalization algorithm then removes redundant information from the LFT, and a rule extraction algorithm filters out its inconsistent entries, yielding a consistent and efficient rule base for the target natural language processing (NLP) task. Finally, the model is applied to Chinese word sense disambiguation and Chinese pinyin-to-character conversion. With a dynamic rule smoothing algorithm, the experiments achieve decision precisions of 0.93 and 0.95 and rule coverage (recall) rates of 0.92 and 0.89 on the two tasks, respectively, indicating that the model performs well.
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
2004, No. 5, pp. 56-61 (6 pages)
Funding
National Natural Science Foundation of China (Grant No. 60175020)