
A Model for Linguistic Knowledge Discovery from Large-Scale Corpuses Based on Rough Set Techniques
Abstract: This paper first introduces a linguistic feature table (LFT) that gives textual information a structured form and represents long-distance constraints. An object-oriented data generalization algorithm (one that operates on individual objects, i.e. rows of the table) then removes redundant information from the LFT, and a rule extraction algorithm filters out its inconsistent entries, yielding a consistent and efficient rule base for the target natural language processing (NLP) task. Finally, the model is applied to Chinese word sense disambiguation and to Chinese pinyin-to-character conversion. With a dynamic rule smoothing algorithm, the experiments achieve decision precisions of 0.93 and 0.95 and rule coverage (recall) rates of 0.92 and 0.89 on the two tasks, respectively.
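The pipeline described in the abstract can be illustrated with a minimal rough-set style sketch. This is not the authors' algorithm: the feature names (`prev_pos`, `next_word`, `topic`), the toy rows, and the greedy attribute-dropping strategy are all assumptions made for illustration, and the sketch filters inconsistent rows before reducing attributes, which simplifies the order given in the abstract. It treats the LFT as a decision table, discards rows whose identical conditions lead to different decisions, drops attributes that are not needed to keep the table consistent, and reads the remaining rows as rules.

```python
from collections import defaultdict

def consistent_rows(rows, cond, dec):
    """Keep only rows whose condition values map to exactly one decision."""
    decisions = defaultdict(set)
    for r in rows:
        decisions[tuple(r[a] for a in cond)].add(r[dec])
    return [r for r in rows if len(decisions[tuple(r[a] for a in cond)]) == 1]

def is_consistent(rows, cond, dec):
    """True if no two rows agree on all conditions but differ in decision."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in cond)
        if seen.setdefault(key, r[dec]) != r[dec]:
            return False
    return True

def greedy_reduct(rows, cond, dec):
    """Drop attributes one at a time while the table stays consistent."""
    kept = list(cond)
    for a in cond:
        trial = [c for c in kept if c != a]
        if trial and is_consistent(rows, trial, dec):
            kept = trial
    return kept

def extract_rules(rows, cond, dec):
    """Read each remaining row as a rule: condition values -> decision."""
    return {tuple(r[a] for a in cond): r[dec] for r in rows}

# Hypothetical feature table for disambiguating the word "bank".
# "topic" stands in for a long-distance constraint; all values are invented.
COND, DEC = ["prev_pos", "next_word", "topic"], "sense"
table = [
    {"prev_pos": "DET", "next_word": "of",   "topic": "river", "sense": "shore"},
    {"prev_pos": "DET", "next_word": "loan", "topic": "money", "sense": "finance"},
    {"prev_pos": "ADJ", "next_word": "of",   "topic": "money", "sense": "finance"},
    {"prev_pos": "ADJ", "next_word": "of",   "topic": "river", "sense": "shore"},
    {"prev_pos": "DET", "next_word": "of",   "topic": "trip",  "sense": "shore"},
    {"prev_pos": "DET", "next_word": "of",   "topic": "trip",  "sense": "finance"},
]

clean = consistent_rows(table, COND, DEC)   # the two conflicting "trip" rows are dropped
reduct = greedy_reduct(clean, COND, DEC)    # attributes still needed after generalization
rules = extract_rules(clean, reduct, DEC)   # compact rule base over the reduced attributes
print(reduct, rules)
```

In this toy table a single attribute is enough to separate the two senses, so the rule base shrinks from four rows over three attributes to two rules over one. A greedy pass like this finds one reduct, not necessarily the smallest, and its result depends on the order in which attributes are tried.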
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, 2004, No. 5, pp. 56-61 (6 pages)
Funding: National Natural Science Foundation of China (Grant No. 60175020)
Keywords: linguistic knowledge discovery; rough set; automatic disambiguation; Chinese pinyin-to-character conversion; pinyin-to-character conversion; dynamic rule smoothing algorithm

