
Encoding Scheme Based on Words (以词为本的编码方案的探讨)
Abstract: Language is the main tool of human thought, and the word is the basic unit of language processing. In computer information processing, however, encodings are currently designed character by character. As information processing technology has developed, the shortcomings of purely character-based encoding have become increasingly apparent. Starting from the basic requirements of information processing and the basic properties of words, this paper proposes a unified encoding scheme that considers characters and words together and takes the word as its basis. The scheme builds on UTF-16, a major existing encoding standard; it retains the existing character codes and adds word codes. Word codes are organized logically by a concept space tree that carries some semantic information and semantic relations, and they are laid out in the code space according to two principles: supporting cluster retrieval and supporting code conversion between languages. Finally, several difficult problems that require further study are identified.
Author: Cheng Yuanbin (程元斌)
Source: Journal of Jianghan University (Natural Science Edition) (《江汉大学学报(自然科学版)》), 2013, No. 2, pp. 47-52 (6 pages)
Keywords: word encoding; UTF-16; cluster retrieval; concept space tree; natural language processing
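
The abstract does not give the actual code layout, so the following is only a minimal sketch of the general idea under stated assumptions: characters keep their existing code values, words receive additional codes, and a word's code is derived from its path in a concept tree so that words under the same concept occupy one contiguous range. The toy tree, the 8-bits-per-level layout, the `WORD_BASE` offset, and all function names are hypothetical and are not taken from the paper.

```python
# Hypothetical illustration only: the tree, the bit layout and WORD_BASE
# are assumptions, not the layout proposed in the paper.

WORD_BASE = 0x110000  # assumed: first value above the Unicode code point range
LEVEL_BITS = 8        # assumed: 8 bits per tree level (up to 255 children)

# Toy three-level concept tree: concept -> sub-concept -> words.
CONCEPT_TREE = {
    "animal": {"mammal": ["dog", "cat"], "bird": ["sparrow", "eagle"]},
    "plant": {"tree": ["oak", "pine"]},
}


def build_word_codes(tree):
    """Assign each word a code derived from its 1-based path in the tree."""
    codes = {}
    for i, sub_concepts in enumerate(tree.values(), start=1):
        for j, words in enumerate(sub_concepts.values(), start=1):
            for k, word in enumerate(words, start=1):
                path = (i << (2 * LEVEL_BITS)) | (j << LEVEL_BITS) | k
                codes[word] = WORD_BASE + path
    return codes


def subtree_range(i, j=None):
    """Return the half-open code range [lo, hi) of concept i or sub-concept (i, j)."""
    if j is None:
        lo = WORD_BASE + (i << (2 * LEVEL_BITS))
        return lo, lo + (1 << (2 * LEVEL_BITS))
    lo = WORD_BASE + (i << (2 * LEVEL_BITS)) + (j << LEVEL_BITS)
    return lo, lo + (1 << LEVEL_BITS)


def encode(tokens, codes):
    """Encode tokens: known words become one word code, others stay per-character."""
    out = []
    for tok in tokens:
        if tok in codes:
            out.append(codes[tok])
        else:
            out.extend(ord(ch) for ch in tok)
    return out


codes = build_word_codes(CONCEPT_TREE)
lo, hi = subtree_range(1)  # every word under the first concept ("animal")
animal_words = [w for w, c in codes.items() if lo <= c < hi]
mixed = encode(["the", " ", "dog"], codes)
```

In this sketch, retrieving a cluster of related words reduces to a range test on the code value, which is one simple way to read "organized for cluster retrieval"; the paper's own spatial organization, and how it fits word codes into UTF-16, may differ.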
