Abstract
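The word segmentation dictionary is a basic component of a Chinese information processing system, and its lookup and update efficiency directly affects the performance of that system. Using the PATRICIA tree data structure, this paper designs a segmentation dictionary mechanism that supports fast lookup and update of dictionary entries, and gives a preliminary theoretical analysis of its performance. Finally, experiments compare its time efficiency with that of a dictionary mechanism based on character-by-character binary search. The results show that the PATRICIA tree based dictionary mechanism offers higher lookup speed and update efficiency and can meet the needs of large-scale, open-text processing systems.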
The dictionary mechanism is a basic component of Chinese information processing systems, and its efficiency greatly affects the performance of those systems. Based on the PATRICIA tree data structure, this paper designs a new PATRICIA tree based dictionary mechanism. First, the paper presents a preliminary performance analysis of this mechanism. Then a comparison is given between the PATRICIA tree based mechanism and a dictionary mechanism that uses binary search by characters. The results show that the PATRICIA tree based dictionary mechanism is better than recently used dictionary mechanisms in aspects such as the efficiency of retrieving and modifying words, and that it is more suitable for large-scale Chinese text processing systems.
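To make the idea of a path-compressed dictionary concrete, here is a minimal sketch, not the paper's design. It assumes a character-level radix tree (a PATRICIA-style trie whose edges carry whole substrings) rather than the classical bit-level PATRICIA tree. The names `Node`, `PatriciaDict`, `insert`, and `longest_match` are hypothetical and chosen only for illustration.

```python
class Node:
    def __init__(self):
        # Maps the first character of an edge label to (label, child node).
        self.children = {}
        self.is_word = False


class PatriciaDict:
    """Illustrative path-compressed trie; an assumption, not the paper's code."""

    def __init__(self):
        self.root = Node()

    def insert(self, word):
        node = self.root
        while word:
            edge = node.children.get(word[0])
            if edge is None:
                # No edge starts with this character: hang the rest on a new leaf.
                leaf = Node()
                leaf.is_word = True
                node.children[word[0]] = (word, leaf)
                return
            label, child = edge
            # Length of the common prefix of the edge label and the remaining word.
            k = 0
            while k < len(label) and k < len(word) and label[k] == word[k]:
                k += 1
            if k < len(label):
                # Split the edge at the point where the label and the word diverge.
                mid = Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[word[0]] = (label[:k], mid)
                child = mid
            node = child
            word = word[k:]
        node.is_word = True

    def __contains__(self, word):
        node = self.root
        while word:
            edge = node.children.get(word[0])
            if edge is None or not word.startswith(edge[0]):
                return False
            word = word[len(edge[0]):]
            node = edge[1]
        return node.is_word

    def longest_match(self, text, start=0):
        # Longest dictionary word beginning at text[start], or None.
        node, pos, best = self.root, start, None
        while pos < len(text):
            edge = node.children.get(text[pos])
            if edge is None or not text.startswith(edge[0], pos):
                break
            pos += len(edge[0])
            node = edge[1]
            if node.is_word:
                best = text[start:pos]
        return best


d = PatriciaDict()
for w in ["中国", "中国人", "中文", "信息"]:
    d.insert(w)
```

In this sketch, an update is a single descent that splits at most one edge, and a lookup visits at most one node per distinct branching point. This illustrates why such a structure can support both fast queries and cheap insertions, though the actual costs depend on the design analysed in the paper.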
Source
《中文信息学报》
CSCD
PKU Core Journal (北大核心)
2001, Issue 3, pp. 44-49 (6 pages)
Journal of Chinese Information Processing
Funding
863 Program (863-306-ZD02-02-7)