Abstract
The dictionary mechanism largely determines the efficiency of Chinese word segmentation. To improve the lookup efficiency of existing dictionary-based segmentation, this paper proposes a dictionary mechanism called double-hash encoding, which combines the double-hash dictionary mechanism with whole-word binary search. The first character of each word is stored in a hash table. The remaining characters are concatenated one at a time, and the code computed from them is stored in a second hash table for word remainders. A status value is stored as well, to reduce the number of matching operations. Experimental results show that the mechanism saves memory, improves matching speed, and makes the dictionary easy to update and maintain.
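The two-table lookup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the encoding function or the meaning of the status value, so here the remainder string itself serves as the key (Python's `dict` hashes it), and the status is assumed to be a pair of bit flags marking "is a word" and "is a prefix of a longer word". The names `DoubleHashDict`, `IS_WORD`, and `IS_PREFIX` are hypothetical.

```python
# Assumed status flags; the abstract does not define the actual status values.
IS_PREFIX = 1  # some longer dictionary word starts with this string
IS_WORD = 2    # this string is itself a dictionary word


class DoubleHashDict:
    """Sketch of a first-character table pointing to per-character remainder tables."""

    def __init__(self, words=()):
        self.first = {}  # first character -> {remainder string: status flags}
        for word in words:
            self.add(word)

    def add(self, word):
        if not word:
            return
        rest = self.first.setdefault(word[0], {})
        # Grow the remainder one character at a time and record each stage.
        # The remainder string stands in for the paper's computed code.
        for end in range(1, len(word) + 1):
            key = word[1:end]
            flag = IS_WORD if end == len(word) else IS_PREFIX
            rest[key] = rest.get(key, 0) | flag

    def longest_match(self, text, start):
        """Return the length of the longest dictionary word at text[start:]."""
        rest = self.first.get(text[start])
        if rest is None:
            return 1  # unknown character: emit it as a single token
        best = 1
        end = start + 1
        while True:
            status = rest.get(text[start + 1:end], 0)
            if status & IS_WORD:
                best = end - start
            # The status flag lets the scan stop as soon as no longer word is possible.
            if not status & IS_PREFIX or end >= len(text):
                break
            end += 1
        return best

    def segment(self, text):
        """Forward maximum matching over the whole text."""
        tokens = []
        pos = 0
        while pos < len(text):
            length = self.longest_match(text, pos)
            tokens.append(text[pos:pos + length])
            pos += length
        return tokens


d = DoubleHashDict(["中文", "分词", "中文分词", "词典", "机制"])
print(d.segment("中文分词词典机制"))
```

In this sketch, adding a word touches only the table of its first character, which is one plausible reading of why such a layout would be easy to update. Slicing the text on every step is kept for clarity and would likely be replaced by an incremental code in a tuned implementation.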
Source
Information Technology (《信息技术》)
2016, No. 11, pp. 152-156 (5 pages)
Keywords
Chinese word segmentation
dictionary mechanism
double hash