Abstract
The dictionary is an important component of mechanical (dictionary-based) Chinese word segmentation, and the quality of the dictionary mechanism directly affects segmentation speed and efficiency. After analyzing the strengths and weaknesses of several typical dictionary mechanisms, this paper proposes a dynamic four-character bidirectional dictionary mechanism based on memcached. The mechanism effectively reduces the number of dictionary accesses made while segmenting an article, is easy to maintain, and supports fast addition and deletion of temporary words. It is suitable for Chinese word segmentation on the Web using the bidirectional maximum matching algorithm.
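For readers unfamiliar with bidirectional maximum matching, the following is a minimal sketch of the general technique, not the paper's implementation. It makes several assumptions: a plain Python `set` stands in for the memcached-backed dictionary, candidate words are capped at four characters (the hypothetical constant `MAX_WORD_LEN`), and ties between the forward and backward results are broken by preferring the one with fewer single-character words. All function names are illustrative.

```python
MAX_WORD_LEN = 4  # assumption: candidate words are at most four characters long


def forward_max_match(text, lexicon, max_len=MAX_WORD_LEN):
    """Scan left to right, always taking the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len=MAX_WORD_LEN):
    """Scan right to left, always taking the longest dictionary word."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def bidirectional_max_match(text, lexicon, max_len=MAX_WORD_LEN):
    """Run both scans and keep the result that looks more plausible."""
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    if len(fwd) != len(bwd):
        # Fewer words usually means longer, more meaningful matches.
        return fwd if len(fwd) < len(bwd) else bwd

    def singles(ws):
        return sum(1 for w in ws if len(w) == 1)

    # Tie-break heuristic (an assumption): fewer single-character words wins,
    # and the backward result is kept when the counts are equal.
    return fwd if singles(fwd) < singles(bwd) else bwd


# The set below is a stand-in for the shared dictionary; adding or removing
# a temporary word is a single set operation in this sketch.
lexicon = {"研究", "研究生", "生命", "的", "起源"}
print(bidirectional_max_match("研究生命的起源", lexicon))
```

In this sketch each candidate lookup is one membership test, so the number of dictionary accesses grows with the number of candidate lengths tried at each position; reducing that count is the kind of cost the abstract says the proposed mechanism targets.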
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals list
2011, Issue 1, pp. 152-154, 158 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (60705015)
Keywords
memcached
dynamic four-character bidirectional dictionary
Chinese word segmentation