Abstract
This paper proposes a new word segmentation dictionary and query algorithm for a question-answering system for web-based courses. The approach draws on the strengths of the three existing classes of word segmentation algorithms and overcomes their shortcomings. The segmentation dictionary consists of two parts: a professional (domain-specific) dictionary and a basic dictionary. When looking up a word, the algorithm searches the basic dictionary first and the professional dictionary second. If the word is found in the basic dictionary, the professional dictionary is not searched at all, which greatly reduces the time complexity of the algorithm. In addition, the segmentation dictionary is organized as a two-dimensional index matrix, built from the first and second characters of each word, together with an ordered sequential table of all words. The internal code of each character serves as its subscript in the matrix, and the ordered sequential table is searched sequentially, which reduces the number of dictionary lookups.
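The abstract does not spell out the data layout or the segmentation strategy, so the Python below is only a minimal sketch of one possible reading of it, not the authors' implementation. Assumptions that are not from the paper: the input is GB2312 Chinese text, and each character's two-byte GB2312 internal code is converted into a 0-based subscript; the first index dimension is a dense array, while the second is a sparse dict to save memory; single-character words use the hypothetical second subscript `-1`; forward maximum matching is used as the segmentation strategy purely for illustration. The names `gb_code`, `IndexedDictionary`, `lookup`, `segment` and `max_len` are hypothetical.

```python
from typing import List, Optional

# Assumption: input is GB2312 Chinese text; a character's two-byte GB2312
# internal code is turned into a 0-based subscript (94 x 94 code positions).
GB_CELLS = 94 * 94


def gb_code(ch: str) -> int:
    """Subscript for one GB2312 character, derived from its internal code."""
    hi, lo = ch.encode("gb2312")  # fails for characters that are not two-byte GB2312
    return (hi - 0xA1) * 94 + (lo - 0xA1)


class IndexedDictionary:
    """Ordered word table plus a first-char x second-char index into it."""

    def __init__(self, words: List[str]) -> None:
        # Ordered sequential table: sorting keeps every word that shares a
        # two-character prefix in one contiguous slice.
        self.table = sorted(set(words))
        # First dimension: a dense array addressed by the first character's code.
        # Second dimension: a sparse dict (an assumption, to save memory) keyed by
        # the second character's code, or -1 for single-character words.
        self.matrix: List[Optional[dict]] = [None] * GB_CELLS
        for i, word in enumerate(self.table):
            first = gb_code(word[0])
            if self.matrix[first] is None:
                self.matrix[first] = {}
            row = self.matrix[first]
            second = gb_code(word[1]) if len(word) > 1 else -1
            start, _ = row.get(second, (i, i))
            row[second] = (start, i + 1)  # [start, end) slice of self.table

    def contains(self, word: str) -> bool:
        if not word:
            return False
        row = self.matrix[gb_code(word[0])]
        if row is None:
            return False
        span = row.get(gb_code(word[1]) if len(word) > 1 else -1)
        if span is None:
            return False
        # Sequential search inside the slice; the table is sorted, so stop as
        # soon as an entry is not smaller than the target.
        for i in range(*span):
            if self.table[i] >= word:
                return self.table[i] == word
        return False


def lookup(word: str, basic: IndexedDictionary,
           professional: IndexedDictionary) -> Optional[str]:
    """Search the basic dictionary first; consult the professional one only on a miss."""
    if basic.contains(word):
        return "basic"
    if professional.contains(word):
        return "professional"
    return None


def segment(text: str, basic: IndexedDictionary,
            professional: IndexedDictionary, max_len: int = 4) -> List[str]:
    """Forward maximum matching (an illustrative choice, not stated in the abstract)."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or lookup(candidate, basic, professional):
                tokens.append(candidate)
                i += n
                break
    return tokens


basic = IndexedDictionary(["网络", "课程", "答疑", "系统", "学习"])
professional = IndexedDictionary(["网络课程", "分词", "分词词典", "答疑系统"])
print(segment("网络课程答疑系统", basic, professional))  # ['网络课程', '答疑系统']
```

Here `lookup("网络", basic, professional)` is answered by the basic dictionary without touching the professional one, which is the short-circuit the abstract credits with the lower time complexity. Within a dictionary, the two-character index narrows the sequential scan to the handful of words sharing that prefix, so "分词词典" is found after comparing only against "分词".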
Source
Journal of Hebei University of Engineering: Natural Science Edition (《河北工程大学学报(自然科学版)》)
Indexed in: CAS
2012, Issue 2, pp. 68-70 (3 pages)
Keywords
natural language processing
question-answering system
word segmentation
web-based course