Abstract
Chinese word segmentation is the foundation of natural language processing tasks such as search engines, machine translation, and sentiment analysis; the accuracy and efficiency of segmentation have a great impact on all subsequent work. The best-performing segmentation algorithms at present are based on statistical machine learning, and the Hidden Markov Model (HMM) can describe the sequential relationships between words well. This paper discusses the basic principle of HMM-based Chinese word segmentation and presents a Python implementation of the model.
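The abstract describes decoding the most likely segmentation with an HMM. A minimal sketch of the idea, assuming the common BMES character-tagging scheme and tiny hand-set toy parameters (illustrative only, not the paper's trained model):

```python
# Sketch of HMM-based Chinese word segmentation via Viterbi decoding.
# States follow the BMES scheme: B(egin), M(iddle), E(nd), S(ingle-char word).
# All probabilities below are toy values; a real model estimates them from a corpus.
import math

STATES = "BMES"
NEG_INF = float("-inf")

# Initial state log-probabilities: a word can only start with B or S.
start_p = {"B": math.log(0.6), "M": NEG_INF, "E": NEG_INF, "S": math.log(0.4)}

# Transition log-probabilities; missing entries are treated as impossible.
trans_p = {
    "B": {"M": math.log(0.3), "E": math.log(0.7)},
    "M": {"M": math.log(0.3), "E": math.log(0.7)},
    "E": {"B": math.log(0.6), "S": math.log(0.4)},
    "S": {"B": math.log(0.6), "S": math.log(0.4)},
}

def emit_logp(state, ch):
    # Placeholder uniform emission; a trained model uses P(char | state).
    return math.log(1e-3)

def viterbi(text):
    """Return the most likely BMES tag sequence for `text`."""
    V = [{}]          # V[t][s] = best log-prob of any path ending in state s at t
    path = {}
    for s in STATES:
        V[0][s] = start_p[s] + emit_logp(s, text[0])
        path[s] = [s]
    for t in range(1, len(text)):
        V.append({})
        new_path = {}
        for s in STATES:
            score, prev = max(
                ((V[t - 1][p] + trans_p[p].get(s, NEG_INF), p) for p in STATES),
                key=lambda x: x[0],
            )
            V[t][s] = score + emit_logp(s, text[t])
            new_path[s] = path[prev] + [s]
        path = new_path
    # A well-formed tagging must end on a word boundary (E or S).
    last = max(("E", "S"), key=lambda s: V[-1][s])
    return path[last]

def segment(text, tags):
    """Cut `text` into words at every E or S tag."""
    words, start = [], 0
    for i, tag in enumerate(tags):
        if tag in "ES":
            words.append(text[start:i + 1])
            start = i + 1
    return words
```

With the uniform emission stub the decoder falls back on transition structure alone; plugging in corpus-estimated emission probabilities is what makes the segmentation linguistically meaningful.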
Authors
WU Shuai; PAN Hai-zhen (School of Mathematics and Computer Science, Shangrao Normal University, Shangrao 334001)
Source
Modern Computer (《现代计算机》), 2018, No. 22, pp. 25-28 (4 pages)
Keywords
Hidden Markov Model
Chinese Word Segmentation
Word Segmentation Algorithm
Python