Abstract
This paper proposes a Chinese new word identification algorithm based on an improved position word probability (PWP). Unlike the traditional PWP, the improved PWP also takes the internal pattern of a candidate string into account. The algorithm then combines the improved PWP with two further statistics, mutual information (MI) and accessor variety (AV), to identify new words. In tests on a corpus of novels, the experimental results show that the algorithm can, to a certain extent, extract new words effectively.
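The three statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the improved PWP is not reproduced, because the abstract does not define it, and the formulas below are common textbook forms assumed here for illustration. All function names are hypothetical, and the handling of text boundaries in the AV count is a simplification.

```python
import math


def count_occurrences(corpus: str, s: str) -> int:
    """Count (possibly overlapping) occurrences of s in corpus."""
    return sum(1 for i in range(len(corpus) - len(s) + 1)
               if corpus.startswith(s, i))


def mutual_information(corpus: str, candidate: str) -> float:
    """Minimum pointwise MI (log base 2) over all binary splits of candidate.

    Assumed form: log2(p(w) / (p(left) * p(right))), with probabilities
    estimated as substring counts divided by the number of start positions.
    """
    def prob(s: str) -> float:
        if len(s) > len(corpus):
            return 0.0
        return count_occurrences(corpus, s) / (len(corpus) - len(s) + 1)

    p_whole = prob(candidate)
    if len(candidate) < 2 or p_whole == 0:
        return float("-inf")
    return min(
        math.log2(p_whole / (prob(candidate[:k]) * prob(candidate[k:])))
        for k in range(1, len(candidate))
    )


def accessor_variety(corpus: str, candidate: str) -> int:
    """min(distinct left neighbours, distinct right neighbours).

    Simplification: occurrences at the very start or end of the corpus
    contribute no neighbour on that side.
    """
    left, right = set(), set()
    n = len(candidate)
    for i in range(len(corpus) - n + 1):
        if corpus.startswith(candidate, i):
            if i > 0:
                left.add(corpus[i - 1])
            if i + n < len(corpus):
                right.add(corpus[i + n])
    return min(len(left), len(right))


def position_word_probability(lexicon, char: str, position: str) -> float:
    """Plain (not improved) PWP, assumed form: the share of occurrences of
    char in multi-character lexicon words that fall at the given position,
    where position is "B" (begin), "M" (middle) or "E" (end).
    """
    at_pos = total = 0
    for word in lexicon:
        if len(word) < 2:
            continue
        for i, c in enumerate(word):
            if c != char:
                continue
            total += 1
            tag = "B" if i == 0 else "E" if i == len(word) - 1 else "M"
            at_pos += tag == position
    return at_pos / total if total else 0.0
```

In a pipeline of this kind, a candidate string would typically be kept only if each score clears a threshold. The abstract gives no thresholds, so none are suggested here.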
Source
《福州大学学报(自然科学版)》
CAS
CSCD
PKU Core Journal
2011, No. 1, pp. 43-48 (6 pages)
Journal of Fuzhou University(Natural Science Edition)
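Funding
Fujian Provincial Science and Technology Innovation Platform Program (2009J1007)
Fujian Provincial Department of Education Research Project (JA04161)
Fujian Provincial Development and Reform Commission Fund (SX2004-29)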