Abstract
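As human knowledge keeps expanding and deepening, many combined words (words composed of several words or morphemes) are coined to express new concepts. Because these combined words cannot be added to lexicons promptly, word segmentation systems fail to recognize them. Extracting combined words from text has therefore become an active research direction in intelligent computing. Drawing on human cognitive-psychological patterns, this paper proposes a combined-word extraction algorithm based on a directed net of word-sequence frequencies to identify combined words in free text. The algorithm first builds a directed net describing how often word sequences occur in the text, then extracts the combined words step by step through dedicated matrix operations. Its advantage is that it requires no specialized linguistic knowledge; in experimental analysis, the algorithm showed fairly good results.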
Inspired by a human cognitive pattern, this paper proposes a detection algorithm based on a directed net of word-sequence frequencies to discover combined words in free text. The algorithm first builds a directed net that encodes the frequency of every word sequence in a given text, then extracts combined words from the net through a series of dedicated matrix operations. The algorithm requires no linguistic knowledge. In experimental analysis, it achieved good results.
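The abstract does not specify the matrix operations, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the text is already segmented into tokens, stores the directed net as a dictionary of adjacent-pair counts instead of a matrix, and uses a greedy merge rule. The function names and the `min_count` and `min_ratio` thresholds are hypothetical choices made for illustration.

```python
from collections import Counter


def build_net(tokens):
    """Directed net: edge (a, b) is weighted by how often b immediately follows a."""
    return Counter(zip(tokens, tokens[1:]))


def merge_pair(tokens, pair):
    """Return a new token list with each occurrence of `pair` fused into one unit."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def extract_combined_words(tokens, min_count=2, min_ratio=0.8):
    """Greedily fuse strongly associated adjacent tokens (assumed criterion).

    A pair qualifies when it occurs at least `min_count` times and accounts for
    at least `min_ratio` of the occurrences of its rarer member.
    """
    found = []
    while True:
        net = build_net(tokens)
        unigrams = Counter(tokens)
        candidates = [
            (count, pair)
            for pair, count in net.items()
            if count >= min_count
            and count / min(unigrams[pair[0]], unigrams[pair[1]]) >= min_ratio
        ]
        if not candidates:
            return found
        _, best = max(candidates)  # most frequent qualifying edge
        tokens = merge_pair(tokens, best)  # rebuild the net on the fused sequence
        found.append(best[0] + best[1])


sample = ["组合", "词", "抽取", "算法", "识别", "组合", "词",
          "和", "词", "序列", "抽取", "组合", "词"]
print(extract_combined_words(sample))
```

Rebuilding the net after each merge lets a fused unit take part in later merges, which is one way to read the "step by step" extraction described above; the actual procedure in the paper may differ.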
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2009, No. 10, pp. 3746-3749 (4 pages)
Application Research of Computers
Funding
Natural Science Foundation of Guangdong Province (07006474)
Guangdong Provincial Science and Technology Key Project (2007B010200044)
Keywords
directed graph (有向图)
combined word (组合词)
word sequence (词序列)
cognitive-psychological pattern (认知心理模式)
directed net
combined-word
word-sequence
cognition pattern