Abstract
Sequential pattern discovery, which finds all frequent subsequences in a sequence database, is an active branch of data mining. This paper first introduces the basic concepts of sequential pattern mining, then describes the classic PrefixSpan algorithm and CloSpan, a closed sequential pattern mining algorithm built on the PrefixSpan framework. The execution processes and characteristics of the two algorithms are analyzed and compared, and their respective strengths and weaknesses are summarized. The comparison indicates that PrefixSpan is suited to mining short sequences, whereas CloSpan outperforms PrefixSpan when sequences are long or the minimum support threshold is low, and also performs better when mining long frequent sequences in large databases. These results offer a useful reference for the design of sequential pattern mining algorithms.
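To make the two notions in the abstract concrete, the following is a minimal sketch of prefix-projected pattern growth and of what "closed" means for a sequential pattern. It rests on simplifying assumptions that are not taken from the paper: each sequence element is a single item rather than an itemset, and the function names `prefixspan`, `is_subsequence`, and `closed_patterns` are hypothetical. The closed-pattern step is a naive post-filter over the full result, not CloSpan's search-space pruning; it only illustrates the output that a closed-pattern miner is expected to produce.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for every frequent subsequence.

    Assumption: each sequence is an ordered collection of single items.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence id, suffix start) pairs, one per sequence.
        # For each item, record the first position at which it occurs
        # in every projected suffix.
        occurrences = defaultdict(dict)  # item -> {seq_id: first position}
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                occurrences[seq[pos]].setdefault(seq_id, pos)

        for item, hits in occurrences.items():
            if len(hits) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(hits)
                # Project on the grown prefix: keep only what follows the item.
                mine(pattern, [(sid, pos + 1) for sid, pos in hits.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


def is_subsequence(small, big):
    """True if the items of small appear in big in the same order."""
    it = iter(big)
    return all(x in it for x in small)


def closed_patterns(patterns):
    """Keep patterns that have no longer super-sequence with equal support.

    Naive post-filter for illustration only; CloSpan prunes during the search.
    """
    return {
        p: s
        for p, s in patterns.items()
        if not any(
            s == s2 and len(q) > len(p) and is_subsequence(p, q)
            for q, s2 in patterns.items()
        )
    }
```

For example, on the toy database `["abc", "abc", "ab", "b"]` with `min_support=2`, `prefixspan` returns seven frequent patterns, while `closed_patterns` keeps only three of them. This gap between the full and the closed result is one plausible reason a closed-pattern miner fares better on long sequences or low thresholds, where the number of frequent subsequences grows quickly.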
Source
《计算机技术与发展》
2008, Issue 1, pp. 70-73, 76 (5 pages in total)
Computer Technology and Development