Abstract
To mine sequential patterns in a distributed environment, a Distributed Sequential Pattern Mining (DSPM) algorithm is proposed. DSPM builds on the PrefixSpan algorithm. It uses sample-based detection to balance the task load, then decomposes the mining task and assigns the parts to multiple computers, where they run in parallel as multiple processes and threads. Pseudo-projection is used to reduce the cost of generating projected databases. Experimental results show that DSPM mines global sequential patterns in a distributed environment quickly and effectively.
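The abstract gives no pseudocode, so the following is only a minimal single-machine sketch of the ideas it names, not the authors' DSPM implementation. It assumes sequences of single items (no itemsets). The names `project`, `grow`, `make_tasks`, `run_task`, `estimate_cost`, and `assign` are hypothetical. The sampled cost estimate and the greedy heaviest-first assignment are stand-ins for the sampling-based load balancing, whose details the abstract does not state. Pseudo-projection here means a projected database is a list of `(sequence_index, offset)` pointers into the original data, not copied suffixes.

```python
from collections import defaultdict
import heapq


def project(db, pointers):
    """Map each item to the pseudo-projected database obtained by appending it.

    A pointer (seq_id, start) stands for the suffix db[seq_id][start:].
    Only the first occurrence of an item in each suffix is recorded.
    """
    hits = defaultdict(list)
    for seq_id, start in pointers:
        seen = set()
        seq = db[seq_id]
        for pos in range(start, len(seq)):
            if seq[pos] not in seen:
                seen.add(seq[pos])
                hits[seq[pos]].append((seq_id, pos + 1))
    return hits


def grow(db, prefix, pointers, min_support, out):
    """PrefixSpan-style pattern growth over pointer lists."""
    for item, projected in sorted(project(db, pointers).items()):
        if len(projected) >= min_support:
            pattern = prefix + (item,)
            out.append((pattern, len(projected)))
            grow(db, pattern, projected, min_support, out)


def make_tasks(db, min_support):
    """One independent mining task per frequent single-item prefix."""
    start = [(i, 0) for i in range(len(db))]
    return {item: ptrs for item, ptrs in project(db, start).items()
            if len(ptrs) >= min_support}


def run_task(db, item, pointers, min_support):
    """Mine every frequent pattern that begins with `item`."""
    out = [((item,), len(pointers))]
    grow(db, (item,), pointers, min_support, out)
    return out


def estimate_cost(db, pointers, sample_step=1):
    """Assumed cost proxy: total suffix length over every sample_step-th sequence."""
    return sum(len(db[s]) - p for s, p in pointers if s % sample_step == 0)


def assign(costs, n_workers):
    """Greedy balancing: heaviest task first, to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    plan = defaultdict(list)
    for item, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        plan[worker].append(item)
        heapq.heappush(heap, (load + cost, worker))
    return dict(plan)


if __name__ == "__main__":
    db = ["abc", "abc", "ac", "bc"]  # each string is one sequence of items
    tasks = make_tasks(db, min_support=2)
    costs = {item: estimate_cost(db, ptrs) for item, ptrs in tasks.items()}
    for worker, items in sorted(assign(costs, n_workers=2).items()):
        for item in items:  # a real system would run these on separate machines
            for pattern, support in run_task(db, item, tasks[item], 2):
                print(worker, pattern, support)
```

Because each first-level prefix defines its own projected database, the tasks share no state and can be sent to different processes or machines. The loop in the usage section only simulates that dispatch on one machine.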
Source
《计算机应用》
CSCD
PKU Core (Peking University Chinese Core Journals)
2008, No. 11, pp. 2964-2966, 2974 (4 pages in total)
Journal of Computer Applications
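Funding
National Natural Science Foundation of China (60572112)
Jiangsu Province High-Tech Major Project (BG2007028)
Jiangsu Province Six Talent Peaks Project (07-E-025)
Jiangsu Provincial Department of Education Project (06KJB120051)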
Keywords
data mining
sequential pattern
distributed
pattern growth