Abstract
With the rapid development and spread of Internet technology, and especially with the arrival of Web 2.0, the volume of network data keeps growing, and mining newly added (incremental) data has become a research hotspot in data mining. Building on the idea of distributed data mining with a global site, this paper proposes an incremental mining algorithm for distributed databases with a global site (IMADG). First, IMADG performs global mining on the incremental data of the local sites, which effectively reduces the number of times each local site must scan its original data. Second, it applies a new pruning strategy at the global site, which greatly reduces the number of candidate itemsets generated. Finally, an example demonstrates that the algorithm is feasible and achieves good mining efficiency.
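The abstract does not spell out IMADG's pruning rule, so the sketch below is only a hedged illustration of the general idea, using the well-known FUP-style lemma from incremental association-rule mining rather than the paper's own strategy. If an itemset X was infrequent in the original data DB (count_DB(X) < s|DB|) and is also infrequent in the increment db (count_db(X) < s|db|), then count_DB(X) + count_db(X) < s(|DB| + |db|), so X cannot be frequent in the updated data. Only itemsets that were previously infrequent but are frequent in the increment force a rescan of the original local data. The function names (`count_support`, `global_count`, `incremental_update`), the data layout (one list of transactions per local site), the explicitly supplied candidate set, and the assumption that the global site stores the previous run's frequent itemsets with their supports are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Itemset = FrozenSet[str]
Transaction = FrozenSet[str]


def count_support(transactions: Sequence[Transaction],
                  candidates: Set[Itemset]) -> Dict[Itemset, int]:
    """Number of transactions that contain each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def global_count(sites: List[Sequence[Transaction]],
                 candidates: Set[Itemset]) -> Dict[Itemset, int]:
    """Global site (hypothetical): sum the counts reported by each local site."""
    total = {c: 0 for c in candidates}
    for site in sites:
        for c, n in count_support(site, candidates).items():
            total[c] += n
    return total


def incremental_update(old_frequent: Dict[Itemset, int], old_size: int,
                       original_sites: List[Sequence[Transaction]],
                       increment_sites: List[Sequence[Transaction]],
                       candidates: Set[Itemset],
                       minsup: float) -> Tuple[Dict[Itemset, int], Set[Itemset]]:
    """Return (frequent itemsets of DB + db with supports, itemsets that needed a rescan)."""
    inc_size = sum(len(s) for s in increment_sites)
    threshold = minsup * (old_size + inc_size)
    inc_counts = global_count(increment_sites, candidates)  # only the increment is scanned

    frequent: Dict[Itemset, int] = {}
    rescan: Set[Itemset] = set()
    for c in candidates:
        if c in old_frequent:
            # Support in the original data is already known: no rescan needed.
            n = old_frequent[c] + inc_counts[c]
            if n >= threshold:
                frequent[c] = n
        elif inc_counts[c] >= minsup * inc_size:
            # Infrequent before, frequent in the increment: undecided, so rescan.
            rescan.add(c)
        # Otherwise infrequent in both DB and db, hence infrequent in their
        # union: pruned without touching the original data.

    if rescan:
        old_counts = global_count(original_sites, rescan)
        for c in rescan:
            n = old_counts[c] + inc_counts[c]
            if n >= threshold:
                frequent[c] = n
    return frequent, rescan


# Toy example: two local sites, 5 original transactions, 3 new ones, minsup = 0.5.
fs = frozenset
original = [[fs("ab"), fs("ac"), fs("ab")], [fs("bc"), fs("ab")]]
increment = [[fs("c"), fs("ac")], [fs("bc")]]
old_frequent = {fs("a"): 4, fs("b"): 4, fs("ab"): 3}  # result of the previous run
candidates = {fs(x) for x in ["a", "b", "c", "ab", "ac", "bc"]}

frequent, rescanned = incremental_update(old_frequent, 5, original, increment,
                                         candidates, 0.5)
print(sorted(("".join(sorted(k)), v) for k, v in frequent.items()))  # [('a', 5), ('b', 5), ('c', 5)]
print(sorted("".join(sorted(k)) for k in rescanned))                  # ['c']
```

On this toy data only `{c}` requires a scan of the original site data; `{a,c}` and `{b,c}` are discarded from the increment counts alone, and the result matches a full recount of all eight transactions.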
Source
Journal of Liaoning University: Natural Sciences Edition
CAS
2013, No. 1, pp. 41-47 (7 pages)
Funding
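Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (12YJCZH048)
Training funds of the Liaoning "Hundred-Thousand-Ten-Thousand Talents Project"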
Keywords
incremental mining
distributed database
global site
pruning strategy