Abstract
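An incremental updating algorithm for sequential patterns, called FIMS (fast incremental mining of sequential patterns), is proposed to handle the maintenance of sequential patterns when the database is updated. The main idea is to reuse the results of the earlier sequential pattern mining and to build a projected database, which reduces both the number of scans over the whole database and the number of candidate sequences generated, and thereby improves mining efficiency. Experimental results show that when the amount of updated data is much smaller than the whole database, FIMS outperforms the GSP algorithm by a factor of 4 to 7.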
An incremental updating technique for discovering sequential patterns, called FIMS (fast incremental mining of sequential patterns), is proposed to deal with the maintenance of discovered sequential patterns when the database is updated. The main idea is to use the results of an earlier mining process to cut the cost of finding new sequential patterns in the updated database. First, the whole database, which consists of the original database and the incremental database, is scanned twice and a projected database is constructed from it. Then the projected database is mined to obtain all the new candidate sequential patterns. Lastly, the whole database is scanned once more to obtain all the new sequential patterns. Since FIMS needs only three scans of the whole database in total and the projected database is much smaller than the whole database, both the database scanning and the generation of candidate sequences are greatly reduced, and mining efficiency improves as a result. Our experiments show that FIMS outperforms the GSP algorithm by a factor of 4 to 7 when the updated data is only a small portion of the whole database.
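The abstract does not give the details of how FIMS builds or mines its projected database, so the following is not the FIMS procedure. It is only a minimal sketch of the underlying idea of reusing earlier mining results, under a simplifying assumption that is not stated in the abstract: the increment consists solely of new customer sequences appended to the database. In that case the support count of every previously mined pattern can be refreshed by scanning the increment alone. All names here (`contains`, `count_support`, `update_known_supports`) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Sequence = List[Itemset]        # one customer sequence: ordered itemsets
Pattern = Tuple[Itemset, ...]   # a sequential pattern


def contains(sequence: Sequence, pattern: Pattern) -> bool:
    """True if `pattern` is a subsequence of `sequence`.

    Each pattern itemset must be a subset of a distinct sequence itemset,
    in order. Greedy earliest matching is sufficient for this check.
    """
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def count_support(db: List[Sequence], pattern: Pattern) -> int:
    """Number of sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if contains(seq, pattern))


def update_known_supports(
    old_counts: Dict[Pattern, int], increment: List[Sequence]
) -> Dict[Pattern, int]:
    """Refresh counts of previously mined patterns.

    Assumption (not from the abstract): the increment only appends whole new
    sequences, so the new count is the old count plus the count in the
    increment. Only the increment is scanned here.
    """
    return {p: c + count_support(increment, p) for p, c in old_counts.items()}


def fs(*items: str) -> Itemset:
    """Small helper to build an itemset."""
    return frozenset(items)


# Illustrative data, invented for this sketch.
old_db = [[fs("a"), fs("b")], [fs("a"), fs("c")], [fs("a"), fs("b"), fs("c")]]
increment = [[fs("a", "d"), fs("b")], [fs("b"), fs("a")]]
pattern_ab: Pattern = (fs("a"), fs("b"))

old_counts = {pattern_ab: count_support(old_db, pattern_ab)}
new_counts = update_known_supports(old_counts, increment)
```

This covers only patterns that were already known. Patterns that become frequent only after the update still have to be discovered, which is the part the abstract assigns to the projected database and the final verification scan.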
Source
《南京大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University core journal list)
2003, No. 2, pp. 165-171 (7 pages)
Journal of Nanjing University (Natural Science)
Funding
National Natural Science Foundation of China (70171052, 60075015)