An Efficient Incremental Updating Algorithm for Discovering Sequential Patterns in Large Database (cited by: 10)


Abstract: An incremental updating technique for discovering sequential patterns, called FIMS (fast incremental mining of sequential patterns), is proposed to maintain previously discovered sequential patterns when the database is updated. The main idea is to reuse the results of an earlier mining run to cut the cost of finding new sequential patterns in the updated database. First, the whole database, which consists of the original database and the incremental database, is scanned twice and a projected database is constructed from it. Next, the projected database is mined to obtain all new candidate sequential patterns. Finally, the whole database is scanned once more to obtain all new sequential patterns. Because FIMS scans the whole database only three times in total and the projected database is much smaller than the whole database, both the number of database scans and the generation of candidate sequences are greatly reduced, which improves mining efficiency. Experiments show that FIMS outperforms the GSP algorithm by a factor of 4 to 7 when the updated data is only a small portion of the whole database.
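The last step of the procedure in the abstract, a single scan of the whole database to confirm which candidates are frequent, can be illustrated with a minimal sketch. This is not the paper's FIMS implementation: the abstract does not say how the projected database is built or how candidates are generated, so the sketch covers only the verification scan. The names `contains`, `support`, and `verify_candidates`, the `min_support_ratio` parameter, and the representation of a sequence as a tuple of itemsets are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]
Sequence = Tuple[Itemset, ...]  # hypothetical representation: ordered itemsets


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Each pattern element must be a subset of a distinct sequence element,
    and the order of elements must be preserved (greedy earliest match).
    """
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match strictly later
    return True


def support(db: Iterable[Sequence], pattern: Sequence) -> int:
    """Count the sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if contains(seq, pattern))


def verify_candidates(
    original: List[Sequence],
    increment: List[Sequence],
    candidates: Iterable[Sequence],
    min_support_ratio: float,
) -> Dict[Sequence, int]:
    """Keep only the candidates that are frequent in original + increment.

    Assumption: support is measured as a fraction of all sequences in the
    whole (updated) database.
    """
    whole = original + increment
    threshold = min_support_ratio * len(whole)
    frequent: Dict[Sequence, int] = {}
    for cand in candidates:
        count = support(whole, cand)
        if count >= threshold:
            frequent[cand] = count
    return frequent


if __name__ == "__main__":
    a, b, ac = frozenset({"a"}), frozenset({"b"}), frozenset({"a", "c"})
    original = [(a, b), (ac, b)]
    increment = [(b, a)]
    print(verify_candidates(original, increment, [(a, b), (b, a)], 0.5))
```

In this toy run the candidate "a then b" is contained in two of the three sequences and is kept, while "b then a" appears only in the newly added sequence and falls below the 50% threshold.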

Authors: 邹翔 (Zou Xiang), 张巍 (Zhang Wei), 蔡庆生 (Cai Qingsheng), 王清毅 (Wang Qingyi)

Affiliation: Department of Computer Science, University of Science and Technology of China

Source: Journal of Nanjing University (Natural Science) (《南京大学学报（自然科学版）》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2003, No. 2, pp. 165-171 (7 pages)

Funding: National Natural Science Foundation of China (70171052, 60075015)

Keywords: database; incremental updating algorithm; data mining; sequential pattern; number of scans; candidate sequence

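Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; TP18 [Automation and Computer Technology: Computer Science and Technology]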


