Sequential Pattern Mining Algorithms Based on Data Streams

Mining Frequent Sequence from Data Stream


Abstract: Sequential pattern mining has been studied in depth, but extending it to data streams remains a challenge. To mine sequential patterns from data streams, this paper proposes two algorithms, MFSDS-1 and MFSDS-2. Both store elected sequences and control the granularity of the stored information by adjusting the elected rate. MFSDS-2 uses a level-based storage structure that not only preserves sequence information better but can also reveal abnormal sequential patterns in current activity by comparing them with the global sequential patterns. Experimental results show that MFSDS-2, which relies on level-based storage, is more efficient than MFSDS-1.

Authors: Yu Danqing (俞单庆), Ji Genlin (吉根林)

Affiliation: School of Mathematics and Computer Science, Nanjing Normal University

Source: Journal of Jiangnan University (Natural Science Edition) (《江南大学学报（自然科学版）》), CAS, 2007, No. 6, pp. 763-768 (6 pages)

Funding: Natural Science Foundation of Jiangsu Province (BK2005135)

Keywords: sequential pattern mining; frequent sequence; data stream

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
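
The abstract describes two ideas without giving details: an elected rate that controls how much sequence information is stored, and a level-based store whose current patterns can be compared against global ones to flag abnormal sequences. The following is a minimal Python sketch of those two ideas only, not the paper's MFSDS-1 or MFSDS-2. It assumes the stream arrives in batches, that the elected rate is the fraction of a batch in which a pattern must occur to be kept, and that two levels (latest batch and global summary) are enough for illustration. The names `subsequences`, `ElectedSequenceStore`, `add_batch`, `frequent`, and `abnormal` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def subsequences(seq, max_len):
    """Distinct order-preserving subsequences of seq with 1..max_len items."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            found.add(tuple(seq[i] for i in idx))
    return found


class ElectedSequenceStore:
    """Hypothetical two-level store: the latest batch plus a global summary."""

    def __init__(self, elected_rate, max_len=2):
        # Assumption: a pattern is "elected" (stored) when it occurs in at
        # least elected_rate * batch_size sequences of the incoming batch.
        self.elected_rate = elected_rate
        self.max_len = max_len
        self.global_counts = Counter()  # global level: elected patterns only
        self.total_sequences = 0
        self.recent = {}                # recent level: elected patterns of the last batch
        self.recent_size = 0

    def add_batch(self, batch):
        """Count patterns in one batch and keep only the elected ones."""
        local = Counter()
        for seq in batch:
            # Each sequence contributes at most 1 to a pattern's count.
            local.update(subsequences(seq, self.max_len))
        threshold = self.elected_rate * len(batch)
        self.recent = {p: c for p, c in local.items() if c >= threshold}
        self.recent_size = len(batch)
        self.global_counts.update(self.recent)
        self.total_sequences += len(batch)
        return self.recent

    def frequent(self, min_support):
        """Patterns whose stored global count reaches min_support."""
        cutoff = min_support * self.total_sequences
        return {p for p, c in self.global_counts.items() if c >= cutoff}

    def abnormal(self, min_support):
        """Patterns frequent in the latest batch but not frequent globally."""
        cutoff = min_support * self.recent_size
        recent_frequent = {p for p, c in self.recent.items() if c >= cutoff}
        return recent_frequent - self.frequent(min_support)


batch = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
fine = ElectedSequenceStore(elected_rate=0.5).add_batch(batch)
coarse = ElectedSequenceStore(elected_rate=0.75).add_batch(batch)
print(len(fine), len(coarse))
```

In this sketch a lower elected rate keeps more patterns per batch (finer granularity) and a higher one keeps fewer. Because patterns that miss the threshold in a batch are discarded, the global counts are underestimates, which is the usual trade-off in approximate stream summaries.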



