
A Method to Set the Decay Factor Based on a Gaussian Function

Cited by: 4
Abstract: A data stream is a continuous, rapidly changing, time-ordered sequence of data elements, and the knowledge it contains changes over time. In some data stream applications, the most recent data are considered the most valuable, so a time decay model (TDM) is used to mine frequent patterns from the stream. Existing ways of setting the decay factor either involve randomness, which makes the result set unstable, or target only 100% recall or only 100% precision of the algorithm while neglecting the corresponding precision or recall. To balance high recall and high precision while keeping the result set stable, an average decay factor setting is designed. To further increase the weights of the latest transactions and reduce the weights of historical transactions, a second setting, the Gaussian decay factor, is designed using a Gaussian function. To compare the different ways of setting the decay factor, four time decay models are studied and designed, and each is used to mine closed frequent patterns over data streams. Experiments on high-density and low-density data streams show that the average decay factor balances high recall and high precision, and that the Gaussian decay factor gives better algorithm performance than the other methods.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2015, No. 12, pp. 2834-2843 (10 pages)
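Funding: National Natural Science Foundation of China (61563001); Research Fund of the State Ethnic Affairs Commission of China (14BFZ008); Beijing Natural Science Foundation (4142042); Research Fund of North Minzu University (北方民族大学) (2013QZP02)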
Keywords: decay factor; time decay model; Gaussian function; recall; precision; frequent pattern mining; data stream mining
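
The abstract names a time decay model and a Gaussian-based decay factor but does not give their formulas. As a rough illustration only, the sketch below shows the general idea of weighting recent transactions more heavily when counting how often a pattern occurs. The function names, the constant-factor update rule, and the Gaussian-of-age weighting are assumptions made for this example; they are not the paper's definitions of the average or Gaussian decay factor.

```python
import math


def decayed_support(hits, decay):
    """Decayed count of one pattern under a constant decay factor.

    hits is ordered oldest first; hits[i] is 1 if transaction i contains
    the pattern, else 0. Each new transaction multiplies the running
    count by `decay` (assumed 0 < decay < 1) before adding the new hit.
    """
    count = 0.0
    for hit in hits:
        count = count * decay + hit
    return count


def gaussian_weights(n, sigma):
    """Hypothetical Gaussian weights by transaction age (assumption).

    The newest of n transactions has age 0 and weight 1; older
    transactions get exp(-age^2 / (2 * sigma^2)).
    """
    return [math.exp(-(age ** 2) / (2.0 * sigma ** 2))
            for age in range(n - 1, -1, -1)]


def weighted_support(hits, weights):
    """Sum of per-transaction weights over transactions that contain the pattern."""
    return sum(h * w for h, w in zip(hits, weights))


# A pattern seen only in the newest transaction outweighs one seen only
# in the oldest transaction, under either weighting.
recent_only = [0, 0, 1]
old_only = [1, 0, 0]
w = gaussian_weights(3, sigma=1.0)
print(decayed_support(recent_only, 0.5), decayed_support(old_only, 0.5))
print(weighted_support(recent_only, w), weighted_support(old_only, w))
```

In this sketch a smaller `decay` or a smaller `sigma` discounts historical transactions more aggressively; how the paper actually chooses these values is what its four models compare.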