
File system caching algorithm based on frequent sequence mining
Abstract: Traditional caching algorithms suffer from low hit rates and high swap (exchange) rates, and existing caching algorithms are ill-suited to distributed big-data storage systems. To address this, this paper proposes an adaptive caching strategy based on frequent sequence mining. The method applies a data-mining algorithm to find the frequent sequences within a historical access window, fuzzily merges those sequences, and builds a set of matching patterns for lookup. When a new access arrives, the subsequence formed by a fixed number of recent accesses is matched against the pattern set, data are prefetched according to the match result, and a modified S4LRU (4-segmented least recently used) structure decides which cached data to evict. Simulation experiments on a public big-data processing trace set show that, across different cache sizes and compared with existing typical caching algorithms, the proposed algorithm raises the average hit rate by a factor of 0.327 (32.7%) and lowers the average swap rate by a factor of 0.33 (33%), while keeping overhead low and time efficiency high. These results indicate that the method is a more effective caching strategy than traditional replacement algorithms.
Authors: Du Kexing, Zhang Xiaofang, Zhang Xiao, Zhao Xiaonan (College of Software, Northwestern Polytechnical University, Xi'an 710072, China; College of Computer, Northwestern Polytechnical University, Xi'an 710072, China)
Source: Application Research of Computers (《计算机应用研究》), 2022, No. 3, pp. 831-835 (5 pages). Indexed in CSCD; Peking University Core Journal.
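Funding: National Key R&D Program of China (2018YFB1004401); Beijing Natural Science Foundation, Haidian Original Innovation Joint Fund (L192027); Shaanxi Province Key Industrial Chain Projects (2021ZDLGY03-02, 2021ZDLGY03-08).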
Keywords: caching algorithm; frequent sequence mining; distributed file system optimization
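The abstract does not name the mining algorithm or define the fuzzy merge, so the following is only a rough illustration of the first stage, not the authors' method. This minimal Python sketch counts contiguous access subsequences (n-grams) in a history window, keeps those whose occurrence count meets a support threshold, and then keeps only maximal patterns as a simple stand-in for merging. The function names and the parameters `min_len`, `max_len`, and `min_support` are hypothetical.

```python
from collections import Counter


def mine_frequent_sequences(window, min_len=2, max_len=4, min_support=2):
    """Count every contiguous subsequence of length min_len..max_len in the
    access window and keep those seen at least min_support times.

    Assumption: support = number of (possibly overlapping) occurrences in the
    window. The paper's actual mining algorithm is not specified.
    """
    counts = Counter()
    for length in range(min_len, max_len + 1):
        for start in range(len(window) - length + 1):
            counts[tuple(window[start:start + length])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_support}


def _contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def build_pattern_set(frequent):
    """Keep only maximal patterns: drop any pattern that occurs inside a
    longer frequent pattern. A hypothetical stand-in for fuzzy merging."""
    kept = []
    for p in sorted(frequent, key=len, reverse=True):  # longest first
        if not any(len(q) > len(p) and _contains(q, p) for q in kept):
            kept.append(p)
    return set(kept)
```

For the window a b c a b c d a b with `min_support=2`, the frequent sequences are ab (3 occurrences), bc (2) and abc (2), and the pattern set reduces to the single maximal pattern abc. Processing longer patterns first means each shorter pattern only needs to be checked against patterns already kept, because containment is transitive.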
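The matching and prefetch step can be illustrated in the same spirit. The sketch below assumes a simple matching rule that the abstract does not confirm: a pattern matches when its first `match_len` items equal the last `match_len` accesses, and the remaining items of every matching pattern become prefetch candidates. The function name and `match_len` are hypothetical.

```python
def prefetch_candidates(recent, patterns, match_len=2):
    """Return blocks to prefetch given the most recent accesses.

    Assumption: a pattern matches when its first `match_len` items equal the
    last `match_len` accesses; the rest of the pattern is prefetched.
    """
    if len(recent) < match_len:
        return []
    key = tuple(recent[-match_len:])
    out = []
    for p in sorted(patterns):  # sorted only to make the output order stable
        if len(p) > match_len and p[:match_len] == key:
            for block in p[match_len:]:
                if block not in out:  # avoid prefetching a block twice
                    out.append(block)
    return out
```

With patterns abc, abde and xyz and the last two accesses a, b, the sketch suggests prefetching c, d and e; a trailing access x, y yields z.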
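For the eviction side, the paper uses a modified S4LRU whose changes the abstract does not describe. The following Python sketch shows only the standard S4LRU scheme: the cache is split into four LRU segments, a miss inserts at segment 0, a hit promotes the item one segment up (segment 3 stays at 3), and overflow demotes a segment's least recently used item one segment down, with overflow from segment 0 leaving the cache. The `prefetch` method and the `evictions` counter (a possible proxy for the swap rate) are hypothetical additions, not taken from the paper.

```python
from collections import OrderedDict


class S4LRU:
    """Standard 4-segment LRU; not the paper's modified variant."""

    LEVELS = 4

    def __init__(self, capacity):
        if capacity < self.LEVELS:
            raise ValueError("capacity must be at least 4")
        self.seg_cap = capacity // self.LEVELS
        # Each segment is an LRU list: the last key is the most recently used.
        self.segments = [OrderedDict() for _ in range(self.LEVELS)]
        self.hits = self.misses = self.evictions = 0

    def _level_of(self, key):
        for level, seg in enumerate(self.segments):
            if key in seg:
                return level
        return None

    def _insert(self, level, key):
        """Insert at the MRU end of `level`, demoting overflow downwards."""
        while True:
            seg = self.segments[level]
            seg[key] = None
            if len(seg) <= self.seg_cap:
                return
            key, _ = seg.popitem(last=False)  # LRU item of this segment
            if level == 0:
                self.evictions += 1  # the item leaves the cache entirely
                return
            level -= 1

    def access(self, key):
        """Record one access; return True on a hit."""
        level = self._level_of(key)
        if level is None:
            self.misses += 1
            self._insert(0, key)
            return False
        self.hits += 1
        del self.segments[level][key]
        self._insert(min(level + 1, self.LEVELS - 1), key)
        return True

    def prefetch(self, key):
        """Load a predicted block into segment 0 without counting a hit or miss."""
        if self._level_of(key) is None:
            self._insert(0, key)
```

In a combined simulation, candidates from a pattern match could be passed to `prefetch` before the next `access`, and the hit rate read off as `hits / (hits + misses)`.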
