Abstract
Traditional caching algorithms suffer from low hit rates and high exchange (swap-in/swap-out) rates, and existing caching algorithms are ill-suited to distributed big data storage systems. To address this, this paper proposes an adaptive caching strategy based on frequent sequence mining. The method applies a data mining algorithm to find the frequent sequences within a historical access window, fuzzily merges those sequences, and builds a set of matching patterns for lookup. When a new access arrives, the subsequence formed by the most recent fixed number of accesses is matched against the pattern set, data are prefetched according to the match result, and a modified S4LRU (4-segmented least recently used) data structure decides which cached data are evicted. Simulation experiments on a public big data processing trace set show that, across different cache sizes and compared with existing typical caching algorithms, the proposed algorithm improves the average hit rate by 32.7% and reduces the average exchange rate by 33%, while incurring low overhead and offering good timeliness. These results indicate that the method is a more effective caching strategy than traditional replacement algorithms.
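The abstract describes a pipeline: mine frequent sequences from a history window, merge them into a matching pattern set, match the latest fixed-length subsequence against that set, prefetch the predicted blocks, and evict through S4LRU. The Python below is a minimal sketch of that flow under stated assumptions, not the authors' implementation. The abstract does not name the mining algorithm or describe the fuzzy merge or the S4LRU modifications, so the sketch counts contiguous subsequences as a stand-in miner, merges patterns by shared prefix as a simplified stand-in for fuzzy merging, and uses plain (unmodified) S4LRU. All names (`S4LRU`, `mine_patterns`, `predict`, `simulate`) and parameters (`min_support`, `max_len`, `window`, `k`) are hypothetical.

```python
from collections import Counter, OrderedDict, defaultdict


class S4LRU:
    """Plain 4-segment LRU (hypothetical stand-in; the paper's modifications are unknown)."""

    def __init__(self, capacity, segments=4):
        self.seg_cap = max(1, capacity // segments)
        # segs[0] is the lowest segment; in each OrderedDict the last key is the MRU item.
        self.segs = [OrderedDict() for _ in range(segments)]

    def _find(self, key):
        for i, seg in enumerate(self.segs):
            if key in seg:
                return i
        return -1

    def _insert(self, level, key):
        # Put key at the MRU end of segment `level`; a segment that overflows
        # demotes its LRU item one segment down. Overflow from segment 0 is evicted.
        while level >= 0:
            seg = self.segs[level]
            seg[key] = True
            seg.move_to_end(key)
            if len(seg) <= self.seg_cap:
                return None
            key, _ = seg.popitem(last=False)
            level -= 1
        return key  # the evicted key

    def access(self, key):
        """Demand access: on a hit, promote one segment up; on a miss, insert into segment 0."""
        i = self._find(key)
        if i >= 0:
            del self.segs[i][key]
            self._insert(min(i + 1, len(self.segs) - 1), key)
            return True
        self._insert(0, key)
        return False

    def prefetch(self, key):
        # Assumption: prefetched blocks enter the lowest segment, like demand misses.
        if self._find(key) < 0:
            self._insert(0, key)

    def __contains__(self, key):
        return self._find(key) >= 0


def mine_patterns(window, min_support=2, max_len=3):
    """Count contiguous subsequences of length 2..max_len (stand-in miner) and
    merge the frequent ones by prefix: prefix tuple -> set of next blocks."""
    counts = Counter(
        tuple(window[s:s + n])
        for n in range(2, max_len + 1)
        for s in range(len(window) - n + 1)
    )
    patterns = defaultdict(set)
    for seq, c in counts.items():
        if c >= min_support:
            patterns[seq[:-1]].add(seq[-1])
    return patterns


def predict(recent, patterns, k=2):
    """Match the longest suffix (at most k accesses) of `recent` against the patterns."""
    for n in range(min(k, len(recent)), 0, -1):
        nxt = patterns.get(tuple(recent[-n:]))
        if nxt:
            return nxt
    return set()


def simulate(trace, capacity=8, window=50, k=2, prefetching=True):
    """Replay a trace and return the hit rate; re-mines on every access for brevity."""
    cache, hits = S4LRU(capacity), 0
    for t, block in enumerate(trace):
        hits += cache.access(block)
        if prefetching:
            history = trace[max(0, t + 1 - window):t + 1]
            for nxt in predict(history, mine_patterns(history), k):
                cache.prefetch(nxt)
    return hits / len(trace)
```

On a cyclic trace of six distinct blocks with a four-block cache (`simulate(list("abcdef") * 20, capacity=4)`), plain S4LRU never hits, while the prefetching variant hits on almost every access once each successor pair has been seen twice. Real big data traces are far less regular, and a practical system would re-mine periodically rather than on every access.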
Authors
杜科星
张小芳
张晓
赵晓南
Du Kexing; Zhang Xiaofang; Zhang Xiao; Zhao Xiaonan (College of Software, Northwestern Polytechnical University, Xi'an 710072, China; College of Computer, Northwestern Polytechnical University, Xi'an 710072, China)
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2022, No. 3, pp. 831-835 (5 pages)
Funding
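National Key Research and Development Program of China (2018YFB1004401)
Beijing Natural Science Foundation, Haidian Original Innovation Joint Fund (L192027)
Key Industrial Chain Project of Shaanxi Province (2021ZDLGY03-02, 2021ZDLGY03-08)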
Keywords
caching algorithm
frequent sequence mining
distributed file system optimization