Abstract: P2P streaming-media caching is an effective technique for reducing bandwidth overhead and improving object utilization; cached content is usually replaced with algorithms such as FIFO and LRU. However, streaming media differs from Web objects, and P2P networks differ from the client/server model, so in distributed applications these algorithms may hurt system performance. This paper therefore analyzes the FIFO and LRU replacement algorithms and proposes two alternatives: SD, based on the supply-demand relationship, and REP, based on the number of segment replicas; both are then evaluated and compared. Across different peer arrival intervals, SD and REP are compared with FIFO and LRU and found to outperform them in almost all cases in terms of startup delay, number of media replicas, and dependence on the root node. Compared with the LSB (least sent bytes) algorithm, SD reduces startup delay by about 40% in some scenarios, while REP far exceeds LSB in replica count, indicating that the SD and REP cache replacement algorithms help improve system performance in P2P streaming services.
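The abstract does not give the exact SD and REP scoring rules, so the following Python sketch is only one plausible reading: SD is assumed to evict the segment whose supply most exceeds its demand, and REP the segment with the largest known replica count. All names (Segment, PeerCache) and metrics are hypothetical illustrations, not the authors' definitions.

```python
# Hypothetical sketch of SD- and REP-style eviction for P2P streaming segments.
# Assumption: SD evicts the most over-supplied segment relative to demand;
# REP evicts the segment with the most replicas, so rare segments survive.

from dataclasses import dataclass, field

@dataclass
class Segment:
    seg_id: int
    supply: int = 0    # replicas advertised by neighboring peers (assumed metric)
    demand: int = 0    # recent requests seen for this segment (assumed metric)
    replicas: int = 0  # replica count gossiped through the overlay (assumed metric)

@dataclass
class PeerCache:
    capacity: int
    segments: dict = field(default_factory=dict)

    def insert(self, seg: Segment, policy: str = "SD") -> None:
        if len(self.segments) >= self.capacity:
            victim = self._pick_victim(policy)
            del self.segments[victim.seg_id]
        self.segments[seg.seg_id] = seg

    def _pick_victim(self, policy: str) -> Segment:
        if policy == "SD":
            # Evict the segment that is most over-supplied relative to demand.
            return max(self.segments.values(),
                       key=lambda s: s.supply / (s.demand + 1))
        # "REP": evict the segment with the largest replica count, evening
        # out replica counts across the overlay.
        return max(self.segments.values(), key=lambda s: s.replicas)

cache = PeerCache(capacity=2)
cache.insert(Segment(1, supply=5, demand=1, replicas=8))
cache.insert(Segment(2, supply=1, demand=4, replicas=2))
cache.insert(Segment(3, supply=2, demand=2, replicas=3), policy="SD")
print(sorted(cache.segments))  # segment 1 evicted: highest supply/demand ratio
```

Under this reading, both policies keep rare, in-demand segments resident, which is consistent with the reported gains in startup delay and replica count over FIFO and LRU.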
Funding: This research was supported in part by the U.S. National Science Foundation under Grant Nos. EIA-0224377, CNS-0406328, CNS-0509118, and CCF-0621435.
Abstract: Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before the CPU demands it, has been considered an effective way to mask data access delay. However, current client-initiated prefetching strategies, where a computing processor issues the prefetching instructions, have many limitations: they do not work well for applications with complex, non-contiguous data access patterns. As technology advances continue to widen the gap between computing and data access performance, trading computing power for reduced data access delay has become a natural choice. In this paper, we present a server-based data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called the Data Push Server (DPS) initiates prefetching and proactively pushes data closer to the client ahead of time. Issues such as what data to fetch, when to fetch, and how to push are studied. To test DPS-based prefetching, the SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor. Simulation results show that the L1 cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for SPEC CPU2000 benchmarks with high cache miss rates.
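The paper's evaluation modifies SimpleScalar; as a language-neutral illustration, the Python sketch below models only the core idea of the push architecture: a server observes the client's access stream, predicts upcoming blocks (here with a simple stride predictor, our assumption; the abstract does not specify the DPS prediction scheme), and pushes them into the client's cache before they are demanded.

```python
# Hypothetical sketch of server-push prefetching in the spirit of the DPS
# architecture: the push server watches the client's addresses, guesses the
# next blocks via a stride predictor (assumption), and fills the client's
# cache ahead of demand.

from collections import deque

BLOCK = 64  # assumed cache block size in bytes

class ClientCache:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = deque()  # FIFO of resident block addresses

    def fill(self, block_addr):
        if block_addr in self.blocks:
            return
        if len(self.blocks) >= self.capacity:
            self.blocks.popleft()  # FIFO replacement (assumption)
        self.blocks.append(block_addr)

    def access(self, addr):
        return (addr // BLOCK) * BLOCK in self.blocks  # True on hit

class DataPushServer:
    """Observes the client's accesses and pushes predicted blocks."""
    def __init__(self, client, depth=2):
        self.client = client
        self.depth = depth  # how many blocks ahead to push
        self.last_addr = None
        self.stride = 0

    def observe(self, addr):
        if self.last_addr is not None:
            self.stride = addr - self.last_addr
        self.last_addr = addr
        if self.stride:
            for i in range(1, self.depth + 1):
                nxt = addr + i * self.stride
                self.client.fill((nxt // BLOCK) * BLOCK)  # push ahead of demand

client = ClientCache(capacity_blocks=8)
dps = DataPushServer(client)
hits = 0
for addr in range(0, BLOCK * 10, BLOCK):  # a streaming access pattern
    hits += client.access(addr)
    client.fill((addr // BLOCK) * BLOCK)  # demand fill on miss
    dps.observe(addr)
print(f"hits: {hits}/10")  # later accesses hit thanks to pushed blocks
```

The design point this illustrates is that prediction and data movement run on the server, so the client spends no cycles issuing prefetch instructions, which is what makes the approach attractive for complex access patterns.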
Abstract: In modern processor architectures, the cache is a key means of overcoming the memory-wall bottleneck, but cache access demands vary as programs, or even program phases, change, so a traditional cache with fixed configuration parameters struggles to sustain high performance over long runs or across programs. This paper proposes an adaptive extension method for cache set associativity: based on the runtime activity of cache sets, it uses the storage space of sets that are inactive in the short term to extend the associativity of currently active sets, and it dynamically adjusts the extension mapping between sets in real time, effectively improving the overall utilization of cache space. The proposed adaptively extended set-associative cache architecture is simulated in Gem5 and evaluated on the SPEC CPU 2017 benchmark suite. The results show that the method clearly improves the uniformity of cache-set accesses, improving the uniformity of set usage frequency by up to about 23.14% for typical programs and reducing the number of cache misses by up to 54.2%. Hardware implementation and simulation results show that, compared with low-power reconfigurable cache architectures such as HY-Way, the proposed architecture reduces resource consumption by more than 7.66%, making it attractive for embedded processor designs.
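The abstract does not give the activity metric, pairing policy, or replacement scheme, so the following Python sketch is a minimal sketch under assumed choices: each set counts accesses per epoch, and at each rebalance the hottest set borrows the ways of the coldest set as extra associativity. Names (CacheSet, AdaptiveCache) and thresholds are hypothetical.

```python
# Hypothetical sketch of adaptive set-associativity extension: a hot set
# borrows an inactive set's ways until the next epoch rebalance. Activity
# thresholds, random replacement, and one-to-one pairing are assumptions.

import random

WAYS, SETS = 4, 8
HOT, COLD = 16, 2  # assumed per-epoch activity thresholds

class CacheSet:
    def __init__(self):
        self.tags = [None] * WAYS  # resident tags; None marks an empty way
        self.accesses = 0          # activity counter, reset each epoch
        self.extension = None      # borrowed partner set, if any

    def lookup(self, tag):
        """Return True on hit; on miss, fill a way (own or borrowed)."""
        self.accesses += 1
        if tag in self.tags or (self.extension and tag in self.extension.tags):
            return True
        # Miss: prefer an empty local way, then spill into the extension.
        target = self
        if None not in self.tags and self.extension:
            target = self.extension
        way = (target.tags.index(None) if None in target.tags
               else random.randrange(WAYS))  # random replacement (assumption)
        target.tags[way] = tag
        return False

class AdaptiveCache:
    def __init__(self):
        self.sets = [CacheSet() for _ in range(SETS)]

    def access(self, addr):
        return self.sets[(addr // 64) % SETS].lookup(addr // (64 * SETS))

    def rebalance(self):
        """End of epoch: lend the coldest set's ways to the hottest set."""
        for s in self.sets:
            s.extension = None
        hot = max(self.sets, key=lambda s: s.accesses)
        cold = min(self.sets, key=lambda s: s.accesses)
        if hot is not cold and hot.accesses >= HOT and cold.accesses <= COLD:
            hot.extension = cold  # hot set now has 2*WAYS effective ways
        for s in self.sets:
            s.accesses = 0
```

A real hardware design would extend the tag to disambiguate lines stored in a borrowed set and would track inter-set links in dedicated registers; the sketch only conveys why lending inactive sets' ways evens out set usage and cuts conflict misses, as the reported results indicate.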