The instruction fetch unit (IFU) usually dissipates a considerable portion of total chip power. In traditional IFU architectures, as soon as the fetch address is generated, it needs to be sent to the instruction cac...The instruction fetch unit (IFU) usually dissipates a considerable portion of total chip power. In traditional IFU architectures, as soon as the fetch address is generated, it needs to be sent to the instruction cache and TLB arrays for instruction fetch. Since limited work can be done by the power-saving logic after the fetch address generation and before the instruction fetch, previous power-saving approaches usually suffer from the unnecessary restrictions from traditional IFU architectures. In this paper, we present CASA, a new power-aware IFU architecture, which effectively reduces the unnecessary restrictions on the power-saving approaches and provides sufficient time and information for the power-saving logic of both instruction cache and TLB. By analyzing, recording, and utilizing the key information of the dynamic instruction flow early in the front-end pipeline, CASA brings the opportunity to maximize the power efficiency and minimize the performance overhead. Compared to the baseline configuration, the leakage and dynamic power of instruction cache is reduced by 89.7% and 64.1% respectively, and the dynamic power of instruction TLB is reduced by 90.2%. Meanwhile the performance degradation in the worst case is only 0.63%. Compared to previous state-of-the-art power-saving approaches, the CASA-based approach saves IFU power more effectively, incurs less performance overhead and achieves better scalability. It is promising that CASA can stimulate further work on architectural solutions to power-efficient IFU designs.展开更多
循环展开是一种常用的编译优化技术,能够有效减少循环开销,提升指令级并行程度和数据局部性,提升循环的执行效能。然而,过度的循环展开会造成指令Cache溢出,增大寄存器压力,循环展开次数太少又会浪费潜在的性能提升机会,因此寻找恰当的...循环展开是一种常用的编译优化技术,能够有效减少循环开销,提升指令级并行程度和数据局部性,提升循环的执行效能。然而,过度的循环展开会造成指令Cache溢出,增大寄存器压力,循环展开次数太少又会浪费潜在的性能提升机会,因此寻找恰当的展开因子是研究循环展开问题的核心。基于GCC开源编译器,面向循环展开问题开展深入的分析与研究,针对指令Cache和寄存器资源对循环展开的影响,提出了一种基于指令Cache和寄存器压力的循环展开因子计算方法,并在GCC编译器中实现了该计算方法。申威和海光平台上的实验结果显示,相较于目前GCC中存在的其它展开因子计算方法,所提出的方法可以获得更为有效的循环展开因子,提升了程序性能。在SPEC CPU 2006测试集上的平均性能分别提升了2.7%和3.1%,在NPB-3.3.1测试集上的分别为5.4%和6.1%。展开更多
Demands for low-energy microcontrollers have been increasing in recent years. Since most microcontrollers achieve user programmability by integrating nonvolatile (NV) memories such as flash memories for storing their ...Demands for low-energy microcontrollers have been increasing in recent years. Since most microcontrollers achieve user programmability by integrating nonvolatile (NV) memories such as flash memories for storing their programs, the large power consumption required in accessing an NV memory has become a major problem. This problem becomes critical when the power supply voltage of NV microcontrollers is decreased. We can solve this problem by introducing an instruction cache, thus reducing the access frequency of the NV memory. Unlike general-purpose microprocessors, microcontrollers used for real-time applications in embedded systems must accurately calculate program execution time prior to its execution. Therefore, we introduce a “transparent” instruction cache, which does not change the existing NV microcontroller’s cycle-level execution time, for reducing power and energy consumption, but not for improving the processing speed. We have conducted detailed microar chitecture design based on the architecture of a major industrial microcontroller, and we evaluated power and energy consumption for several benchmark programs. Our evaluation shows that the proposed instruction cache can successfully reduce energy consumption in a fairly wide range of practical NV microcontroller configurations.展开更多
基金Supported by the National High Technology Development 863 Program of China under Grant No.2004AAIZ1010.
文摘The instruction fetch unit (IFU) usually dissipates a considerable portion of total chip power. In traditional IFU architectures, as soon as the fetch address is generated, it needs to be sent to the instruction cache and TLB arrays for instruction fetch. Since limited work can be done by the power-saving logic after the fetch address generation and before the instruction fetch, previous power-saving approaches usually suffer from the unnecessary restrictions from traditional IFU architectures. In this paper, we present CASA, a new power-aware IFU architecture, which effectively reduces the unnecessary restrictions on the power-saving approaches and provides sufficient time and information for the power-saving logic of both instruction cache and TLB. By analyzing, recording, and utilizing the key information of the dynamic instruction flow early in the front-end pipeline, CASA brings the opportunity to maximize the power efficiency and minimize the performance overhead. Compared to the baseline configuration, the leakage and dynamic power of instruction cache is reduced by 89.7% and 64.1% respectively, and the dynamic power of instruction TLB is reduced by 90.2%. Meanwhile the performance degradation in the worst case is only 0.63%. Compared to previous state-of-the-art power-saving approaches, the CASA-based approach saves IFU power more effectively, incurs less performance overhead and achieves better scalability. It is promising that CASA can stimulate further work on architectural solutions to power-efficient IFU designs.
文摘循环展开是一种常用的编译优化技术,能够有效减少循环开销,提升指令级并行程度和数据局部性,提升循环的执行效能。然而,过度的循环展开会造成指令Cache溢出,增大寄存器压力,循环展开次数太少又会浪费潜在的性能提升机会,因此寻找恰当的展开因子是研究循环展开问题的核心。基于GCC开源编译器,面向循环展开问题开展深入的分析与研究,针对指令Cache和寄存器资源对循环展开的影响,提出了一种基于指令Cache和寄存器压力的循环展开因子计算方法,并在GCC编译器中实现了该计算方法。申威和海光平台上的实验结果显示,相较于目前GCC中存在的其它展开因子计算方法,所提出的方法可以获得更为有效的循环展开因子,提升了程序性能。在SPEC CPU 2006测试集上的平均性能分别提升了2.7%和3.1%,在NPB-3.3.1测试集上的分别为5.4%和6.1%。
文摘Demands for low-energy microcontrollers have been increasing in recent years. Since most microcontrollers achieve user programmability by integrating nonvolatile (NV) memories such as flash memories for storing their programs, the large power consumption required in accessing an NV memory has become a major problem. This problem becomes critical when the power supply voltage of NV microcontrollers is decreased. We can solve this problem by introducing an instruction cache, thus reducing the access frequency of the NV memory. Unlike general-purpose microprocessors, microcontrollers used for real-time applications in embedded systems must accurately calculate program execution time prior to its execution. Therefore, we introduce a “transparent” instruction cache, which does not change the existing NV microcontroller’s cycle-level execution time, for reducing power and energy consumption, but not for improving the processing speed. We have conducted detailed microar chitecture design based on the architecture of a major industrial microcontroller, and we evaluated power and energy consumption for several benchmark programs. Our evaluation shows that the proposed instruction cache can successfully reduce energy consumption in a fairly wide range of practical NV microcontroller configurations.