The Godson project is the first attempt to design high-performance general-purpose microprocessors in China. This paper introduces the microarchitecture of the Godson-2 processor, a 64-bit, 4-issue, out-of-order execution RISC processor that implements a 64-bit MIPS-like instruction set. Aggressive out-of-order execution techniques (such as register mapping, branch prediction, and dynamic scheduling) and cache techniques (such as non-blocking caches, load speculation, and dynamic memory disambiguation) help the Godson-2 processor achieve high performance even at a modest clock frequency. The Godson-2 processor has been physically implemented in a 6-metal 0.18 μm CMOS technology using an automatic place-and-route flow, with the help of some hand-crafted library cells and macros. The chip measures 6,700 micrometers by 6,200 micrometers, and the clock cycle at the typical corner is 2.3 ns.
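The physical figures quoted in this abstract can be restated in more familiar units. The following is simple arithmetic on the two numbers given (2.3 ns cycle, 6,700 by 6,200 micrometers); the peak issue rate is a derived upper bound that assumes all four issue slots are filled every cycle, and it is not a measured result from the paper.

```latex
\begin{align*}
f &= \frac{1}{T} = \frac{1}{2.3\ \text{ns}} \approx 434.8\ \text{MHz} \\
A &= 6.7\ \text{mm} \times 6.2\ \text{mm} = 41.54\ \text{mm}^2 \\
R_{\text{peak}} &\le 4 \times f \approx 1.74 \times 10^{9}\ \text{instructions/s}
\end{align*}
```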
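Existing branch prediction models cannot predict the behavior of every instruction in a processor with complete accuracy, which limits processing efficiency. To address this, two hybrid prediction schemes are proposed that combine multiple branch prediction models in order to improve prediction accuracy and processor execution efficiency. The predictions of the TAGE (tagged geometric history length) branch predictor and the BATAGE (Bayesian tagged geometric history length) branch predictor are passed to a Hybrid model. In the prediction phase, the Hybrid model selects the prediction of whichever predictor, TAGE or BATAGE, has performed best historically. In the update phase, the Hybrid model updates the saturating counters of the entries that need updating according to the designed hybrid prediction strategy. Experiments were run on the 440 test programs provided by the CBP (Championship Branch Prediction) software simulation platform. The results show that both hybrid schemes achieve lower misprediction rates than several recent mainstream branch prediction models. The study offers an effective approach to the problem of predicting the behavior of all instruction patterns and has some practical value for branch prediction in real CPUs.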
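The select-then-update flow described in this abstract resembles a classic tournament chooser. The sketch below is a minimal illustration of that general idea, not the paper's actual strategy: the class name `HybridChooser`, the 2-bit counter width, the PC-indexed table, and its size are all assumptions, and the two component predictions are passed in as plain booleans rather than produced by real TAGE or BATAGE implementations.

```python
from typing import List


class HybridChooser:
    """Tournament-style selector between two component predictors (A and B).

    Hypothetical sketch: A could stand for TAGE and B for BATAGE, but the
    components themselves are not modelled here.
    """

    def __init__(self, table_bits: int = 10) -> None:
        self.mask = (1 << table_bits) - 1
        # 2-bit saturating counters: 0..1 favour A, 2..3 favour B.
        self.counters: List[int] = [1] * (1 << table_bits)

    def _index(self, pc: int) -> int:
        # Drop the two low bits (assumes 4-byte aligned instructions).
        return (pc >> 2) & self.mask

    def predict(self, pc: int, pred_a: bool, pred_b: bool) -> bool:
        """Return the prediction of the component with the better record."""
        return pred_b if self.counters[self._index(pc)] >= 2 else pred_a

    def update(self, pc: int, pred_a: bool, pred_b: bool, taken: bool) -> None:
        """Move the counter toward whichever component was right.

        Nothing changes when both components were right or both were wrong.
        """
        a_ok = pred_a == taken
        b_ok = pred_b == taken
        if a_ok == b_ok:
            return
        i = self._index(pc)
        if b_ok:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Updating only on disagreement keeps the counter from drifting when both components behave identically, so it tracks relative rather than absolute accuracy.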
In theory, branch predictors with more complicated algorithms and larger data structures provide more accurate predictions. Unfortunately, overly large structures and excessively complicated algorithms cannot be implemented because of their long access delay. To date, many strategies have been proposed to balance delay with accuracy, but none has completely solved the issue. The architecture for ahead branch prediction (A2BP) separates traditional predictors into two parts. The first is a small table located at the front end of the pipeline, which keeps prediction fast enough even for some aggressive processors. In the second, operations on complicated algorithms and large data structures for accurate prediction are all moved to the back end of the pipeline. An effective mechanism is introduced for ahead branch prediction in the back end and small-table update in the front end. To substantially improve prediction accuracy, an indirect branch prediction algorithm based on branch history and target path (BHTP) is implemented in A2BP. Experiments with the Standard Performance Evaluation Corporation (SPEC) benchmarks on the gem5/SimpleScalar simulators demonstrate that A2BP improves average performance by 2.92% compared with a commonly used branch target buffer-based predictor. In addition, indirect branch misses with the BHTP algorithm are reduced by an average of 28.98% compared with the traditional algorithm.
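The front-end/back-end split described in this abstract can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: `FrontTable`, `BackEndPredictor`, and `resolve_and_refresh` are hypothetical names, the back-end predictor is a gshare-style stand-in for the unspecified "complicated algorithm", and pipeline timing is not modelled at all.

```python
from typing import List, Optional


class FrontTable:
    """Small direct-mapped table that the fetch stage can read quickly."""

    def __init__(self, entries: int = 16) -> None:
        self.entries = entries
        self.tags: List[Optional[int]] = [None] * entries
        self.taken: List[bool] = [False] * entries

    def lookup(self, pc: int) -> bool:
        i = pc % self.entries
        # On a miss, fall back to a static not-taken guess.
        return self.taken[i] if self.tags[i] == pc else False

    def install(self, pc: int, taken: bool) -> None:
        i = pc % self.entries
        self.tags[i] = pc
        self.taken[i] = taken


class BackEndPredictor:
    """Stand-in for the large, slow predictor: gshare-style 2-bit counters."""

    def __init__(self, bits: int = 8) -> None:
        self.mask = (1 << bits) - 1
        self.counters: List[int] = [1] * (1 << bits)
        self.history = 0

    def _index(self, pc: int) -> int:
        return (pc ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def train(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def resolve_and_refresh(front: FrontTable, back: BackEndPredictor,
                        pc: int, taken: bool, next_pc: int) -> None:
    """Back-end step: learn from a resolved branch, then compute the
    prediction for an upcoming branch ahead of time and push it forward."""
    back.train(pc, taken)
    front.install(next_pc, back.predict(next_pc))
```

In this model the fetch stage only ever calls `FrontTable.lookup`, so its latency does not depend on how large or slow the back-end predictor is.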
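In theory, increasingly complex branch prediction algorithms and larger storage structures keep improving branch prediction accuracy, but the prediction latency caused by today's complex algorithms and large data structures can no longer meet the pipeline's single-cycle requirement. To resolve this conflict between prediction accuracy and latency, an ahead branch prediction architecture (ABPA) is proposed. ABPA provides the instruction fetch unit at the front end of the pipeline with a simple branch prediction table for fast prediction; complex prediction algorithms and larger storage structures are moved to the back end of the pipeline, which preserves prediction accuracy. For multi-target indirect branch instructions, which have long been hard to predict accurately, an indirect branch prediction algorithm based on branch history and target path (the BHTP algorithm) is proposed. The ahead branch prediction algorithm combines an improved high-accuracy branch prediction algorithm with the BHTP algorithm. A branch prediction engine that embeds the ahead branch prediction algorithm performs branch speculation and target prediction at the back end of the pipeline and updates the branch prediction table at the front end. Experimental results show that a branch prediction system using the ABPA architecture and the BHTP algorithm reaches an average accuracy of 94.27%. The design not only achieves fast, high-accuracy branch prediction but also provides a basis for further research on branch prediction.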
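The abstract names the two inputs of the BHTP algorithm (branch history and target path) but does not give its table layout or hashing. The sketch below is therefore a guess at the general shape of such a predictor, not the published algorithm: `PathIndirectPredictor` is a hypothetical name, the path length of 4 is arbitrary, and a Python dict keyed by a tuple stands in for a hashed hardware table.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

Key = Tuple[int, int, Tuple[int, ...]]


class PathIndirectPredictor:
    """Indirect-target predictor keyed by PC, branch history, and target path."""

    def __init__(self, history_bits: int = 8, path_len: int = 4) -> None:
        self.history_mask = (1 << history_bits) - 1
        self.branch_history = 0                      # recent taken/not-taken bits
        self.path: Deque[int] = deque(maxlen=path_len)  # recent indirect targets
        self.targets: Dict[Key, int] = {}

    def _key(self, pc: int) -> Key:
        return (pc, self.branch_history, tuple(self.path))

    def predict(self, pc: int) -> Optional[int]:
        """Return the remembered target for this context, or None if unseen."""
        return self.targets.get(self._key(pc))

    def update(self, pc: int, actual_target: int) -> None:
        """Record the resolved target, then extend the target path."""
        self.targets[self._key(pc)] = actual_target
        self.path.append(actual_target)

    def record_conditional(self, taken: bool) -> None:
        """Shift a resolved conditional-branch outcome into the history."""
        self.branch_history = (
            (self.branch_history << 1) | int(taken)
        ) & self.history_mask


# Usage: one indirect branch that alternates between two targets.
predictor = PathIndirectPredictor()
hits = 0
for target in [0x1000, 0x2000] * 20:
    if predictor.predict(0x400) == target:
        hits += 1
    predictor.update(0x400, target)
print(hits)  # 34 of 40: six warm-up misses while the path contexts are learned
```

A predictor that only remembers the last target of this branch would miss on every one of these alternating jumps, which is the kind of multi-target case the abstract says is hard to predict.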