The Godson project is the first attempt to design high performancegeneral-purpose microprocessors in China. This paper introduces the microarchitecture of theGodson-2 processor which is a 64-bit, 4-issue, out-of-order...The Godson project is the first attempt to design high performancegeneral-purpose microprocessors in China. This paper introduces the microarchitecture of theGodson-2 processor which is a 64-bit, 4-issue, out-of-order execution RISC processor that implementsthe 64-bit MIPS-like instruction set. The adoption of the aggressive out-of-order executiontechniques (such as register mapping, branch prediction, and dynamic scheduling) and cachetechniques (such as non-blocking cache, load speculation, dynamic memory disambiguation) helps theGodson-2 processor to achieve high performance even at not so high frequency. The Godson-2 processorhas been physically implemented on a 6-metal 0.18 μm CMOS technology based on the automaticplacing and routing flow with the help of some crafted library cells and macros. The area of thechip is 6,700 micrometers by 6,200 micrometers and the clock cycle at typical corner is 2.3 ns.展开更多
现有的分支预测模型无法完全准确预测处理器中各种指令的行为,导致处理效率受限。为此提出了两种混合预测解决方案,旨在结合多种分支预测模型,以提高预测的准确性和处理器的执行效率。将TAGE(tagged geometric history length)分支预测...现有的分支预测模型无法完全准确预测处理器中各种指令的行为,导致处理效率受限。为此提出了两种混合预测解决方案,旨在结合多种分支预测模型,以提高预测的准确性和处理器的执行效率。将TAGE(tagged geometric history length)分支预测模型与BATAGE(Bayesian tagged geometric history length)分支预测模型的预测结果转交Hybrid模型。在预测阶段中,Hybrid模型会根据TAGE和BATAGE的历史表现去选择表现最佳分支预测模型的预测结果。而在更新阶段中,Hybrid模型会根据设计的混合预测策略对需要更新条目的饱和计数器进行更新。在CBP(championship branch prediction)软件仿真平台提供的440个测试程序上进行实验,实验结果表明:与多种最新主流分支预测模型相比,两种混合预测解决方案的预测错误率均低于它们。该研究为预测所有指令模式行为问题提供了有效解决方案。在实际CPU的分支指令预测,该研究提供了一些实用价值。展开更多
In theory, branch predictors with more compli- cated algorithms and larger data structures provide more accurate predictions. Unfortunately, overly large structures and excessively complicated algorithms cannot be imp...In theory, branch predictors with more compli- cated algorithms and larger data structures provide more accurate predictions. Unfortunately, overly large structures and excessively complicated algorithms cannot be implemented because of their long access delay. To date, many strategies have been proposed to balance delay with accuracy, but none has completely solved the issue. The architecture for ahead branch prediction (A2BP) separates traditional pre- dictors into two parts. First is a small table located at the front-end of the pipeline, which makes the prediction brief enough even for some aggressive processors. Second, oper- ations on complicated algorithms and large data structures for accurate predictions are all moved to the back-end of the pipeline. An effective mechanism is introduced for ahead branch prediction in the back-end and small table update in the front. To substantially improve prediction accuracy, an indirect branch prediction algorithm based on branch history and target path (BHTP) is implemented in AZBE Experiments with the standard performance evaluation corpora- tion (SPEC) benchmarks on gem5/SimpleScalar simulators demonstrate that AzBP improves average performance by 2.92% compared with a commonly used branch target bufferbased predictor. In addition, indirect branch misses with the BHTP algorithm are reduced by an average of 28.98% com- pared with the traditional algorithm.展开更多
文摘The Godson project is the first attempt to design high performancegeneral-purpose microprocessors in China. This paper introduces the microarchitecture of theGodson-2 processor which is a 64-bit, 4-issue, out-of-order execution RISC processor that implementsthe 64-bit MIPS-like instruction set. The adoption of the aggressive out-of-order executiontechniques (such as register mapping, branch prediction, and dynamic scheduling) and cachetechniques (such as non-blocking cache, load speculation, dynamic memory disambiguation) helps theGodson-2 processor to achieve high performance even at not so high frequency. The Godson-2 processorhas been physically implemented on a 6-metal 0.18 μm CMOS technology based on the automaticplacing and routing flow with the help of some crafted library cells and macros. The area of thechip is 6,700 micrometers by 6,200 micrometers and the clock cycle at typical corner is 2.3 ns.
文摘现有的分支预测模型无法完全准确预测处理器中各种指令的行为,导致处理效率受限。为此提出了两种混合预测解决方案,旨在结合多种分支预测模型,以提高预测的准确性和处理器的执行效率。将TAGE(tagged geometric history length)分支预测模型与BATAGE(Bayesian tagged geometric history length)分支预测模型的预测结果转交Hybrid模型。在预测阶段中,Hybrid模型会根据TAGE和BATAGE的历史表现去选择表现最佳分支预测模型的预测结果。而在更新阶段中,Hybrid模型会根据设计的混合预测策略对需要更新条目的饱和计数器进行更新。在CBP(championship branch prediction)软件仿真平台提供的440个测试程序上进行实验,实验结果表明:与多种最新主流分支预测模型相比,两种混合预测解决方案的预测错误率均低于它们。该研究为预测所有指令模式行为问题提供了有效解决方案。在实际CPU的分支指令预测,该研究提供了一些实用价值。
文摘In theory, branch predictors with more compli- cated algorithms and larger data structures provide more accurate predictions. Unfortunately, overly large structures and excessively complicated algorithms cannot be implemented because of their long access delay. To date, many strategies have been proposed to balance delay with accuracy, but none has completely solved the issue. The architecture for ahead branch prediction (A2BP) separates traditional pre- dictors into two parts. First is a small table located at the front-end of the pipeline, which makes the prediction brief enough even for some aggressive processors. Second, oper- ations on complicated algorithms and large data structures for accurate predictions are all moved to the back-end of the pipeline. An effective mechanism is introduced for ahead branch prediction in the back-end and small table update in the front. To substantially improve prediction accuracy, an indirect branch prediction algorithm based on branch history and target path (BHTP) is implemented in AZBE Experiments with the standard performance evaluation corpora- tion (SPEC) benchmarks on gem5/SimpleScalar simulators demonstrate that AzBP improves average performance by 2.92% compared with a commonly used branch target bufferbased predictor. In addition, indirect branch misses with the BHTP algorithm are reduced by an average of 28.98% com- pared with the traditional algorithm.