On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform an extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed on the system.
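Modern GPUs are not only powerful graphics processing engines but also highly parallel programmable devices with strong computing performance and memory bandwidth, and they can form a complete heterogeneous processing system together with CPUs. Using a GPU for computation other than graphics processing is generally called general-purpose computing on graphics processing units (GPGPU). This paper studies in detail the concept and classification of GPGPU, its hardware architecture and working mechanism, and its software environment and processing model, with the aim of providing a reference for the further application of GPGPU in the field of aviation embedded computing.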
The rapid development of wearable computing technologies has led to an increased involvement of wearable devices in people's daily lives. The main power sources of wearable devices are batteries, so researchers must ensure high performance while reducing power consumption and improving the battery life of wearable devices. The purpose of this study is to analyze the new features of the Energy-Aware Scheduler (EAS) in the Android 7.1.2 operating system and the scarcity of EAS schedulers in wearable application scenarios. The paper also proposes an optimization scheme of the EAS scheduler for wearable applications, the Wearable-Application-optimized Energy-Aware Scheduler (WAEAS). This scheme improves the accuracy of task workload prediction, the energy efficiency of central processing unit core selection, and load balancing. The experimental results presented in this paper verify the effectiveness of the WAEAS scheduler.
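Based on the concept of heterogeneous computing, a high-complexity adaptive image demosaicing algorithm was accelerated using a GPU and OpenCL, and functional and performance tests were completed on a heterogeneous computing platform composed of an AMD Bald Eagle and a FirePro W8100. The experimental results show that the heterogeneous platform achieves good image reconstruction quality and that the W8100 processes images at more than 100 f/s, with 1920×1080 pixels per frame. This demonstrates that the heterogeneous computing platform and OpenCL can meet the demands of application areas such as medical imaging and network surveillance for high-frame-rate, high-definition images.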
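The heterogeneous multi-core architecture of the Cell processor and its explicitly software-managed multi-level memory hierarchy make it difficult to program and hard to exploit its performance effectively. Existing programming models for Cell/B.E. mostly focus on supporting "bulk data transfer" applications similar to stream processing, and traditional applications with irregular memory access perform poorly. By extending the Cell/B.E. memory access library to strengthen the autonomous role of the co-processing units, layered parallel programming runtime support for MPI and weakly consistent Pthreads is built on the Cell computing platform, centered on the co-processing units. The layered runtime support structure and the extended Cell/B.E. memory access library give the model better efficiency and scalability and improve the performance of irregular applications. MPI in the model eases the porting and development of a large number of traditional parallel applications on the new architecture, while the weakly consistent Pthreads provide efficient task runtime management support for MPI and give system-level users a programming interface with full control over the architecture. The experimental results show that the proposed runtime support techniques not only adapt to the requirements of different applications but also, with the help of the partitioning optimization mechanism in the memory access library, effectively exploit the performance of the Cell/B.E. architecture.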
Funding and acknowledgements: This work was partially supported by the National High-tech R&D Program of China (863 Program) (2012AA01A301) and the National Natural Science Foundation of China (Grant No. 61120106005). The MilkyWay-2 project is a great team effort and benefits from the cooperation of many individuals at NUDT. We thank all the people who have contributed to the system in a variety of ways.