We are entering a new era of computing, characterized by the need to handle over one zettabyte (10^21 bytes, or ZB) of data. The world's capacities to sense, transmit, store, and process information need to grow by three orders of magnitude while maintaining an energy consumption level similar to that of 2010. In other words, we need a thousand-fold improvement in performance per watt. To face this challenge, in 2012 the Chinese Academy of Sciences launched a 10-year strategic priority research initiative called the Next Generation Information and Communication Technology initiative (the NICT initiative). One research thrust of the NICT initiative is the Cloud-Sea Computing Systems project. Its main idea is to augment conventional cloud computing through the cooperation and integration of cloud-side and sea-side systems, where the "sea side" refers to an augmented client side consisting of human-facing and physical-world-facing devices and subsystems. The Cloud-Sea Computing Systems project consists of four research tasks: a new computing model called REST 2.0, which extends the REST (representational state transfer) architectural style of Web computing to cloud-sea computing; a three-tier storage system architecture capable of managing ZB of data; a billion-thread datacenter server with high energy efficiency; and an elastic processor aiming at an energy efficiency of one trillion operations per second per watt. This special section contains 12 papers produced by the Cloud-Sea Computing Systems project team, presenting research results on sensing and REST 2.0, the elastic processor, the hyperparallel server, and cloud-sea storage.
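The thousand-fold target is a ratio argument: if delivered capacity grows by 10^3 while power stays at the 2010 level, performance per watt must grow by the same factor. The symbols P (capacity) and W (power) below are introduced here only for illustration. The second line restates the elastic processor's target as energy per operation, using 1 W = 1 J/s.

```latex
\begin{align*}
\frac{P_{\text{new}}/W_{\text{new}}}{P_{2010}/W_{2010}}
  &= \frac{10^{3}\,P_{2010}/W_{2010}}{P_{2010}/W_{2010}} = 10^{3}
  &&\text{(assuming } W_{\text{new}} = W_{2010}\text{)}\\
10^{12}\ \frac{\text{ops/s}}{\text{W}} = 10^{12}\ \frac{\text{ops}}{\text{J}}
  &\;\Longrightarrow\; 1\ \text{pJ per operation}
\end{align*}
```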
Due to advances in semiconductor technology, many-core processors have been widely used in high-performance computing. However, many applications still cannot run efficiently because of the memory wall, which has become a bottleneck for many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture named deeply fused many-core (DFMC) for high-performance computing systems. DFMC integrates management processing elements (MPEs) and computing processing elements (CPEs), heterogeneous processor cores targeting different application features, with a unified ISA (instruction set architecture), a unified execution model, and shared memory that supports cache coherence. The DFMC processor alleviates the memory wall problem by combining a series of cooperative computing techniques for CPEs, such as multi-pattern data stream transfer, an efficient register-level communication mechanism, and a fast hardware synchronization technique. These techniques improve on-chip data reuse and optimize memory access performance. This paper describes the implementation of a full-system prototype on FPGA with four MPEs and 256 CPEs. Our experimental results show that the cooperative computing techniques of CPEs are highly effective: DGEMM (double-precision general matrix multiplication) achieves an efficiency of 94%, FFT (fast Fourier transform) reaches 207 GFLOPS, and FDTD (finite-difference time-domain) reaches 27 GFLOPS.
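The abstract does not give DFMC's actual DGEMM kernel. As a generic illustration of what "on-chip data reuse" means for matrix multiplication, here is a minimal C sketch of loop tiling (blocking), a standard technique and not the paper's method. The tile size `BS` and the functions `dgemm_tiled`, `dgemm_flops`, and `efficiency` are hypothetical. The sketch also shows how a DGEMM efficiency figure such as 94% is conventionally computed: sustained FLOP rate divided by theoretical peak, with DGEMM counted as 2n^3 operations.

```c
#include <stddef.h>

/* Hypothetical tile edge. On a real many-core chip it would be sized so that
   the working tiles of A, B and C fit in a core's local memory. */
#define BS 4

/* C += A * B for n x n row-major double matrices, computed tile by tile.
   Within a tile, each element of B is used once per row i of the A tile and
   each element of A once per column j, so loaded data is reused up to BS
   times before the loops move on. */
void dgemm_tiled(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS) {
                /* Clamp tile bounds so n need not be a multiple of BS. */
                size_t iend = ii + BS < n ? ii + BS : n;
                size_t kend = kk + BS < n ? kk + BS : n;
                size_t jend = jj + BS < n ? jj + BS : n;
                for (size_t i = ii; i < iend; i++)
                    for (size_t k = kk; k < kend; k++) {
                        double a = A[i * n + k]; /* kept in a register across the j loop */
                        for (size_t j = jj; j < jend; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

/* DGEMM performs 2*n^3 floating-point operations:
   one multiply and one add per innermost step. */
double dgemm_flops(size_t n)
{
    return 2.0 * (double)n * (double)n * (double)n;
}

/* Efficiency = sustained rate / theoretical peak rate (same units). */
double efficiency(double sustained_gflops, double peak_gflops)
{
    return sustained_gflops / peak_gflops;
}
```

The value `BS = 4` only keeps the example small; the design choice that matters is the loop order, which lets each tile be reused from fast storage instead of being refetched from main memory for every output element.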
Funding: Supported by the Strategic Priority Program of the Chinese Academy of Sciences under Grant No. XDA06010401 and the Guangdong Talents Program of China under Grant No. 201001D0104726115.