3-D Networks-on-Chip (NoC) emerge as a potent solution to address both the interconnection and design complexity problems facing future Multiprocessor System-on-Chips (MPSoCs). Effective run-time mapping on such 3...3-D Networks-on-Chip (NoC) emerge as a potent solution to address both the interconnection and design complexity problems facing future Multiprocessor System-on-Chips (MPSoCs). Effective run-time mapping on such 3-D NoC-based MPSoCs can be quite challenging, as the arrival order and task graphs of the target applications are typically not known a priori, which can be further complicated by stringent energy requirements for NoC systems. This paper thus presents an energy-aware run-time incremental mapping algorithm (ERIM) for 3-D NoC which can minimize the energy consumption due to the data communications among processor cores, while reducing the fragmentation effect on the incoming applications to be mapped, and simultaneously satisfying the thermal constraints imposed on each incoming application. Specifically, incoming applications are mapped to cuboid tile regions for lower energy consumption of communication and the minimal routing. Fragment tiles due to system fragmentation can be gleaned for better resource utilization. Extensive experiments have been conducted to evaluate the performance of the proposed algorithm ERIM, and the results are compared against the optimal mapping algorithm (branch-and-bound) and two heuristic algorithms (TB and TL). The experiments show that ERIM outperforms TB and TL methods with significant energy saving (more than 10%), much reduced average response time, and improved system utilization.展开更多
Networks-on-chip (NoC), a new system on chip (SoC) paradigm, has become a great focus of research by many groups during the last few years. Among all the NoC architectures that have been proposed until now, 2D-Mes...Networks-on-chip (NoC), a new system on chip (SoC) paradigm, has become a great focus of research by many groups during the last few years. Among all the NoC architectures that have been proposed until now, 2D-Mesh has proved to be the best architecture for implementation due to its regular and simple interconnection structure. In this paper, we propose a new interconnect architecture called 2D-diagonal mesh (2DDgl-Mesh) for on-chip communication. The 2DDglMesh is almost similar to traditional 2D-Mesh in aspects of cost, area, and implementation, but it can outperform the later in delay. The both architectures are compared by using NS-2 (a network simulator) and CINS1M (a component based interconnection simulator) under the same traffic models and parametric conditions. The results of comparison show that under the proposed architecture, the packets can almost always be routed to their destinations in less time. In addition, our archi- tecture can sometimes perform better than 2D-Mesh in drop ratio for special fixed traffic models.展开更多
Networks-on-chip (NoC) is a promising communication architecture for next generation SoC. The size of buffer used in on-chip routers impacts the silicon area and power consumption of NoC dominantly. It is important ...Networks-on-chip (NoC) is a promising communication architecture for next generation SoC. The size of buffer used in on-chip routers impacts the silicon area and power consumption of NoC dominantly. It is important to plan the total buffer-size and each router buffer-allocation carefully for an efficient NoC design. In this paper, we propose two buffer planning algorithms for application-specific NoC design. More precisely, given the traffic parameters and performance constraints of target application, the proposed algorithms automatically determine minimal buffer budget and assign the buffer depth for each input channel in different routers. The experimental results show that the proposed algorithms can significantly reduce total buffer usage and guarantee the performance requirements.展开更多
The quest for energy efficiency has growing importance in high performance many-core systems. However, in current practices, the power slacks, which are the differences observed between the input power budget and the ...The quest for energy efficiency has growing importance in high performance many-core systems. However, in current practices, the power slacks, which are the differences observed between the input power budget and the actual power consumed in the many-core systems, are typically ignored, thus leading to poor energy efficiency. In this paper, we propose a scheme to effectively power the on-chip communications by exploiting the available power slack that is totally wasted in current many-core systems. As so, the demand for extra energy from external power sources (e.g., batteries) is minimized, which helps improve the overall energy efficiency. In essence, the power slack is stored at each node and the proposed routing algorithm uses a dynamic programming network to find the globally optimal path, along which the total energy stored on the nodes is the maximum. Experimental results have confirmed that the proposed scheme, with low hardware overhead, can reduce latency and extra energy consumption by 44% and 48%, respectively, compared with the two competing routing methods.展开更多
The networks-on-chip (NoC) communication has an increasingly larger impact on the system power consumption and performance. Emerging technologies, like surface wave, are believed to have lower transmission latency a...The networks-on-chip (NoC) communication has an increasingly larger impact on the system power consumption and performance. Emerging technologies, like surface wave, are believed to have lower transmission latency and power consumption over the conventional wireless NoC. Therefore, this paper studies how to optimize the network performance and power consumption by giving the packet-switching fabric and traffic pattern of each application. Compared with the conventional method of wire-linked, which adds wireless transceivers by using the genetic algorithm (GA), the proposed maximal declining sorting algorithm (MDSA) can effectively reduce time consumption by as much as 20.4% to 35.6%. We also evaluate the power consumption and configuration time to prove the effective of the proposed algorithm.展开更多
基金This work is supported by the National Natural Science Foundation of China under Grant Nos. 60873112 and 61028004, the National Natural Science Foundation of USA under Grant No. CNS-1126688.
文摘3-D Networks-on-Chip (NoC) emerge as a potent solution to address both the interconnection and design complexity problems facing future Multiprocessor System-on-Chips (MPSoCs). Effective run-time mapping on such 3-D NoC-based MPSoCs can be quite challenging, as the arrival order and task graphs of the target applications are typically not known a priori, which can be further complicated by stringent energy requirements for NoC systems. This paper thus presents an energy-aware run-time incremental mapping algorithm (ERIM) for 3-D NoC which can minimize the energy consumption due to the data communications among processor cores, while reducing the fragmentation effect on the incoming applications to be mapped, and simultaneously satisfying the thermal constraints imposed on each incoming application. Specifically, incoming applications are mapped to cuboid tile regions for lower energy consumption of communication and the minimal routing. Fragment tiles due to system fragmentation can be gleaned for better resource utilization. Extensive experiments have been conducted to evaluate the performance of the proposed algorithm ERIM, and the results are compared against the optimal mapping algorithm (branch-and-bound) and two heuristic algorithms (TB and TL). The experiments show that ERIM outperforms TB and TL methods with significant energy saving (more than 10%), much reduced average response time, and improved system utilization.
基金supported by the National Natural Science Foundation of China under Grant No.60425413
文摘Networks-on-chip (NoC), a new system on chip (SoC) paradigm, has become a great focus of research by many groups during the last few years. Among all the NoC architectures that have been proposed until now, 2D-Mesh has proved to be the best architecture for implementation due to its regular and simple interconnection structure. In this paper, we propose a new interconnect architecture called 2D-diagonal mesh (2DDgl-Mesh) for on-chip communication. The 2DDglMesh is almost similar to traditional 2D-Mesh in aspects of cost, area, and implementation, but it can outperform the later in delay. The both architectures are compared by using NS-2 (a network simulator) and CINS1M (a component based interconnection simulator) under the same traffic models and parametric conditions. The results of comparison show that under the proposed architecture, the packets can almost always be routed to their destinations in less time. In addition, our archi- tecture can sometimes perform better than 2D-Mesh in drop ratio for special fixed traffic models.
基金Supported by the National Natural Science Foundation of China (Grant No. 60803018)
文摘Networks-on-chip (NoC) is a promising communication architecture for next generation SoC. The size of buffer used in on-chip routers impacts the silicon area and power consumption of NoC dominantly. It is important to plan the total buffer-size and each router buffer-allocation carefully for an efficient NoC design. In this paper, we propose two buffer planning algorithms for application-specific NoC design. More precisely, given the traffic parameters and performance constraints of target application, the proposed algorithms automatically determine minimal buffer budget and assign the buffer depth for each input channel in different routers. The experimental results show that the proposed algorithms can significantly reduce total buffer usage and guarantee the performance requirements.
基金supported by the National Natural Science Foundation of China under Grant No.61376024 and No.61306024Natural Science Foundation of Guangdong Province under Grant No.S2013040014366Basic Research Program of Shenzhen under Grant No.JCYJ20140417113430642 and No.JCYJ 20140901003939020
文摘The quest for energy efficiency has growing importance in high performance many-core systems. However, in current practices, the power slacks, which are the differences observed between the input power budget and the actual power consumed in the many-core systems, are typically ignored, thus leading to poor energy efficiency. In this paper, we propose a scheme to effectively power the on-chip communications by exploiting the available power slack that is totally wasted in current many-core systems. As so, the demand for extra energy from external power sources (e.g., batteries) is minimized, which helps improve the overall energy efficiency. In essence, the power slack is stored at each node and the proposed routing algorithm uses a dynamic programming network to find the globally optimal path, along which the total energy stored on the nodes is the maximum. Experimental results have confirmed that the proposed scheme, with low hardware overhead, can reduce latency and extra energy consumption by 44% and 48%, respectively, compared with the two competing routing methods.
基金supported by the National Natural Science Foundation of China under Grant No.61376024 and No.61306024the Natural Science Foundation of Guangdong Province under Grant No.S2013040014366Basic Research Program of Shenzhen No.JCYJ20140417113430642 and JCYJ20140901003939020
文摘The networks-on-chip (NoC) communication has an increasingly larger impact on the system power consumption and performance. Emerging technologies, like surface wave, are believed to have lower transmission latency and power consumption over the conventional wireless NoC. Therefore, this paper studies how to optimize the network performance and power consumption by giving the packet-switching fabric and traffic pattern of each application. Compared with the conventional method of wire-linked, which adds wireless transceivers by using the genetic algorithm (GA), the proposed maximal declining sorting algorithm (MDSA) can effectively reduce time consumption by as much as 20.4% to 35.6%. We also evaluate the power consumption and configuration time to prove the effective of the proposed algorithm.