Abstract: Energy-proportional computing is one of the foremost constraints in the design of next-generation exascale systems. These systems must achieve a very high FLOP-per-watt ratio to be sustainable, which requires tremendous improvements in the power efficiency of modern computing systems. This paper focuses on the processor, which remains the biggest contributor to power usage, by considering both its core and uncore power subsystems. The uncore comprises those processor functions that are not handled by the core, such as the L3 cache and on-chip interconnect, and contributes significantly to total system power. Uncore frequency scaling (UFS) has been available to the user since the Intel Haswell processor generation. In this paper, performance and power models are proposed that use both UFS and dynamic voltage and frequency scaling (DVFS) to reduce the energy consumption of parallel applications. These models are then incorporated into a runtime strategy that performs processor frequency scaling during parallel application execution. The strategy can be implemented at the kernel/firmware level, which makes it suitable for improving the energy efficiency of exascale designs. Experiments on a 20-core Haswell-EP machine using the quantum chemistry application GAMESS and the NAS benchmark resulted in up to 24% energy savings with as little as 2% performance loss.
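As a rough illustration of how the reported figures relate: energy is average power multiplied by runtime, so a frequency-scaling strategy saves energy only when power falls faster than runtime grows. The sketch below assumes, purely for illustration, that the 24% energy saving and the 2% slowdown occur in the same run, which the abstract does not state. The function name `implied_power_ratio` is hypothetical.

```python
def implied_power_ratio(energy_saving: float, slowdown: float) -> float:
    """Return P_new / P_old implied by E = P * t.

    energy_saving: fractional energy reduction (0.24 means 24% less energy).
    slowdown: fractional runtime increase (0.02 means 2% longer runtime).
    """
    if not 0.0 <= energy_saving < 1.0:
        raise ValueError("energy_saving must be in [0, 1)")
    if slowdown <= -1.0:
        raise ValueError("slowdown must be greater than -1")
    energy_ratio = 1.0 - energy_saving  # E_new / E_old
    time_ratio = 1.0 + slowdown         # t_new / t_old
    return energy_ratio / time_ratio    # P_new / P_old


# Assumption: both figures come from the same run (not confirmed by the abstract).
ratio = implied_power_ratio(0.24, 0.02)
print(f"average power would drop by about {100 * (1 - ratio):.1f}%")  # about 25.5%
```

Under that assumption, the average power draw would have to fall by roughly a quarter to produce the quoted saving.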
Abstract: Energy efficiency and energy-proportional computing have become a central focus in modern supercomputers. These supercomputers should provide high throughput per unit of power to be sustainable in terms of operating cost and failure rates. In this paper, a power-bounded strategy is proposed that maximizes parallel application performance under a given power budget. It does so by dynamically allocating power among the core, uncore, and memory power domains within a node. Experiments on a 20-core Haswell-EP platform with the real-world parallel application GAMESS demonstrate that the proposed strategy delivers performance within 4% of the best possible performance for as much as a 25% reduction in the minimum power budget required for maximum performance.
Abstract: To reduce the power consumption of parallel applications at runtime, modern processors provide frequency scaling and power limiting capabilities. In this work, a runtime strategy is proposed that distributes a given power allocation among the cluster nodes assigned to the application while balancing the change in their performance. The strategy operates in timeslices: in each one it estimates the current application performance and power usage per node, then redistributes power across the nodes. Experiments performed on four nodes (112 cores) of a modern computing platform interconnected with InfiniBand showed that even a significant power budget reduction of 20% may result in a performance degradation as low as 1% under the proposed strategy, compared with execution in the unlimited power case.
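The timeslice loop described in this abstract can be pictured with a minimal sketch. This is not the authors' algorithm: it is a simple proportional rule, assumed here for illustration, that moves power toward nodes whose measured slowdown is above the average and then rescales so the caps sum to the budget. The function name, the `step` gain, and the watt values are all hypothetical; only the four-node setup and the 20% budget reduction come from the abstract.

```python
def redistribute(caps, slowdowns, budget, step=0.5):
    """One timeslice of an illustrative balance-seeking power redistribution.

    caps: current per-node power caps in watts.
    slowdowns: per-node fractional slowdown measured in the last timeslice.
    budget: total power in watts that the new caps must sum to.
    step: hypothetical gain controlling how aggressively power is shifted.
    """
    if not caps or len(caps) != len(slowdowns):
        raise ValueError("caps and slowdowns must be non-empty and equal length")
    mean = sum(slowdowns) / len(slowdowns)
    # Nodes slower than average receive more power; faster nodes give some up.
    raw = [c * (1.0 + step * (s - mean)) for c, s in zip(caps, slowdowns)]
    if min(raw) <= 0.0:
        raise ValueError("step is too large for the observed slowdowns")
    scale = budget / sum(raw)  # rescale so the caps add up to the budget
    return [r * scale for r in raw]


# Hypothetical numbers: four nodes at 100 W each, budget cut by 20% to 320 W.
new_caps = redistribute([100.0] * 4, [0.10, 0.02, 0.02, 0.02], budget=320.0)
print([round(c, 1) for c in new_caps])
```

In a real deployment the new caps would be applied through the platform's power-limiting interface and the measurement repeated in the next timeslice. Hardware minimum and maximum caps are ignored here.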
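Abstract: To address the low accuracy of lane detection algorithms when vehicles encounter complex driving environments, a detection algorithm based on an improved UFS network (Ultra Fast Structure-aware Deep Lane Detection, UFS) is proposed. First, an improved Gamma correction is applied to the road image under test to reduce the influence of illumination, shadows, and similar factors, and to enhance the texture features of night-time images. Next, a non-local neural network module (Non-Local Block) is introduced to fully extract global image features and improve detection reliability. Finally, the improved algorithm is tested on the Tusimple and CULane datasets. The results show that in complex scenes such as object occlusion, illumination changes, and shadow interference, the improved model handles complex noise and diverse scenes better, lane segmentation accuracy improves, and the model shows good robustness.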
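For readers unfamiliar with the preprocessing step, the following is a minimal sketch of standard Gamma correction on an 8-bit grayscale image. It shows the textbook power-law mapping only; the abstract does not describe the paper's improved variant. The function name and the sample pixel values are hypothetical.

```python
def gamma_correct(image, gamma):
    """Apply out = 255 * (in / 255) ** gamma to every 8-bit pixel.

    image: list of rows, each a list of integers in 0..255.
    gamma: positive exponent; values below 1 brighten dark regions.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Precompute the mapping once for all 256 possible pixel values.
    lut = [round(255.0 * (v / 255.0) ** gamma) for v in range(256)]
    return [[lut[p] for p in row] for row in image]


# Hypothetical dark image patch; gamma < 1 lifts the shadows.
dark_patch = [[10, 40, 64], [90, 128, 255]]
print(gamma_correct(dark_patch, 0.5))
```

A lookup table is used because each of the 256 input levels always maps to the same output, so the power function is evaluated only 256 times regardless of image size.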
Funding: the National Key Research and Development Program of China (2017YFC0403605); the National Natural Science Foundation of China (413517033); the State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research (SKL2018CG04); the Shaanxi Province Innovation Talent Promotion Plan Project Technology Innovation Team (2018TD-037).
Abstract: Seasonal freeze–thaw processes have led to severe soil erosion in the middle and high latitudes. The area affected by freeze–thaw erosion in China exceeds 13% of the national territory. Understanding the effect of freeze–thaw on the erosion process is therefore of great significance for soil and water conservation as well as for ecological engineering. In this study, we designed simulated rainfall experiments to investigate soil erosion processes under two soil conditions, unfrozen slope (UFS) and frozen slope (FS), and three rainfall intensities of 0.6, 0.9 and 1.2 mm/min. The results showed that initial runoff on the FS occurred much earlier than on the UFS. Under the same rainfall intensity, the runoff of the FS was 1.17–1.26 times that of the UFS, and the sediment yield of the FS was 6.48–10.49 times that of the UFS. With increasing rainfall time, rills formed on the slope. After the rills appeared, the sediment yield on the FS accounted for 74%–86% of the total sediment yield. Rill erosion was the main reason for the increase in soil erosion rate on the FS, and the reduction in water percolation caused by frozen layers was one of the important factors leading to the earlier appearance of rills on the slope. A linear relationship existed between the cumulative runoff and the sediment yield of the UFS and FS (R² > 0.97, P < 0.01). The average mean weight diameter (MWD) of the eroded particles ranked as follows: UFS0.9 (73.84 μm) > FS0.6 (72.30 μm) > UFS1.2 (72.23 μm) > substrate (71.23 μm) > FS1.2 (71.06 μm) > FS0.9 (70.72 μm). During the early stage of the rainfall, the MWD of the FS was relatively large. During the middle to late stages, however, the particle composition gradually approached that of the soil substrate. Under different rainfall intensities, the mean soil erodibility (MK) of the FS was 7.22 times that of the UFS. The ratio of the mean regression coefficient C2 (MC2) between the FS and UFS roughly corresponded to MK. Therefore, the parameter C2 can be used to evaluate soil erodibility after the appearance of the rills. This article explored the influence ...
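The linear relationship between cumulative runoff and sediment yield reported above is the kind of result an ordinary least-squares fit produces. The sketch below shows how such a fit and its R² can be computed. The data points are synthetic and chosen only to make the example run, not taken from the study, and the function name is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # A constant y is fitted exactly by a horizontal line.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


# Synthetic, illustrative values only (not measurements from the study).
cumulative_runoff = [5.0, 10.0, 15.0, 20.0, 25.0]
sediment_yield = [1.1, 2.0, 3.2, 3.9, 5.1]
slope, intercept, r2 = linear_fit(cumulative_runoff, sediment_yield)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

The significance level (P < 0.01) quoted in the abstract would need a separate statistical test, which this sketch does not perform.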