Funding: This research was supported in part by the National Science Foundation of U.S.A. under NSF Grant Nos. EIA-0224377, CNS-0406328, CNS-0509118, and CCF-0621435.
Abstract: Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before the CPU demands it, has been considered an effective solution for masking data access delay. However, current client-initiated prefetching strategies, where a computing processor initiates prefetching instructions, have many limitations. They do not work well for applications with complex, non-contiguous data access patterns. As technology advances continue to widen the gap between computing and data access performance, trading computing power for reduced data access delay has become a natural choice. In this paper, we present a server-based data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called the Data Push Server (DPS) initiates and proactively pushes data closer to the client in time. Issues such as what data to fetch, when to fetch, and how to push are studied. The SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor to test DPS-based prefetching. Simulation results show that the L1 cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for SPEC CPU2000 benchmarks that have high cache miss rates.
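The effect of pushing data ahead of demand can be illustrated with a toy model. The sketch below is not the DPS design or the modified SimpleScalar setup from the paper; it assumes a small direct-mapped cache and a hypothetical helper that already knows the address trace and installs the line needed a few accesses in the future. The function name `miss_rate` and all parameters are illustrative assumptions.

```python
def miss_rate(addresses, num_lines=64, line_size=32, push_ahead=0):
    """Miss rate of a direct-mapped cache on an address trace.

    push_ahead > 0 models a hypothetical helper (an assumption, not the
    paper's DPS) that installs the line needed `push_ahead` accesses
    in the future, before the client asks for it.
    """
    cache = [None] * num_lines  # one block tag per cache line
    misses = 0

    def install(addr):
        block = addr // line_size
        cache[block % num_lines] = block

    def present(addr):
        block = addr // line_size
        return cache[block % num_lines] == block

    for i, addr in enumerate(addresses):
        if not present(addr):
            misses += 1
            install(addr)  # demand fetch on a miss
        if push_ahead and i + push_ahead < len(addresses):
            install(addresses[i + push_ahead])  # helper pushes ahead
    return misses / len(addresses)


if __name__ == "__main__":
    # Hypothetical strided trace: every access touches a new cache block.
    trace = [i * 256 for i in range(1000)]
    print("demand only :", miss_rate(trace))
    print("with pushing:", miss_rate(trace, push_ahead=4))
```

In this toy trace only the first few accesses miss once pushing is enabled, because every later line arrives before it is demanded. A real push server has to predict the trace rather than read it, which is where the questions of what and when to fetch come in.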
Funding: This work was supported in part by the National High Technology Research and Development Program of China (2015AA015303), the National Natural Science Foundation of China (Grant No. 61672160), Shanghai Science and Technology Development Funds (17511102200), and the National Science Foundation (NSF) (CCF-1017961, CCF-1422408, and CNS-1527318). We acknowledge the computing resources provided by the Louisiana Optical Network Initiative (LONI) HPC team. Finally, we appreciate invaluable comments from anonymous reviewers.
Abstract: Performance variability, stemming from nondeterministic hardware and software behaviors or from deterministic behaviors such as measurement bias, is a well-known phenomenon of computer systems. It increases the difficulty of comparing computer performance metrics and is slated to become even more of a concern as interest in Big Data analytics increases. Conventional methods use various measures (such as the geometric mean) to quantify the performance of different benchmarks and compare computers without considering this variability, which may lead to wrong conclusions. In this paper, we propose three resampling methods for performance evaluation and comparison: a randomization test for a general performance comparison between two computers, bootstrapping confidence estimation, and an empirical distribution and five-number summary for performance evaluation. The results show that for both PARSEC and high-variance BigDataBench benchmarks 1) the randomization test substantially improves our chance of identifying a difference in performance when the difference is not large; 2) bootstrapping confidence estimation provides an accurate confidence interval for the performance comparison measure (e.g., ratio of geometric means); and 3) when the difference is very small, a single test is often not enough to reveal the nature of the computer performance due to the variability of computer systems. We further propose using the empirical distribution to evaluate computer performance and a five-number summary to summarize it. We use published SPEC 2006 results to investigate the sources of performance variation by predicting performance and relative variation for 8,236 machines. We achieve a correlation of 0.992 for predicted performance and a correlation of 0.5 between predicted and measured relative variation. Finally, we propose the utilization of a novel biplotting technique to visualize the effectiveness of benchmarks and cluster machines by behavior. We illustrate the results and conclusion through detailed Monte Carlo simulation st
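The three resampling ideas named in this abstract can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' procedure: it assumes paired benchmark scores for two machines, swaps scores within each pair for the randomization test, uses a simple percentile bootstrap, and relies on `statistics.quantiles` for the quartiles. All function names and the sample scores are hypothetical.

```python
import math
import random
import statistics


def log_gm_ratio(a, b):
    """Log of the ratio of geometric means of two score lists."""
    return (sum(math.log(x) for x in a) / len(a)
            - sum(math.log(y) for y in b) / len(b))


def randomization_test(a, b, trials=2000, seed=0):
    """Paired randomization test (assumed scheme): under the null hypothesis
    the machines are interchangeable, so each benchmark's pair may be swapped."""
    rng = random.Random(seed)
    observed = abs(log_gm_ratio(a, b))
    extreme = 0
    for _ in range(trials):
        xs, ys = [], []
        for x, y in zip(a, b):
            if rng.random() < 0.5:
                x, y = y, x
            xs.append(x)
            ys.append(y)
        if abs(log_gm_ratio(xs, ys)) >= observed:
            extreme += 1
    return (extreme + 1) / (trials + 1)  # add-one correction keeps p > 0


def bootstrap_ci(a, b, trials=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the ratio of geometric means,
    resampling benchmarks (pairs) with replacement."""
    rng = random.Random(seed)
    n = len(a)
    ratios = []
    for _ in range(trials):
        idx = [rng.randrange(n) for _ in range(n)]
        ratios.append(math.exp(log_gm_ratio([a[i] for i in idx],
                                            [b[i] for i in idx])))
    ratios.sort()
    lo = ratios[int((alpha / 2) * trials)]
    hi = ratios[int((1 - alpha / 2) * trials) - 1]
    return lo, hi


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


if __name__ == "__main__":
    # Hypothetical per-benchmark scores for two machines (not from the paper).
    machine_a = [12.1, 9.8, 15.3, 20.4, 7.7, 11.0, 18.2, 13.5]
    machine_b = [11.0, 9.1, 14.0, 18.9, 7.5, 10.1, 16.8, 12.2]
    print("p-value:", randomization_test(machine_a, machine_b))
    print("95% CI for GM ratio:", bootstrap_ci(machine_a, machine_b))
    print("five-number summary of A:", five_number_summary(machine_a))
```

Working on the log scale keeps the test statistic symmetric: swapping the two machines only flips its sign, so the two-sided comparison against the observed value is well defined.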
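Abstract: Taking the distributed control system of the 300 MW units at Shaoguan Power Plant as the background, and aiming to solve the problem of one or both of the primary and secondary multiple function processor (MFP) modules going offline, the cause of the MFP module faults was identified by analyzing the communication and the prefabricated cables between the primary and secondary MFP modules. After the feasibility of BRC-300 controllers working compatibly with MFP modules was demonstrated, the 10 faulty pairs among the 29 pairs of MFP modules were upgraded to BRC-300 controllers, the control system configuration software was upgraded, and the integrity of the BRC-300 controller configuration was verified. At the same time, the daughterboard chips of the network processing modules were upgraded to prevent interruption of data transmission between the BRC-300 controllers and the other MFP modules, and good communication between the BRC-300 controllers and the MFP modules was achieved over the control channel bus. Actual operation proved the correctness of the design, achieved compatibility between BRC-300 controllers and MFP modules in the INFI90 system, completely resolved the offline problem of mutually redundant MFP modules, and ensured safe operation of the units.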