
A Cyclic Memory Race Recording Algorithm Implemented with Hardware Signatures (cited by: 2)
Abstract: Shared-memory multithreaded programs running on chip multiprocessors are nondeterministic, and two-phase deterministic record-replay is an effective way to address this. Memory race recording is the key technique for replaying multithreaded programs deterministically, but existing memory race recording schemes produce large logs and limit replay speed. This paper proposes CyclicMR, a cyclic memory race recording algorithm based on point-to-point logging. CyclicMR represents each memory race as a current dependency and detects races with small hardware signatures, so the existing cache structure needs no modification. A conflict direction detection mechanism reduces consecutive current dependencies that have the same direction, and the resulting cyclic dependencies, which capture more transitivity, are recorded in the memory race log. As a result, the memory races recorded between any two threads form a cycle, which greatly reduces redundancy. An incremental instruction counter further optimizes the cyclic dependencies and shrinks the memory race log. Simulation on an 8-core chip multiprocessor, with a comparison against earlier mainstream approaches, shows that CyclicMR achieves a low log growth rate, low hardware overhead, and low bandwidth overhead, and that its memory race log scales well.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2014, No. 5, pp. 1149-1157 (9 pages)
Funding: National Natural Science Foundation of China (61173024); National Basic Research Program of China (973 Program) (2011CB302501)
Keywords: chip multiprocessor; multi-core program; deterministic replay; memory race recording; conflict detection; hardware signature
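The abstract names two mechanisms that a small software model can make concrete: signature-based conflict detection and the detection of consecutive same-direction dependencies. The Python sketch below is an illustration under assumptions, not the CyclicMR hardware design. It assumes a signature behaves like a Bloom filter over block addresses, and the names `Signature`, `conflicts`, and `same_direction_runs`, along with the signature size and hash choice, are hypothetical. The abstract does not say which dependency CyclicMR keeps from a same-direction run, so the sketch only groups and counts the runs.

```python
import hashlib


class Signature:
    """Bloom-filter-style signature over memory block addresses (assumed model)."""

    def __init__(self, bits=1024, hashes=2):
        self.bits = bits      # signature width in bits (hypothetical size)
        self.hashes = hashes  # number of hash functions (hypothetical)
        self.field = 0        # bit vector stored as a Python int

    def _positions(self, addr):
        # Derive one bit position per hash function from the address.
        for i in range(self.hashes):
            digest = hashlib.blake2b(f"{i}:{addr}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.bits

    def insert(self, addr):
        for p in self._positions(addr):
            self.field |= 1 << p

    def may_contain(self, addr):
        # False means "definitely absent"; True may be a false positive.
        return all((self.field >> p) & 1 for p in self._positions(addr))

    def clear(self):
        self.field = 0


def conflicts(is_write, addr, remote_reads, remote_writes):
    """Check an access against another thread's read and write signatures."""
    if remote_writes.may_contain(addr):
        return True  # read-after-write or write-after-write
    return is_write and remote_reads.may_contain(addr)  # write-after-read


def same_direction_runs(deps):
    """Group (src_thread, dst_thread) dependencies of one thread pair,
    given in detection order, into runs that share a direction."""
    runs = []
    for direction in deps:
        if runs and runs[-1][0] == direction:
            runs[-1][1] += 1
        else:
            runs.append([direction, 1])
    return [(direction, count) for direction, count in runs]
```

For example, `same_direction_runs([(0, 1), (0, 1), (1, 0), (0, 1)])` groups four detected dependencies into three runs. Because a signature can report false positives but never false negatives, a model like this may detect spurious conflicts but does not miss real ones.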


