System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism (Cited by: 2)


Abstract: Pipeline parallelism is a popular parallel programming pattern for emerging applications. However, programming pipelines directly on conventional multithreaded shared memory is difficult and error-prone. We present DStream, a C library that provides high-level abstractions of deterministic threads and streams for simply representing pipeline stage workers and their communications. The deterministic stream is established atop our proposed single-producer/multi-consumer (SPMC) virtual memory, which integrates synchronization with the virtual memory model to enforce determinism on shared memory accesses. We investigate various strategies for efficiently implementing DStream atop the SPMC memory, so that an infinite sequence of data items can be asynchronously published (fixed) and asynchronously consumed in order among adjacent stage workers. We have successfully transformed two representative pipeline applications, ferret and dedup, using DStream, and summarize conversion rules. An empirical evaluation shows that the converted ferret performed on par with its Pthreads and TBB counterparts in terms of running time, while the converted dedup is close to 2.56X and 7.05X faster than the Pthreads counterpart, and 1.06X and 3.9X faster than the TBB counterpart, on 16 and 32 CPUs, respectively.
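The publish-then-consume-in-order behavior described in the abstract can be illustrated with a minimal sketch. This is not DStream's API and does not use SPMC virtual memory: every name below (`stream_t`, `stream_publish`, `stream_try_consume`, `STREAM_CAP`, `run_demo`) is hypothetical, the stream is bounded rather than infinite, and C11 atomics stand in for the paper's memory-model-level synchronization. It only shows the idea that one producer appends items that become fixed once published, while each consumer reads them in order through its own cursor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define STREAM_CAP 64 /* hypothetical bound; the stream in the paper is unbounded */

/* Append-only stream: a slot is written once, before it becomes visible. */
typedef struct {
    int items[STREAM_CAP];
    atomic_size_t published; /* number of items visible to consumers */
} stream_t;

/* Each consumer keeps a private cursor, so consumers never interfere. */
typedef struct {
    stream_t *s;
    size_t next;
} consumer_t;

void stream_init(stream_t *s) { atomic_init(&s->published, 0); }

void consumer_init(consumer_t *c, stream_t *s) { c->s = s; c->next = 0; }

/* Single producer only. Returns 0 on success, -1 if the stream is full. */
int stream_publish(stream_t *s, int value) {
    size_t n = atomic_load_explicit(&s->published, memory_order_relaxed);
    if (n == STREAM_CAP) return -1;
    s->items[n] = value; /* write the slot first ... */
    atomic_store_explicit(&s->published, n + 1, memory_order_release); /* ... then fix it */
    return 0;
}

/* Non-blocking read of the next item in publication order.
   Returns 1 and stores the item in *out, or 0 if nothing new is published. */
int stream_try_consume(consumer_t *c, int *out) {
    size_t n = atomic_load_explicit(&c->s->published, memory_order_acquire);
    if (c->next >= n) return 0;
    *out = c->s->items[c->next++];
    return 1;
}

/* Demo: one producer and two consumers, interleaved by hand in one thread.
   Both consumers observe the same sequence regardless of when they read. */
int run_demo(void) {
    stream_t s;
    consumer_t a, b;
    int v, seen_a = 0, seen_b = 0;

    stream_init(&s);
    consumer_init(&a, &s);
    consumer_init(&b, &s);

    stream_publish(&s, 1);
    stream_publish(&s, 2);
    while (stream_try_consume(&a, &v)) seen_a = seen_a * 10 + v; /* a reads 1, 2 */
    stream_publish(&s, 3);
    while (stream_try_consume(&b, &v)) seen_b = seen_b * 10 + v; /* b reads 1, 2, 3 */
    while (stream_try_consume(&a, &v)) seen_a = seen_a * 10 + v; /* a catches up with 3 */

    assert(seen_a == 123);
    assert(seen_b == 123);
    return seen_a == seen_b;
}
```

Calling `run_demo()` from a `main` function exercises the sketch. The release store in `stream_publish` paired with the acquire load in `stream_try_consume` is what would let the same code be driven from one producer thread and several consumer threads; the single-threaded interleaving here is chosen only to keep the example small.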

Authors: Zhang Yu (张昱), Li Zhaopeng (李兆鹏), Cao Huifang (曹慧芳)

Affiliation: School of Computer Science and Technology

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2015, Issue 1, pp. 57-73 (17 pages)

Funding: This work was supported in part by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901, the National Natural Science Foundation of China under Grant No. 61229201, and the China Postdoctoral Science Foundation under Grant No. 2012M521250.

Keywords: deterministic parallelism, pipeline parallelism, single-producer/multi-consumer, virtual memory

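Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]; TE988.2 [Automation and Computer Technology: Computer Science and Technology]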


