Many garbage collection algorithms have been proposed, but few address the special needs of long-running server programs. Server applications usually run for years and spawn many threads, so they create and discard ...Many garbage collection algorithms have been proposed, but few address the special needs of long-running server programs. Server applications usually run for years and spawn many threads, so they create and discard thousands of objects. Therefore, efficient garbage collection is especially important for those applications. This paper presents a class-based garbage collector for object-oriented programming environments that classifies objects by their types to achieve better gradualness. Grouping objects of the same type into a group, with a limited type-lock, a mutator cache and the lease protocol will reduce memory fragmentation, which is especially important for the efficiency of long-running server applications. This class-based collector partitions the heap space by type, which provides better concurrency than the traditional mark-sweep collector, and its reusable garbaged object pool also reduces the object allocation overhead. This paper also discusses the implementation details, such as the mutator cache and the lease protocol, and techniques to achieve better accuracy.展开更多
Token protocol provides a new coherence framework for shared-memory multiprocessor systems. It avoids indirections of directory protocols for common cache-to-cache transfer misses, and achieves higher interconnect ban...Token protocol provides a new coherence framework for shared-memory multiprocessor systems. It avoids indirections of directory protocols for common cache-to-cache transfer misses, and achieves higher interconnect bandwidth and lower interconnect latency compared with snooping protocols. However, the broadcasting increases network traffic, limiting the scalability of token protocol. This paper describes an efficient technique to reduce the token protocol network traffic, called sharing relation cache. This cache provides destination set information for cache-to-cache miss requests by caching directory information for recent shared data. This paper introduces how to implement the technique in a token protocol. Simulations using SPLASH-2 benchmarks show that in a 16-core chip multiprocessor system, the cache reduced the network traffic by 15% on average.展开更多
在CC-NUMA架构系统中,为了减少缓存一致性维护的开销,大规模CC-NUMA系统通常采用多级缓存一致性域设计,降低平均一致性维护操作数量,从而有效缓解系统性能扩展与一致性维护开销的矛盾.传统的MESI,MESIF,MOESI协议主要是针对单级一致性...在CC-NUMA架构系统中,为了减少缓存一致性维护的开销,大规模CC-NUMA系统通常采用多级缓存一致性域设计,降低平均一致性维护操作数量,从而有效缓解系统性能扩展与一致性维护开销的矛盾.传统的MESI,MESIF,MOESI协议主要是针对单级一致性域优化设计,并且没有考虑到大型数据库应用中查询(数据读访问)业务量占据主导地位的特点,故该类一致性协议在多级缓存一致性域场景下存在着跨域操作频度高、执行效率低等缺点.针对上述问题,提出了一种基于共享转发态的多级缓存一致性协议MESI-SF.该协议创建了一个共享转发态Share-F,允许多个一致性域内同时存在远端数据副本的可读可转发状态,从而能够为同一域内同地址的读请求直接提供共享数据,有效减少了跨域操作,提升系统性能.SPLASH-2程序集模拟结果表明,对于两级Cache一致性域系统,相比MESI协议,MESI-SF能够减少23.0%跨结点访问次数,指令平均执行周期数(cycles per instruction,CPI)降低7.5%;相比MESIF协议,MESI-SF能够减少12.2%跨结点访问次数,指令平均执行周期数降低5.95%.展开更多
This paper analyzes cache coherency mechanism from the view of system. It firstly discusses caehe-memory hierarchy of Pentium Ⅲ SMP system, including memory area distribution, cache attributes control and bus transac...This paper analyzes cache coherency mechanism from the view of system. It firstly discusses caehe-memory hierarchy of Pentium Ⅲ SMP system, including memory area distribution, cache attributes control and bus transaction. Secondly it analyzes hardware snoopy mechanism of P6 bus and MESI state transitions adopted by Pentium Ⅲ. Based on these, it focuses on how muhiprocessors and the P6 bus cooperate to ensure cache coherency of the whole system, and gives the key of cache coherency design.展开更多
基金the National High- Tech Research andDevelopm ent Program of China(No. 30 6 - 0 1- 0 3- 11- 9)
文摘Many garbage collection algorithms have been proposed, but few address the special needs of long-running server programs. Server applications usually run for years and spawn many threads, so they create and discard thousands of objects. Therefore, efficient garbage collection is especially important for those applications. This paper presents a class-based garbage collector for object-oriented programming environments that classifies objects by their types to achieve better gradualness. Grouping objects of the same type into a group, with a limited type-lock, a mutator cache and the lease protocol will reduce memory fragmentation, which is especially important for the efficiency of long-running server applications. This class-based collector partitions the heap space by type, which provides better concurrency than the traditional mark-sweep collector, and its reusable garbaged object pool also reduces the object allocation overhead. This paper also discusses the implementation details, such as the mutator cache and the lease protocol, and techniques to achieve better accuracy.
基金Supported by the National Natural Science Foundation of China (No. 60673145)the Basic Research Foundation of Tsinghua Na-tional Laboratory for Information Science and Technology (TNList)+1 种基金the Intel/University Sponsored Research, the National Key Basic Research and Development (973) Program of China (No. 2006CB303100)and the IBM China Research Laboratory
文摘Token protocol provides a new coherence framework for shared-memory multiprocessor systems. It avoids indirections of directory protocols for common cache-to-cache transfer misses, and achieves higher interconnect bandwidth and lower interconnect latency compared with snooping protocols. However, the broadcasting increases network traffic, limiting the scalability of token protocol. This paper describes an efficient technique to reduce the token protocol network traffic, called sharing relation cache. This cache provides destination set information for cache-to-cache miss requests by caching directory information for recent shared data. This paper introduces how to implement the technique in a token protocol. Simulations using SPLASH-2 benchmarks show that in a 16-core chip multiprocessor system, the cache reduced the network traffic by 15% on average.
文摘在CC-NUMA架构系统中,为了减少缓存一致性维护的开销,大规模CC-NUMA系统通常采用多级缓存一致性域设计,降低平均一致性维护操作数量,从而有效缓解系统性能扩展与一致性维护开销的矛盾.传统的MESI,MESIF,MOESI协议主要是针对单级一致性域优化设计,并且没有考虑到大型数据库应用中查询(数据读访问)业务量占据主导地位的特点,故该类一致性协议在多级缓存一致性域场景下存在着跨域操作频度高、执行效率低等缺点.针对上述问题,提出了一种基于共享转发态的多级缓存一致性协议MESI-SF.该协议创建了一个共享转发态Share-F,允许多个一致性域内同时存在远端数据副本的可读可转发状态,从而能够为同一域内同地址的读请求直接提供共享数据,有效减少了跨域操作,提升系统性能.SPLASH-2程序集模拟结果表明,对于两级Cache一致性域系统,相比MESI协议,MESI-SF能够减少23.0%跨结点访问次数,指令平均执行周期数(cycles per instruction,CPI)降低7.5%;相比MESIF协议,MESI-SF能够减少12.2%跨结点访问次数,指令平均执行周期数降低5.95%.
文摘This paper analyzes cache coherency mechanism from the view of system. It firstly discusses caehe-memory hierarchy of Pentium Ⅲ SMP system, including memory area distribution, cache attributes control and bus transaction. Secondly it analyzes hardware snoopy mechanism of P6 bus and MESI state transitions adopted by Pentium Ⅲ. Based on these, it focuses on how muhiprocessors and the P6 bus cooperate to ensure cache coherency of the whole system, and gives the key of cache coherency design.